Network like it’s 1999 with BGP EVPN
Contributors: Robb Boyd
Long before ‘virtual’ became synonymous with servers, network VLANs gave us a layer 2 answer for much-needed network segmentation. They also opened things up for more flexible network designs.
Watch TechWiseTV 186 “Scaling Multitenancy with VXLAN”
As server virtualization exploded, however, several network design problems converged:
- Hardware costs rose because Spanning Tree’s lack of multipathing forced redundant boxes to sit idle until a failure occurred.
- Larger networks exposed how few segments VLANs could provide: at most 4,094 (or fewer when using STP).
- Multi-tenancy put further pressure on VLAN limitations.
The answer to these problems arrived in the early 2010s as VXLAN (Virtual Extensible LAN), later published as RFC 7348 in 2014. It is a network overlay that encapsulates the entire layer 2 frame in UDP while adding only around 50 bytes of overhead.
So now, the 4,094-segment limit of VLANs had expanded to an incredible 16 million segments with VXLAN, thanks to its 24-bit segment identifier. The ability of VXLAN to span layer 3 boundaries was an additional benefit for cloud networks, yet another concept emerging from server virtualization advancements.
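For readers who like to check the math, here is a minimal Python sketch of where the 4,094, 16 million, and 50-byte figures come from. It assumes an untagged outer Ethernet header and an IPv4 outer header without options; the function and constant names are made up for illustration and do not come from any Cisco tooling.

```python
import struct

VLAN_ID_BITS = 12   # width of the 802.1Q VLAN ID field
VNI_BITS = 24       # width of the VXLAN Network Identifier (RFC 7348)

# VLAN IDs 0 and 4095 are reserved, which leaves 4,094 usable segments.
usable_vlans = 2 ** VLAN_ID_BITS - 2
vxlan_segments = 2 ** VNI_BITS

# Bytes added in front of the original layer 2 frame.
# Assumption: untagged outer Ethernet, IPv4 without options.
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: flags, reserved bits, VNI, reserved byte."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # 'I' flag: the VNI field is valid
    # Word 1: flags in the top byte, remaining 24 bits reserved (zero).
    # Word 2: VNI in the top 24 bits, low byte reserved (zero).
    return struct.pack("!II", flags << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Recover the VNI from the first 8 bytes of a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8


if __name__ == "__main__":
    print(usable_vlans, vxlan_segments, overhead)
    print(build_vxlan_header(5000).hex())
```

The jump from 12 bits to 24 bits is the whole story behind the segment count; the overhead is simply the sum of the outer headers wrapped around the original frame.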
Millions of segments, tenant isolation, layer 3 multipathing. No more network issues, right? Ironically, these larger VXLAN-enhanced networks began running into new limitations.
99 Problems and the BUM ain’t One.
One of the most significant barriers to VXLAN adoption has been its reliance on multicast for handling BUM (broadcast, unknown unicast, and multicast) traffic.
Switches build address tables by observing the source addresses of incoming traffic. When a switch receives traffic for a destination that is not in its table, it makes copies and sends them out all other interfaces within the given VLAN. Multicast addresses are group addresses that never appear as a source, so they can’t be used to fill out these tables and get flooded the same way.
VXLAN does not stop this behavior. It takes a layer 2 frame, wraps it in VXLAN, UDP, and IP headers, and sends it across the layer 3 network. With no other way to determine the destination IP address, VXLAN relies on flooding (over IP multicast) and dynamic MAC learning to work out where the frame should go.
This leads to extra network design and setup work, excessive network traffic, and some dependence on the physical network, since the core has to be IP multicast enabled.
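The flood-and-learn behavior described above can be sketched in a few lines of Python. This is a toy model of a single-VLAN switch, not how any real switch is implemented; the class name, port names, and MAC addresses are all hypothetical.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy layer 2 switch for one VLAN (illustrative only)."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        # Source MAC address -> port on which it was last seen.
        self.mac_table: Dict[str, str] = {}

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Learn the source address, then return the ports the frame leaves on."""
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is not None:
            # Known unicast: one port, or nothing if the destination
            # sits on the port the frame arrived on.
            return [] if out_port == in_port else [out_port]

        # Broadcast, multicast, or unknown unicast: these addresses are
        # never learned as a source, so the frame is flooded out every
        # port except the one it came in on.
        return [p for p in self.ports if p != in_port]


if __name__ == "__main__":
    sw = LearningSwitch(["eth1", "eth2", "eth3"])
    print(sw.receive("eth1", "00:00:00:00:00:aa", "00:00:00:00:00:bb"))  # flooded
    print(sw.receive("eth2", "00:00:00:00:00:bb", "00:00:00:00:00:aa"))  # learned
```

The first frame is flooded because the switch has never seen the destination; the reply goes out a single port because the first frame taught the switch where the original sender lives.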
VXLAN brought incredible scale, but networks had no practical way to support a correspondingly large number of multicast groups. Larger, multi-tenant networks were forced to map multiple VXLAN segments to a single multicast group, which then led to problematic flooding of traffic. (quote from Lukas)
The VXLAN specification states that there should be no reliance on a control plane or a physical-to-virtual mapping table. This lack of a control plane has severely limited both flexibility and scalability for multitenant cloud operators.
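To see why sharing multicast groups hurts, here is a small Python sketch. It assumes a simple round-robin assignment of segments to a limited pool of groups, which is only one possible policy; the VNI numbers, group addresses, and leaf names are invented for the example.

```python
from typing import Dict, List, Set


def map_vnis_to_groups(vnis: List[int], groups: List[str]) -> Dict[int, str]:
    """Spread VNIs over a limited pool of multicast groups (assumed round-robin)."""
    return {vni: groups[i % len(groups)] for i, vni in enumerate(vnis)}


def flood_receivers(vni: int,
                    vni_to_group: Dict[int, str],
                    vtep_vnis: Dict[str, Set[int]]) -> Set[str]:
    """VTEPs that receive BUM traffic for `vni`: everyone joined to its group."""
    group = vni_to_group[vni]
    return {
        vtep
        for vtep, local_vnis in vtep_vnis.items()
        if any(vni_to_group[v] == group for v in local_vnis)
    }


def interested_vteps(vni: int, vtep_vnis: Dict[str, Set[int]]) -> Set[str]:
    """VTEPs that actually host the segment and need its BUM traffic."""
    return {vtep for vtep, local_vnis in vtep_vnis.items() if vni in local_vnis}


if __name__ == "__main__":
    mapping = map_vnis_to_groups([10001, 10002, 10003, 10004],
                                 ["239.1.1.1", "239.1.1.2"])
    vteps = {"leaf1": {10001}, "leaf2": {10003}, "leaf3": {10002}}
    receivers = flood_receivers(10001, mapping, vteps)
    print(sorted(receivers - interested_vteps(10001, vteps)))
```

In this example, segments 10001 and 10003 land on the same group, so a leaf that only hosts 10003 still receives flooded traffic for 10001 and has to throw it away.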
BGP EVPN as the Control Plane for VXLAN
The familiar routing protocol BGP has become the reachability protocol for forwarding decisions, backfilling the missing control plane function.
The BGP control plane for VXLAN removes the flood-and-learn semantics by distributing reachability information. This eliminates unknown unicast almost entirely. Only true L2 multi-destination traffic such as broadcasts, or generic multi-destination traffic such as overlay multicast, still relies on the replication provided by multicast.
Note from Lukas: for VXLAN and VXLAN EVPN, we still stick to multicast as an option for the BUM traffic (the other option is ingress replication, also available for VXLAN flood-and-learn, i.e. RFC 7348), but with EVPN we reduce the need for BUM handling for address resolution and unknown unicast nearly entirely.
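As a rough illustration of the difference a control plane makes, the Python sketch below models a VTEP whose forwarding table is filled by advertised routes rather than by flooding. It is a toy model and does not speak BGP; the class and method names are hypothetical, the replication list stands in for either ingress replication or a multicast group, and treating a still-unknown destination as BUM traffic is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


class EvpnVtep:
    """Toy VTEP whose MAC table is populated by a control plane (illustrative)."""

    def __init__(self, replication_peers: Dict[int, List[str]]) -> None:
        # (VNI, MAC) -> IP address of the remote VTEP that advertised it.
        self.remote_macs: Dict[Tuple[int, str], str] = {}
        # VNI -> remote VTEPs that receive copies of BUM traffic.
        self.replication_peers = replication_peers

    def learn_route(self, vni: int, mac: str, next_hop_vtep: str) -> None:
        """Install reachability received from the control plane."""
        self.remote_macs[(vni, mac)] = next_hop_vtep

    def withdraw_route(self, vni: int, mac: str) -> None:
        """Remove reachability when the control plane withdraws it."""
        self.remote_macs.pop((vni, mac), None)

    def forward(self, vni: int, dst_mac: str) -> List[str]:
        """Return the remote VTEPs an encapsulated frame is sent to."""
        next_hop = self.remote_macs.get((vni, dst_mac))
        if next_hop is not None:
            return [next_hop]  # known unicast: a single tunnel, no flooding
        # Broadcast, multicast, or (now rare) unknown unicast.
        return list(self.replication_peers.get(vni, []))


if __name__ == "__main__":
    vtep = EvpnVtep({10001: ["10.0.0.2", "10.0.0.3"]})
    vtep.learn_route(10001, "00:00:00:00:00:aa", "10.0.0.2")
    print(vtep.forward(10001, "00:00:00:00:00:aa"))
    print(vtep.forward(10001, "ff:ff:ff:ff:ff:ff"))
```

Because the table is filled before the first frame ever arrives, unicast traffic goes straight to the right VTEP, and replication is left for genuine multi-destination traffic.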
Technically, this uses the Ethernet virtual private network (EVPN) address-family extension of Multiprotocol BGP (MP-BGP) to distribute overlay reachability information. EVPN is a Layer 2 VPN technology that uses BGP as a control plane for MAC address signaling and learning and for VPN endpoint discovery.
EVPN is the first approach to combine L2 and L3, allowing users to build easily deployed bridged overlays and easily scalable routed overlays. All other solutions focus on an L3 control plane only (e.g., MPLS L3VPN, DFA, GRE).
BGP authentication and security constructs provide more secure multitenancy with VXLAN tunnel identification. MP-BGP provides the policy construct for scalability by keeping route updates away from places where they are not needed.
The BGP Control Plane for VXLAN allows the Cisco Nexus 7000, 7700, 9300, and 9500 to support VXLAN with both multicast flood-and-learn and the BGP EVPN control plane. More model support continues to roll out.
The multifaceted capability of the Nexus 9000 Series allows flexible connectivity for servers attached to access or leaf switches by leveraging ACI, VXLAN EVPN, VXLAN, or classic Ethernet.
The 9000 family leaf switches can also route VXLAN overlay traffic through a custom Cisco ASIC. VXLAN routing at the leaf allows customers to bring the boundary between their Layer 2 and Layer 3 overlays down to the leaf/access layer, which offers multiple unique benefits:
- More scalable design
- Network failure containment
- Transparent mobility
- Better abstract connectivity and policy
Huge thanks to the genius that is Lukas Krattiger. Be sure to follow him and make time for his Cisco Live sessions if you can get in. He also did some ‘Live Lessons’ with MUCH more detail, available through Cisco Press.
Some other resources:
- Cisco Border Gateway Protocol Control Plane for Virtual Extensible LAN
- Lukas leads a full room at Cisco Live San Diego: BRKDCT-3378 – Building Data Center Networks with Overlays (VXLAN/EVPN & FabricPath)
- White Paper on BGP VXLAN
- BGP Control Plane for VXLAN works with platforms that are consistent with the IETF draft for EVPN
- Blog on the design and evolution with VXLAN
- Cisco Nexus 9000 Series Switches: Data Sheets and Literature
Thank you for watching!