Given the tremendous interest in VXLAN with MP-BGP based EVPN Control-Plane (short EVPN) at Cisco Live in Milan, I decided to write a “short” technology brief blog post on this topic.
VXLAN (IETF RFC7348) has been designed to solve specific problems faced with Classical Ethernet for a few decades now. By introducing an abstraction through encapsulation, VXLAN has become the de-facto standard overlay of choice in the industry. Chief among the advantages provided by VXLAN; extension of the todays limited VLAN space and the increase in the scalability provided for Layer-2 Domains.
Extended Namespace – The available VLAN space from the IEEE 802.1Q encapsulation perspective is limited to a 12-bit field, which provides 4096 VLANs or segments. By encapsulating the original Ethernet frame with a VXLAN header, the newly introduced addressing field offers 24-bits, thereby providing a much larger namespace with up to 16 Million Virtual Network Identifiers (VNIs) or segments.
While the VXLAN VNI allows unique identification of a large number of tenant segments which is especially useful in high-scale multi-tenant deployments, the problems and requirements of large Layer-2 Domains are not sufficiently addressed. However, significant improvements in the following areas have been achieved:
- No dependency on Spanning-Tree protocol by leveraging Layer-3 routing protocols
- Layer-3 routing with Equal Cost Multi-Path (ECMP) allows all available links to be used
- Scalability, convergence, and resiliency of a Layer-3 network
- Isolation of Broadcast and Failure Domains
IETF RFC7348 – VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks
Scalable Layer-2 Domains
The abstraction by using a VXLAN-like overlay does not inherently change the Flood & Learn behavior introduced by Ethernet. In typical deployments of VXLAN, BUM (Broadcast, Unicast, Multicast) traffic is forwarded via layer-3 multicast in the underlay that in turn aids in the learning process so that subsequent traffic need not be subjected to this “flood” semantic. A control-plane is required to minimize the flood behavior and proactively distribute End-Host information to participating entities (typically called Virtual Tunnel End Points aka VTEPs) in the same segment – learning.
Control-plane protocols are mostly employed in the layer-3 routing space where predominantly IP prefix information is exchanged. Over the past years, some of the well-known routing protocols have been extended to also learn and exchange Layer-2 MAC addresses. An early technology adoption with MAC addresses in a routing-protocol was Cisco’s OTV (Overlay Transport Virtualization), which employed IS-IS to significantly reduce flooding across Data Center Interconnects (DCI).
Multi-Protocol BGP (MP-BGP) introduced a new Network Layer Reachability Information (NLRI) to carry both, Layer-2 MAC and Layer-3 IP information at the same time. By having the combined set of MAC and IP information available for forwarding decisions, optimized routing and switching within a network becomes feasible and the need for flood to do learning get minimized or even eliminated. This extension that allows BGP to transport Layer-2 MAC and Layer-3 IP information is called EVPN – Ethernet Virtual Private Network.
EVPN is documented in the following IETF drafts
Integrated Route and Bridge (IRB) – VXLAN-EVPN offers significant advantages in Overlay networking by optimizing forwarding decision within the network based on Layer-2 MAC as well as Layer-3 IP information. The decision on forwarding via routing or switching can be done as close as possible to the End-Host, on any given Leaf/ToR (Top-of-Rack) Switch. The Leaf Switch provides the Distributed Anycast Gateway for routing, which acts completely stateless and does not require the exchange of protocol signalization for election or failover decision. All the reachability information available within the BGP control-plane is sufficient to provide the gateway service. The Distributed Anycast Gateway also provides integrated routing and bridging (IRB) decision at the Leaf Switch, which can be extended across a significant number of nodes. All the Leaf Switches host active default gateways for their respective configured subnets; the well known semantic of First Hop Routing Protocols (FHRP) with active/standby does not apply anymore.
Summary – The advantages provided by a VXLAN-EVPN solution are briefly summarized as follows:
- Standards based Overlay (VXLAN) with Standards based Control-Plane (BGP)
- Layer-2 MAC and Layer-3 IP information distribution by Control-Plane (BGP)
- Forwarding decision based on Control-Plane (minimizes flooding)
- Integrated Routing/Bridging (IRB) for Optimized Forwarding in the Overlay
- Leverages Layer-3 ECMP – all links forwarding – in the Underlay
- Significantly larger Name-Space in the Overlay (16M segments)
- Integration of Physical and Virtual Networks with Hybrid Overlays
- It facilitates Software-Defined-Networking (SDN)
Simply formulated, VXLAN-EVPN provides a standards-based Overlay that supports Segmentation, Host Mobility, and High Scale.
VXLAN-EVPN is available on Nexus 9300 (NX-OS 7.0) with Nexus 7000/7700 (F3 linecards) to follow in the upcoming major release. Additional Data Center Switching platforms, like the Nexus 5600, will follow shortly after.
A detailed whitepaper on this topic is available on Cisco.com. In addition, VXLAN-EVPN was featured during the following Cisco Live! Sessions.
Do you have appetite for more? Post a comment, tweet about it and have the conversation going … Thanks for reading and Happy Networking!
Tags: #CLEUR, Cisco, cisco live, Cisco Nexus, Cisco Nexus 9000, data center, EVPN, ietf, network, nexus, rfc7348, SDN, VXLAN
The increasing pace of business is creating high demand for IT efficiency and speed.
IT directly affects your ability to respond correctly and quickly to new opportunities. Slow IT equates to slow business.
The latest release of Cisco UCS Director allows IT to automate the deployment of infrastructure delivering the speed and efficiency IT needs in order to support business demands.
Not familiar with Cisco UCS Director? Briefly, this solution reduces data center complexity by replacing manual provisioning tasks with unified automated workflows that span compute, network, storage and virtualization. But this solution doesn’t manage just Cisco products. It unites Cisco UCS-based integrated infrastructures and third-party hardware solutions into a single management view to ensure maximum IT efficiency. Heterogeneous management is just one of the features that makes Cisco UCS Director different. You can learn more by going here. Read More »
Tags: cloud, FlexPod, infrastructure automation, nexus, orchestration, UCS, ucs director, Vblock, vcac, VMware, vspex
The connections between your business success and data center efficiency have never been clearer or stronger. Delivering data center resources quickly to capitalize on new business opportunities or projects is a major factor for business success. The only way to expedite resource delivery is by replacing manual processes with repeatable, best-practice-based automated service consumption and delivery.
So what is getting in the way?
The first problem is inconsistent delivery of standardized infrastructure instances. Many solutions on the market today only manage virtual resources while others just support their proprietary hardware stack. Automation needs to be across all layers – and that includes networks. I believe the only way to achieve the speed and efficiency businesses desire is to manage your entire heterogeneous data center as a unified whole — holistically. Read More »
Tags: automation, FlexPod, infrastructure, nexus, orchestration, Prime Service Catalog, self-service delivery, UCS, ucs director, VACS, Vblock, vcac, vCloud Suite, VMware, vspex
To celebrate 30 years of innovation at Cisco (#We are Cisco), we’ve asked Cisco Champions what they think is the most important Cisco innovation to date. Cisco Champions are seasoned IT technical experts and influencers who enjoy sharing their knowledge, expertise, and thoughts across the social web and with Cisco. The Cisco Champions program encompasses different areas of interest, such as Data Center, Internet of Things, Enterprise Networks, Collaboration and Security. Cisco Champions are located all over the world.
(Cisco Champions are not representatives of Cisco. Their views are their own)
Here are their top answers.
Cisco Nexus Series
The most important innovation for me is the Data Center Networking Solution with Nexus Portfolio N2K, N5K, N7K, and N9K, that allows us to address all challenges for our customers. I really appreciate the new campus solution based on C6800 with IA switches which uses the same technology as FEX. It really simplifies architecture and reduces OPEX with a single point of management.
Network Consulting Engineer
@BBordereau Read More »
Tags: #ciscochampion, Cisco Certification, Cisco UCS, nexus
MDS 9500 family has supported customers for more than a decade helping them through FC speed transitions from 1G, 2G, 4G, 8G and 8G advanced without forklift upgrades. But as we look in the future the MDS 9700 makes more sense for a lot of data center designs. Top four reasons for customers to upgrade are
- End of Support Milestones
- Storage Consolidation
- Improved Capabilities
- Foundation for Future Growth
So lets look at each in some detail.
- End of Support Milestones
MDS 4G parts are going End of Support on Feb 28th 2015. Impacted part numbers are DS-X9112, DS-X9124, DS-X9148. You can use the MDS 9500 Advance 8G Cards or MDS 9700 based design. Few advantages MDS 9700 offers over any other existing options are
a. Investment Protection – For any new Data Center design based on MDS 9700 will have much longer life than MDS 9500 product family. This will avoid EOL concerns or upgrades in near future. Thus any MDS 9700 based design will provide strong investment protection and will also ensure that the architecture is relevant for evolving data center needs for more than a decade.
b. EOL Planning – With MDS 9700 based design you control when you need to add any additional blades but with MDS 9500, you will have to either fill up the chassis within 6 months (End of life announcement to End of Sales) or leave the slots empty forever after End of Sale date.
c. Simplify Design - MDS 9700 will allow single skew, S/W version, consistent design across the whole fabric which will simplify the management. MDS 9700 massive performance allows for consolidation and thus reducing footprint and management burden.
d. Rich Feature Set - Finally as we will see later MDS provides host of features and capabilities above and beyond MDS 9500 and that enhancement list will continue to grow.
- Storage Consolidation
MDS 9700 provides unprecedented consolidation compared to the existing solutions in the industry. As an example with MDS 9710 customers can use the 16G Line Rate ports to support massively virtualized workload and consolidate the server install base. Secondly with 9148S as Top of Rack switch and MDS 9700 at Core, you can design massively scalable networks supporting consistent latency and 16G throughput independent of the number of links and traffic profile and will allow customers to Scale Up or Scale Out much more easily than legacy based designs or any other architecture in the industry.
Moreover as shown in figure above for customers with MDS 9500 based designs MDS 9710 offers higher number of line rate ports in smaller footprint and much more economical way to design SANs. It also enables consolidation with higher performance as well as much higher availability.
- Improved Capabilities
MDS 9700 design provides more enhanced capabilities above and beyond MDS 9500 and many more capabilities will be added in future. Some examples that are top of mind are detailed below
Availability: MDS 9700 based design improves the reliability due to enhancements on many fronts as well as simplifying the overall architecture and management.
- MDS 9710 introduced host of features to improve reliability like industry’s first N+1 Fabric redundancy, smaller failure domains and hardware based slow drain detection and recovery.
- Its well understood that reliability of any network comes from proper design, regular maintenance and support. It is imperative that Data Center is on the recommended releases and supported hardware. As an example data center outage where there are unsupported hardware or software version failure are exponentially more catastrophic as the time to fix those issues means new procurement and live insertion with no change management window. Cost of an outage in an Data Center is extremely high so it is important to keep the fabric upgraded and on the latest release with all supported components. Thus for new designs it makes sense that it is based on the latest MDS 9700 directors, as an example, rather than MDS 9513 Gen-2 line cards because they will fall of the support on Feb 28, 2015. Also a lot of times having different versions of the hardware and different software versions add complexity to the maintenance and upkeep and thus has a direct impact on the availability of the network as well as operational complexity.
With massive amounts of virtualization the user impact is much higher for any downtime or even performance degradation. Similarly with the data center consolidation and higher speeds available in the edge to core connectivity more and more host edge ports are connected through the same core switches and thus higher number of apps are dependent on consistent end to end performance to provide reliable user experience. MDS 9700 provides industries highest performance with 24Tbps switching capability. The Director class switch is based on Crossbar architecture with Central Arbitration and Virtual Output Queuing which ensures consistent line rate 16G throughput independent of the traffic profile with all 384 ports operating at 16G speeds and without using crutches like local switching (muck akin to emulating independent fixed fabric switches within a director), oversubscription (can cause intermittent performance issues) or bandwidth allocation.
MDS Directors are store and forward switches this is needed as it makes sure that corrupted frames are not traversing everywhere in the network and end devices don’t waste precious CPU cycles dealing with corrupted traffic. This additional latency hit is OK as it protects end devices and preserves integrity of the whole fabric. Since all the ports are line rate and customers don’t have to use local switching. This again adds a small latency but results in flexible scalable design which is resilient and doesn’t breakdown in future. These 2 basic design requirements result in a latency number that is slightly higher but results in scalable design and guarantees predictable performance in any traffic profile and provides much higher fabric resiliency .
Consistent Latency: For MDS directors latency is same for the 16G flow to when there are 384 16G flows going through the system. Crossbar based switch design, Central arbitration and Virtual Output Queuing guarantees that. Having a variable latency which goes from few us to a high number is extremely dangerous. So first thing you need to make sure is that director could provide consistent and predictable latency.
End to End latency: Performance of any application or solution is dependent on end to end latency. Just focusing on SAN fabric alone is myopic as major portion of the latency is contributed by end devices. As an example spinning targets latency is of the order of ms. In this design few us is orders of magnitude less and hence not even observable. With SSD the latency is of the order of 100 to 200 us. Assuming 150 us the contribution of SAN fabric for edge core is still less than 10%. Majority (90%) of the latency is end devices and saving couple of us in SAN Fabric will hardly impact the overall application performance but the architectural advantage of CRC based error drops and scalable fabric design will make provided reliable operations and scalable design.
For larger Enterprises scalability has been a challenge due to massive amount of host virtualization. As more and more VMs are logging into the fabric the requirement from the fabric to support higher flogins, Zones. Domains is increasing. MDS 9700 has industries highest scalability numbers as its powered by supervisor that has 4 times the memory and compute capability of the predecessor. This translates to support for higher scalability and at the same time provides room for future growth.
Foundation for Future Growth:
MDS 9700 provides a strong foundation to meet the performance and scalability needs for the Data Center requirements but the massive switching capability and compute and memory will cover your needs for more than a decade.
It will allow you to go to 32G FC speeds without forklift upgrade or changing Fabric Cards (rather you will need 3 more of the same Fabric card to get line rate throughput through all the 384 ports on MDS 9710 (and 192 on MDS 9706).
MDS 9700 allow customers to deploy 10G FCoE solution today and upgrade without forklift upgrade again to 40G FCoE.
MDS 9700 is again unique such that customers can mix and match FC and FCoE line cards any way they want without any limitations or constraints.
Most importantly customers don’t have to make FC vs FCoE decision. Whether you want to continue with FC and have plans for 32G FC or beyond or if you are looking to converge two networks into single network tomorrow or few years down the road MDS 9700 will provide consistent capabilities in both architectures.
In summary SAN Directors are critical element of any Data Center. Going back in time the basic reason for having a separate SAN was to provide unprecedented performance, reliability and high availability. Data Center design architecture has to keep up with the requirements of new generation of application, virtualization of even the highest performance apps like databases, new design requirements introduced by solutions like VDI, ever increasing Solid State drive usage, and device proliferation. At the same time when networks are getting increasingly complex the basic necessity is to simplify the configuration, provisioning, resource management and upkeep. These are exact design paradigms that MDS 9700 is designed to solve more elegantly than any existing solution.
Although I am biased in saying that but it seems that you have voted us with your acceptance. Please see some more details here.
Live as if you were to die tomorrow. Learn as if you were to live forever.
Tags: 16 Gigabit, 16Gb, 16Gb Fibre Channel, 9500, 9710, architecture, availability, best practices, Cisco, cloud, Cloud Computing, Consolidation, convergence, data center, Data Mobility Manager, DCNM, design, Director, dmm, FCIP, FCoE, Fibre Channel, Fibre Channel over Ethernet, IO accelerator, it-as-a-service, MDS, MDS design, nexus, NX-OS, reliability, SAN, Storage, storage area networks, switch, switching, Unified Data Center, Unified Fabric, upgrade, virtualization