Overlays, Underlays and the New World Order
The networking industry has recently developed a renewed interest in virtual overlays, often wrapped in an “SDN as the controller” context. Amidst the promise, the hope and the hype, the following questions present themselves:
- What exactly is an overlay?
- What distinguishes an overlay from a VPN?
- How decoupled can an overlay be from the underlay network and what are the tradeoffs?
- What are the advantages of overlays and will they emerge as the new networking world order?
In this post, I will attempt to answer these questions.
In the broadest terms, an overlay is a virtual network that is built on top of an underlying network infrastructure (the underlay). For the purpose of this discussion, an overlay must also exhibit the following four characteristics, as identified by the emerging work in NVO3:
- guarantee traffic segregation among users (tenants),
- support address space independence among those users,
- allow for dynamic end-station (server or VM) placement or migration independent of the underlay network’s addressing scheme, and
- support all of the above at large scale (millions of end-stations).
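As a rough illustration of the first three characteristics, here is a minimal Python sketch of an edge node that keeps one forwarding table per tenant. The class, method names and addresses are hypothetical and are not taken from NVO3 or any product; the point is only that a lookup is always scoped to a tenant, so two tenants can reuse the same address, and that a migration is just an update of the mapping to a different underlay tunnel endpoint.

```python
class EdgeNode:
    """Hypothetical overlay edge node with one forwarding table per tenant."""

    def __init__(self):
        # tenant_id -> {end-station address -> underlay tunnel endpoint}
        self.tables = {}

    def place(self, tenant_id, station_addr, tunnel_endpoint):
        """Record where an end-station lives; calling it again models a migration."""
        self.tables.setdefault(tenant_id, {})[station_addr] = tunnel_endpoint

    def lookup(self, tenant_id, station_addr):
        """Resolve an address strictly within one tenant's table."""
        return self.tables.get(tenant_id, {}).get(station_addr)


edge = EdgeNode()
# Two tenants reuse the same end-station address without conflict.
edge.place("tenant-a", "10.0.0.5", "192.0.2.1")
edge.place("tenant-b", "10.0.0.5", "192.0.2.2")
# The tenant-a end-station migrates; only the mapping changes, not its address.
edge.place("tenant-a", "10.0.0.5", "192.0.2.9")
```

The fourth characteristic, scale, is exactly what a toy dictionary like this does not address; distributing and sizing these tables is where the real designs differ.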
All of these characteristics are, in essence, the attributes of a VPN. VPN technologies deployed today, whether Layer 2 or Layer 3, are implemented as overlays on top of an IP or MPLS underlay. In a VPN, the edge devices host the complex service functions, such as tunnel encapsulation/decapsulation and traffic and address space segregation via disjoint forwarding tables. The core nodes of the network (or fabric, if you will) provide basic transport and connectivity among the edge nodes.
To illustrate this point, consider the parallels between VXLAN and E-VPN. Both technologies provide Layer 2 (Ethernet) virtual LAN connectivity between endpoints separated by an IP network. Both overcome the ~4K VLAN identifier limitation. Both construct their forwarding tables by correlating a given MAC address with an associated tunnel endpoint. In E-VPN, this correlation is distributed in BGP, whereas in VXLAN it is established via data-plane learning. The option of using control-plane learning for VXLAN is left as an open possibility (and Yves’ blog post discusses how LISP can serve as this control-plane).
The main architectural difference between the two technologies is in how the multiplexing of virtual network segments is done: VXLAN uses globally assigned 24-bit VXLAN IDs, whereas E-VPN uses downstream-assigned 20-bit MPLS labels. Of course, this architectural difference has ramifications for the control-plane that must run between the endpoints to assign the identifiers, and for the data-plane encapsulation used. It is these differences that make each solution interesting for different customer types.
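To put numbers on the identifier sizes discussed above, here is a small Python sketch. The 8-byte header layout (an 8-bit flags field with the VNI-valid bit set, 24 reserved bits, the 24-bit VNI, and 8 reserved bits) is my reading of the published VXLAN specification; the function names and the learning table are hypothetical, for illustration only.

```python
import struct

VLAN_ID_BITS = 12     # the "~4K" limit both technologies overcome
VXLAN_VNI_BITS = 24   # globally assigned VXLAN ID
MPLS_LABEL_BITS = 20  # downstream-assigned label, as used by E-VPN


def id_space(bits):
    """Number of distinct identifiers that fit in a field of this width."""
    return 2 ** bits


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a given 24-bit VNI."""
    if not 0 <= vni < id_space(VXLAN_VNI_BITS):
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # VNI-valid flag
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", flags << 24, vni << 8)


def unpack_vni(header):
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8


def learn(table, vni, inner_src_mac, outer_src_ip):
    """Data-plane learning: remember which tunnel endpoint a MAC sits behind."""
    table[(vni, inner_src_mac)] = outer_src_ip


header = pack_vxlan_header(5000)
table = {}
learn(table, unpack_vni(header), "00:11:22:33:44:55", "192.0.2.1")
```

Note that the learning table is keyed by the (VNI, MAC) pair rather than the MAC alone, which is what keeps the segments disjoint. In E-VPN the same correlation would arrive in a BGP advertisement instead of being inferred from received packets.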
Overlays provide the allure of enabling new services with a high degree of transparency and decoupling from the underlay network. However, as with any engineering system, there are tradeoffs; and with overlays, these tradeoffs present themselves in a number of operational aspects.
The first aspect to consider is network performance. In the absence of any coordination between the overlay and the underlay, the network cannot provide strict performance guarantees except through overprovisioning (read: higher CapEx).
Another aspect is network manageability (i.e. OAM), and in particular fault isolation. By design, OAM mechanisms specific to the overlay have no visibility into the underlying transport topology. Network troubleshooting therefore requires toolkits that span both the overlay and the underlay, and that can correlate overlay traffic with the underlying transport paths it is mapped to. This is reflected in the emerging work on OAM for Data Center technologies in the IETF’s TRILL, NVO3 and L2VPN working groups. TRILL is currently the furthest along of the three: its requirements, framework and solution are in the process of being crafted. In parallel, an OAM reference framework for overlays has been proposed in NVO3, while LSP-layer OAM functions for E-VPN have been proposed in L2VPN.
A third aspect is traffic protection in the case of transient congestion caused by failures. Failures within the underlay that reduce, for instance, the bisection bandwidth across the fabric (such as a link failing in an Ethernet bundle) are completely invisible to the superimposed overlays, unless feedback mechanisms are employed to propagate state to the edge devices so that they can make head-end rerouting decisions. Alternatively, some means of exposing knowledge of important overlay traffic to the underlay would be required, so that the underlay transport can steer critical flows away from the congestion point, at the possible expense of taking a sub-optimal path.
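The first option, feedback to the edge, can be sketched in a few lines of Python. Everything here is a hypothetical toy model: the path names, the capacity figures and the notification method are invented for illustration and do not correspond to any particular protocol.

```python
class HeadEnd:
    """Hypothetical edge device choosing among underlay paths to a remote edge."""

    def __init__(self, paths):
        # path name -> available bandwidth in Gb/s, listed in preference order
        self.paths = dict(paths)

    def on_underlay_feedback(self, path, available_gbps):
        """The underlay reports a capacity change, e.g. a bundle member failed."""
        self.paths[path] = available_gbps

    def select(self, demand_gbps):
        """Pick the most preferred path that still has enough capacity."""
        for name, available in self.paths.items():
            if available >= demand_gbps:
                return name
        return None  # no path can carry the demand


head_end = HeadEnd({"primary": 40.0, "backup": 20.0})
before = head_end.select(15.0)                  # primary has room
head_end.on_underlay_feedback("primary", 10.0)  # link failure shrinks the bundle
after = head_end.select(15.0)                   # head-end reroutes to backup
```

Without the feedback call, the head-end would keep sending 15 Gb/s into a path that can now carry only 10 Gb/s, which is precisely the transparency problem described above.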
One of the main advantages of overlays is that they provide the ability to rapidly and incrementally deploy new functions through edge-centric innovations. New solutions or applications can be organically added to the existing underlay infrastructure by adding intelligence to the edge nodes of the network. This is why overlays often emerge as solutions to address the requirements of specific applications over an existing basic infrastructure, where either the required functions are missing in the underlay network, or the cost of total infrastructure upgrade is prohibitive from an economic standpoint.
From a political or administrative domain standpoint, the emergence of overlays is sometimes a reflection of conflicts of interest or opposing viewpoints between the stakeholders in a given networking system. In other words, the person establishing the overlay does not want to be deeply coupled to the transport network. To ground this postulation with an example, one can reflect upon the raging arguments in the Data Center over whether host-based or network-based multi-tenant segmentation is the way of the future. VXLAN with flooding/learning or a directory-based control-plane resonates with server administrators. FabricPath/TRILL coupled with TRILL-EVPN for data center interconnect aligns more with the network administrators’ perspective. Eventually, the technology building blocks to support both paradigms will be available, and deployments will choose one or the other depending on administrative considerations as well as the relative advantages and tradeoffs.
So, are overlays going to usher in a new networking world order? To answer that, let’s reflect on a historical example: Internet transport started as an overlay on top of the public telephone network (e.g. SONET, dial-up, Frame Relay). What began as an overlay later grew to become the platform over which a multitude of applications (including telephone/voice service) were enabled. This cycle has repeated itself for many of the services that we consider to be an integral part of the network infrastructure today, as the demands for performance and ubiquity forced the functions down into the underlay infrastructure. In conclusion, overlays are not disrupting how networking is done. Rather, overlays are one of the stages in the life cycle of a nascent networking technology.