Software Defined Networking for Service Providers: Data Center Fabric Analogies Break Down in the WAN
Lately I’ve been seeing some people in the industry trying to apply the principles of data center network fabric models to their Wide Area Networks (WANs), and implying that the same models can be extended across service provider WANs. Data center fabrics and WANs are horses of very different colors, with far too many differences for this perspective to hold up.
Fundamentally, they are different beasts, with one more easily tamed than the other. Data center networks generally have well-known endpoints and well-ordered designs.
Multi-tenant Data Center Designs
Bandwidth within data centers is virtually unlimited relative to WAN bandwidth. The data center network is also much more stable and tightly bounded in characteristics such as latency, loss, jitter, capacity and restoration capabilities, all of which have significant influence on WAN service delivery. The same network assumptions hold between each pair of endpoints, which makes a fabric model a generally good approximation for data centers and thus practical to use.
Some private WANs that interconnect data centers may align closely enough with a fabric model, making it a good enough approximation. But this is a special case: it is essentially Data Center Interconnect, which is just one of many services running across a service provider WAN. It can work because the endpoints are owned and controlled, or because high-quality leased lines with known bandwidth characteristics are used. In a service provider WAN that must manage multi-service delivery (usually thousands of services), the model is no longer applicable.
For starters, WANs use hierarchical topologies connecting a diverse and random set of endpoints, while data center network topologies map one set of known endpoints to another.
A large Wide Area Network diagram looks more like this
WAN Traffic is Unpredictable
Service provider WANs cannot be fully meshed and remain economically viable, either for optimized service delivery or for the costs passed on to the service providers’ customers. WAN topologies are very irregular and often span large geographical distances with large numbers of nodes. WAN traffic is less predictable, bursty and bandwidth constrained. These are fundamentally unbalanced networks. By this I mean that utilization rates and the traffic paths in use differ continuously, since data, voice, video and gaming applications each have different quality of experience metrics.
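To put a rough number on why full meshing does not scale, here is a minimal sketch that counts the point-to-point links a full mesh needs and compares it with the bare minimum a tree-shaped hierarchy needs. The function names and the node counts are my own illustrative assumptions, not figures from any real network.

```python
def full_mesh_links(nodes: int) -> int:
    """Links needed so that every pair of nodes has a direct connection."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


def tree_links(nodes: int) -> int:
    """Minimum links needed to connect all nodes in a loop-free hierarchy."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return max(nodes - 1, 0)


# Hypothetical network sizes, chosen only to show the growth rate.
for n in (10, 100, 1000, 10000):
    print(f"{n:>6} nodes: full mesh {full_mesh_links(n):>10} links, "
          f"tree {tree_links(n):>6} links")
```

The full mesh grows with the square of the node count while the hierarchy grows linearly. A real WAN sits somewhere between the two, since it adds links for redundancy, but the gap shows why nobody builds a long-haul full mesh.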
WAN Traffic Sample by Time and Applications
It’s a big stretch to try to apply lessons learned from data center networks to a service provider WAN, since the endpoints, routers, intersections and crossover points are too numerous to model with a simple fabric approach. A service delivery WAN can consist of tens of thousands of elements that span multiple operational and even organizational domains. Add to this the complexity of the large number of service profiles that typically exist; the ~2500 service profiles that Verizon has to manage are one case in point. Variables like these simply do not exist in data center provisioning.
Data centers, by comparison, are simple. For example, top of rack switches aggregate any number of underlying network elements, which permits order and a well-defined set of rules to be applied. Since known aggregation points can talk to one another, traffic and loads are pretty much one-dimensional and symmetrical. Extrapolating the concepts of a DC fabric to an SP WAN falls short in too many ways.
So let’s take a look at the problems SPs face when creating viable service delivery across the WAN. There are many examples of how a fabric model would break down. There are fundamental differences in traffic management between data centers in a cloud.
Edge Routers Aren’t Equivalent to Top of Rack Switches
Service provider edge routers are not equivalent to top of rack switches, and behavior on the WAN is not predictable enough to make a fabric model reliable. We need visibility into the multi-layered topological state of the network and its attributes to make the right decisions and support the needs of the individual applications that run over it in real time. Automatically learning and discovering the topology through network intelligence permits us to adapt dynamically to things like link failures, or to change utilization or bandwidth reservations on the fly, to meet the needs of each of the many applications. Punting such decisions off to some central control would result in less than optimal performance.
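As a rough illustration of adapting to a link failure once the topology is known, the sketch below recomputes a least-cost path with Dijkstra's algorithm after a link is removed. The five-node topology, the node names (PE for edge, P for core) and the link costs are all hypothetical, and a production network would rely on its routing protocols and fast-reroute mechanisms rather than a script like this.

```python
import heapq


def shortest_path(graph, src, dst):
    """Dijkstra over graph = {node: {neighbor: cost}}.

    Returns (cost, path), or (inf, []) when dst is unreachable.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def without_link(graph, a, b):
    """Return a copy of the graph with the bidirectional link a-b removed."""
    return {u: {v: w for v, w in nbrs.items() if {u, v} != {a, b}}
            for u, nbrs in graph.items()}


# Hypothetical topology: two edge routers (PE) and three core routers (P).
topology = {
    "PE1": {"P1": 10, "P2": 20},
    "P1": {"PE1": 10, "P2": 5, "P3": 10},
    "P2": {"PE1": 20, "P1": 5, "P3": 30},
    "P3": {"P1": 10, "P2": 30, "PE2": 10},
    "PE2": {"P3": 10},
}

print(shortest_path(topology, "PE1", "PE2"))
# Simulate a failure of the P1-P3 link and recompute.
print(shortest_path(without_link(topology, "P1", "P3"), "PE1", "PE2"))
```

With these made-up costs the primary path runs PE1, P1, P3, PE2, and the failure pushes traffic through P2 at a higher cost. None of that is visible if the WAN is modeled as a uniform fabric.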
Service providers must offer very strict SLAs to their business customers, and their network designers are required to pay very close attention to the large number of variables that impact each application. The WAN must deal with multi-layered, multi-dimensional relationships between transport, trunking MPLS and the services running over them. Service infrastructure needs to be optimized around the type of service and offer the customer the right quality of experience.
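To make the idea of per-service requirements concrete, here is a small sketch that checks one measured path against several service profiles. The profile names and every threshold value are assumptions I picked for illustration; they are not taken from any standard or from a real provider's SLAs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    latency_ms: float
    jitter_ms: float
    loss_pct: float


# Hypothetical upper limits per service type (illustrative values only).
SLA_PROFILES = {
    "voice": PathMetrics(latency_ms=150.0, jitter_ms=30.0, loss_pct=1.0),
    "hd_video": PathMetrics(latency_ms=200.0, jitter_ms=50.0, loss_pct=0.1),
    "data": PathMetrics(latency_ms=500.0, jitter_ms=200.0, loss_pct=2.0),
}


def meets_sla(measured: PathMetrics, limit: PathMetrics) -> bool:
    """True when every measured metric is within the profile's limit."""
    return (measured.latency_ms <= limit.latency_ms
            and measured.jitter_ms <= limit.jitter_ms
            and measured.loss_pct <= limit.loss_pct)


def eligible_services(measured: PathMetrics) -> list:
    """Names of the service profiles this path can currently carry."""
    return [name for name, limit in SLA_PROFILES.items()
            if meets_sla(measured, limit)]


# A made-up measurement for one WAN path.
path = PathMetrics(latency_ms=180.0, jitter_ms=20.0, loss_pct=0.05)
print(eligible_services(path))
```

The same path can be acceptable for one service and unacceptable for another. In this made-up case the latency is too high for the voice profile but fine for the other two, which is exactly the kind of distinction a one-size-fits-all fabric model hides.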
Multi-layer WAN Service Orchestration
Content Caching Further Breaks the Fabric Model
Content caching is a great example that further makes my point. By using caching, we modify the end-to-end principle that clients always go to the originating source of the information. SP WANs use proxies and content caches to move content closer to the user to reduce overall traffic, optimize bandwidth and reduce cost and, very importantly, to optimize the user experience, which directly relates to the perceived value and the willingness to pay.
Fully meshed topologies would be prohibitively expensive to deploy in this environment. The worst thing you can do is abstract away the actual usage pattern of the network, because when you do, you cannot optimize the use of content caches, distribute application PODs, perform load balancing or engineer time-sensitive traffic.
So I hope we all begin to see that the same service issues and topologies do not exist in the data center. One can concentrate the modeling effort on network virtualization, but this is not the same thing as a service-driven topology optimized for high quality services, which include main categories like high quality voice, HD video entertainment, HD video communications and high performance data. One cannot just extend access circuits like data center VLANs to edge routers and assume fully meshed connectivity of LSPs across the WAN between all endpoints.
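A toy simulation shows how a cache near the user changes what actually crosses the WAN. This is a minimal sketch assuming a least-recently-used (LRU) eviction policy and a made-up request sequence; real content caches use more elaborate policies and much larger catalogs.

```python
from collections import OrderedDict


def simulate_lru_cache(requests, capacity):
    """Replay content requests through an LRU cache of the given capacity.

    Returns (hits, origin_fetches): requests served locally versus requests
    that still had to travel across the WAN to the origin.
    """
    cache = OrderedDict()
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # mark as most recently used
        else:
            cache[item] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits, len(requests) - hits


# Hypothetical request stream for four pieces of content.
requests = ["a", "b", "a", "c", "a", "b", "d", "a"]

hits, origin_fetches = simulate_lru_cache(requests, capacity=3)
print(f"served from cache: {hits}, fetched across the WAN: {origin_fetches}")
```

How much traffic the cache absorbs depends entirely on where it sits and on the real request pattern, so a model that hides the usage pattern cannot tell you where to place it.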
Interaction at Multiple SP Perimeters
For all these reasons, we cannot simply mask the complex nature of the WAN with simple abstraction models. Guaranteeing multiple tight SLAs requires leveraging the full network intelligence across and between all layers. This is accomplished via a combination of APIs, controllers and virtualization, depending on which method works best for the specific task at hand and which method is preferred by the service provider. WANs are bandwidth constrained and employ several bandwidth optimization strategies like caching. This requires optimal service placement combined with proper traffic management based on a full understanding of the network topology and events across layers. WANs also require special technologies for fast failure recovery, because WANs respond to failures differently than data center networks.
Cisco Open Network Environment provides full-duplex APIs and programmatic access across all network layers, giving service providers the topology knowledge and event visibility needed to manage and guarantee multiple SLAs, troubleshoot and optimize the network in real time. As an industry we may find the greatest thing this technology unleashes is the ability to mine the data held in the network and make it available to the huge application developer community. This new capability enables them to quickly program specific policies and innovative applications at the network perimeter that work across multiple network layers.
This truly brings the network to the applications.