Scaling NFV – The Performance Challenge
In the second part of my blog series I want to cover one of the main concerns Service Providers face as they explore moving to NFV: performance and scalability. Common concerns I hear center around latency, throughput, queuing capabilities, and security. These are valid concerns, since SPs have service level agreements (SLAs) with their customers that carry penalties if performance drops below the SLA. So will a virtualized network function perform at the same level as a purpose-built networking device?
There is no simple answer to this question, but I want to highlight some of the research being done in this area. While it is easy to assume that the same virtualization techniques used to virtualize applications in the data center can be applied to virtual network functions, there are some key challenges to overcome:
- Service Providers can theoretically serve an infinite number of customers (limited only by the SP's ability to connect them), with unpredictable demand for network services at high throughput.
- Service Providers need to ensure consistent end-to-end performance across all virtual network functions.
One of the key benefits of virtualization is maximizing the utilization of individual servers. However, with higher utilization comes unpredictable performance, and a common bottleneck is a hosted hypervisor running on top of a host operating system like Windows. An alternative is to run the hypervisor directly on the server hardware, without a host operating system (OS) – commonly referred to as a 'bare metal' hypervisor. In this configuration you would typically assign a single virtual machine to a single CPU core, so an 8-core CPU can support 8 virtual machines in a 'bare metal' configuration. The obvious downside is that, depending on traffic patterns, you might not utilize all the capacity of a server, negating the benefits of virtualization.
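The one-VM-per-core placement described above can be sketched in a few lines. This is a hypothetical illustration, not a real orchestration API; the function and VM names are mine:

```python
def assign_vms_to_cores(vm_names, core_count):
    """Pin at most one VM to each physical core; extra VMs stay unplaced.

    Models the 'bare metal' configuration described above, where each
    virtual machine gets a dedicated core for predictable performance.
    """
    placement = {core: vm for core, vm in zip(range(core_count), vm_names)}
    unplaced = vm_names[core_count:]
    return placement, unplaced

# Ten VNFs offered to an 8-core CPU: only 8 can be placed, 2 are left over.
placement, unplaced = assign_vms_to_cores(
    ["vnf-%d" % i for i in range(10)], core_count=8
)
```

The point of the model is the hard ceiling: capacity is bounded by core count, not by how busy each VNF actually is.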
So there are trade-offs: more predictable performance versus under-utilized server capacity. This may not be an issue with a small number of servers, but in an SP's network, with potentially thousands of servers, idle capacity can be costly from both a capex and an opex perspective. Depending on the size and architecture of the SP infrastructure, moving from purpose-built networking hardware like the Cisco ASR 1000 and ASR 9000 to a virtualized environment might not be cost effective.
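A back-of-envelope calculation makes the idle-capacity cost concrete. All figures below are hypothetical assumptions chosen for illustration, not measurements from any SP network:

```python
def idle_core_fraction(servers, cores_per_server, vms):
    """Fraction of cores left idle when each VM is pinned to its own core."""
    total_cores = servers * cores_per_server
    used_cores = min(vms, total_cores)  # one VM per core, capped at capacity
    return (total_cores - used_cores) / total_cores

# Hypothetical fleet: 1,000 servers with 8 cores each (8,000 cores total),
# but only 5,000 pinned VNFs to run at current demand.
frac = idle_core_fraction(servers=1000, cores_per_server=8, vms=5000)
# → 0.375: over a third of the fleet's cores sit idle, yet the servers
#   still incur capex and opex.
```

At small scale this slack is noise; across thousands of servers it is exactly the stranded capacity that can make the NFV business case harder than the purpose-built one.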
While SPs are still experimenting with NFV to identify best practices and find a balance between performance and utilization, the reality is that the solution for more predictable performance might lie somewhere between virtualized network functions and purpose-built networking hardware. SPs support a broad range of customers, from small and medium businesses that may require a few megabits of throughput up to large enterprises that need gigabits. It is possible that NFV will be suitable for the smaller customers, while larger customers who want more services at higher throughput may require their own dedicated network devices.
In my next blog as part of the NFV Blog series, I will discuss some of the challenges and solutions to operating and managing virtualized network functions.