Why virtual-routing may become reality
One of the great things about the concept of cloud computing is the possibilities it creates for large, disruptive markets. We talk about compute capacity markets and the like all the time; that’s the commoditization of IT infrastructure. But, as Simon Wardley would be quick to point out, the commoditization of one technology almost always creates the opportunity to innovate in others. What are the new technologies that cloud infrastructure will enable?
An interesting discussion broke out in the “cloud-o-sphere” this week surrounding a related comment from our own Doug Gourlay at Cisco’s CScape analyst conference last week. In talking about the effects of cross-cloud workload mobility to Andreas Antonopoulos of Nemertes Research Group, Doug mentioned the concept of “Virtual-Routing” (a horrible term), the idea that one can move the compute loads to the best network location rather than rerouting the network to the workload.
Andreas’s post outlines the concept very well:
“Routing: Controlling the flow of network traffic to an optimal path between two nodes.”

“Virtual-Routing or Anti-Routing: VMotioning nodes (servers) to optimize the flow of traffic on the network.”

“Using netflow information, identify those nodes (virtual servers) that have the highest traffic “affinity” from a volume perspective (or some other desired metric, like desired latency, etc.) and move (VMotion, XenMotion) the nodes around to re-balance the network. For example, bring the virtual servers exchanging the most traffic to hosts on the same switch or even to the same host to minimize traffic crossing multiple switches. Create a whole-data-center mapping of traffic flows, solve for least switch hops per flow and re-map all the servers in the data center to optimize network traffic.”

“There’s a startup there somewhere. Route the node, not the packet.”
I think there is a lot to this concept, but I would couch it not in terms of routing per se, but in terms of data center optimization in general. If latency is reduced by moving a workload from one server to another (and the gain isn’t short-lived), then by all means do it. If you can optimize over time, all the better.
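The re-mapping Andreas describes can be sketched in a few lines. To be clear, this is a hypothetical illustration, not anyone’s shipping product: the flow records, VM names, and per-host capacity are all made up, and a real implementation would aggregate actual NetFlow exports and respect real CPU and memory limits.

```python
from collections import defaultdict

# Hypothetical flow records: (src_vm, dst_vm, bytes). In practice these
# would be aggregated from NetFlow/sFlow exports over some window.
flows = [
    ("web1", "app1", 9_000_000),
    ("web2", "app2", 7_500_000),
    ("app1", "db1", 4_000_000),
    ("web1", "db1", 500_000),
]

def traffic_affinity(flows):
    """Sum bytes per unordered VM pair to get an 'affinity' score."""
    affinity = defaultdict(int)
    for src, dst, nbytes in flows:
        affinity[frozenset((src, dst))] += nbytes
    return affinity

def greedy_colocate(flows, host_capacity=2):
    """Greedily place the chattiest VM pairs on the same host.

    host_capacity is VMs per host -- a crude stand-in for real
    CPU/RAM limits. Returns a {vm: host_id} placement map.
    """
    affinity = traffic_affinity(flows)
    placement, host_load, next_host = {}, defaultdict(int), 0
    # Walk pairs from heaviest traffic to lightest.
    for pair, _ in sorted(affinity.items(), key=lambda kv: -kv[1]):
        for vm in pair:
            if vm in placement:
                continue
            # Join an already-placed partner's host if it has room.
            partner = next((p for p in pair if p != vm and p in placement), None)
            if partner is not None and host_load[placement[partner]] < host_capacity:
                host = placement[partner]
            else:
                host, next_host = next_host, next_host + 1
            placement[vm] = host
            host_load[host] += 1
    return placement
```

With the sample flows above, the two chattiest pairs (web1/app1 and web2/app2) each land on a shared host, while db1 spills to a new host because app1’s host is full; a production scheduler would of course solve a much richer constraint problem.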
Now, the concept of Virtual-Routing is not without its detractors. Two members of the F5 DevCentral crew commented on the concept, for example. Lori MacVittie (a MUST follow for Infrastructure 2.0 fans) made the case that “just because you can, doesn’t mean you should”, arguing:
“Optimizing network traffic is a good thing, but not when that optimization might move nodes (virtual servers) to servers which will negatively affect the performance of the application while improving network-oriented performance. Sure, moving a virtual image of an application from one node to another may decrease hops per flow and associated latency, but the assumption is that all compute resources (hardware) are created equal and that the resource is capable of providing the same level of application performance as its previous hardware. It also assumes that the performance of the application being moved (because that’s what it’s all about) will not be adversely affected by a move to a compute resource which is already limited in its capacity by the fact that it’s serving up other applications.”

“Hoff gets it right when he mentions context, as there are many more variables to application performance than just network optimizations such as number of hops and bandwidth. There’s also this little matter of server affinity that isn’t well addressed by moving virtual servers around while they’re serving up applications, because server affinity (persistence) needs to be handled by an application-aware infrastructure, of which routers and switches generally are not.”
OK, well, first: the latter point is about to change drastically. It has to. When we talk about Infrastructure 2.0, more and more contexts have to work together. Or the higher-level contexts must be decoupled from the lower-level ones. Either way, this work is happening, and I think it is fair to spend some time imagining a world in which addressing, policies such as compute and security requirements, and distributed dependency management actually enable workload portability for any reasonable requirement, or even set of requirements.
The first point is absolutely true, but it shows that Lori assumed the only policy applied in this scenario would be network optimization. That would be ridiculous, and let me go on record as saying that workload migration without considering the needs of the workload is not advised…by anyone. However, as part of an overall scheme in which the needs of the application payload are balanced against the needs of the data infrastructure and the needs of the network, it makes sense to add the consideration that moving the workload may, in some cases, make more sense than rerouting network traffic. For instance, if you move the presentation layer of a traditional web application, it may make sense to move the business logic tier with it, primarily to address network latency. Rerouting traffic to the business tier is an option, but perhaps not optimal in this case.
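To make the “balanced policy” point concrete, here is a minimal sketch of the kind of gate a placement engine might apply before moving a workload. The thresholds, field names, and host/VM dictionaries are assumptions for illustration, not any real scheduler’s API:

```python
def should_move(vm, target_host, current_latency_ms, projected_latency_ms):
    """Decide whether migrating `vm` to `target_host` is worthwhile.

    All thresholds here are illustrative placeholders; a real policy
    engine would weigh many more signals (storage locality, security
    zones, session affinity, licensing, ...).
    """
    latency_gain = current_latency_ms - projected_latency_ms
    cpu_headroom = 1.0 - target_host["cpu_util"]
    mem_headroom_gb = target_host["mem_free_gb"] - vm["mem_gb"]

    if latency_gain <= 0:
        return False           # no network benefit -- reroute instead
    if cpu_headroom < 0.25:
        return False           # would starve the host's existing tenants
    if mem_headroom_gb < 0:
        return False           # the VM simply doesn't fit
    return True
```

The point of the sketch is Lori’s own: the network-latency gain is a necessary condition for the move, never a sufficient one.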
The other post came from Alan Murphy, and focuses specifically on the VM management consequences:
“This doesn’t resonate with me on many levels. First off, the basic principle of moving a server to match the traffic rather than managing traffic to the server. Traffic is smaller and easier to manage than servers, even with virtual servers. Managing constant VM movement with something like vCenter, clusters, resource pools, etc., is a surmountable task. Second, the process of moving the VMs themselves will have a cost wrt management, downtime, bandwidth (in the event Storage vMotion is used to follow the traffic). As an example, a quote from the post: ‘VMotioning nodes (servers) to optimize the flow of traffic on the network.’”

“Hmmm. How will moving a server optimize the traffic flow to that server? Again back to our gas station example: moving the gas station won’t change the traffic patterns at all. And are we talking LAN or WAN? If we’re talking LAN, then there should be traffic management devices, like BIG-IP LTM, in place already to manage optimized access to the apps running on the VMs. If you’re talking WAN, moving VMs cross-WAN on the fly has its own set of issues with traffic and storage.”
Sure, today. However, the assumption that “past performance indicates future results” is frequently flawed, and it is in this case as well. I concede that making virtual-routing a common practice will require rethinking some software and network architecture principles, but I think it is far from an option to take off the table at this point. I also completely understand why Alan is warning against the practice now, but I’m betting that policy-based workload routing on all kinds of metrics becomes mainstream within several years’ time.
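Alan’s cost objection can also be made concrete with a back-of-the-envelope break-even estimate: how long must the workload stay put before the cumulative latency saved repays the one-time cost of copying its memory image across the wire? The numbers below are purely illustrative, and real migrations have costs (dirty-page re-copy, management overhead, storage movement) this arithmetic ignores:

```python
def migration_breakeven_s(vm_mem_gb, link_gbps, latency_saving_ms, flow_rate_rps):
    """Seconds the VM must remain at its new location before the move pays off.

    Illustrative arithmetic only: it models migration cost as a single
    bulk memory copy and benefit as per-request latency saved.
    """
    transfer_s = vm_mem_gb * 8 / link_gbps                   # time to ship the image
    saved_per_s = latency_saving_ms / 1000 * flow_rate_rps   # latency-seconds saved each second
    return transfer_s / saved_per_s

# Example: an 8 GB VM over a 10 Gbps link takes ~6.4 s to copy; saving
# 2 ms per request at 1,000 requests/s recovers that in ~3.2 s.
breakeven = migration_breakeven_s(8, 10, 2.0, 1000)
```

Under those (generous) assumptions the move pays for itself almost immediately, which is exactly why I suspect the economics tilt toward virtual-routing as interconnects get faster.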
Of course, virtual-routing and other workload mobility scenarios mean that certain infrastructure services, like load balancers, SSL processors, etc., must be portable with the workload. So it’s not going to happen now, or tomorrow, or maybe even a year from now. But I would counter that the technology is desirable in many situations if it can be made to work, so we should put some time into researching the possibilities.