Cisco Blogs

To L2 or Not to L2: That Is the Question for VMotion

February 17, 2010 - 2 Comments

This is a favorite question of mine (and a very contemporary topic), albeit usually asked in a different way: why do you need Layer 2 for VMotion? Those with VCP (VMware Certified Professional) in their email signature will be quick to show off their shiny new badge and tell you it is because of VMotion itself, a port group label, or some other VMware-specific component. In fact, there is a more fundamental reason, and it comes down to the ARP cache, the IP address, the default gateway, and DNS.

Remember how hosts find each other. With IP, a host addresses traffic to the destination's IP address. But we are speaking about L2 or, to keep this explanation simple, Ethernet, and Ethernet delivers frames to MAC addresses. So the host uses the Address Resolution Protocol (ARP) to bind a destination IP address to a destination MAC address (because Ethernet is Ethernet and NOT IP; pet peeve alert when someone refers to Ethernet as an IP network, but I digress). First, the source host compares the destination IP with its own IP address and mask. If the destination is on the local subnet, the host ARPs for the destination directly. If it is not, the host ARPs for its default gateway and sends the frame to the gateway's MAC address. With proxy ARP, the router instead answers the host's ARP for the remote destination with its own MAC address. To avoid broadcasting an ARP request for every packet, hosts cache these bindings for a period of time, on the assumption that little or nothing will change in the meantime.
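The next-hop decision and cache described above can be sketched in Python. This is a minimal toy model under stated assumptions, not any operating system's actual ARP implementation. The addresses, the 300-second TTL, and the `resolve` callback (a stand-in for an ARP request/reply on the wire) are all hypothetical.

```python
import ipaddress
import time


class HostArp:
    """Toy model of a host's next-hop choice and ARP cache (hypothetical values)."""

    def __init__(self, ip_with_mask, gateway, resolve, ttl=300.0):
        self.iface = ipaddress.ip_interface(ip_with_mask)  # own IP address and mask
        self.gateway = ipaddress.ip_address(gateway)       # default gateway IP
        self.resolve = resolve  # stand-in for broadcasting an ARP request
        self.ttl = ttl          # seconds a cached binding stays valid
        self.cache = {}         # next-hop IP -> (MAC, expiry time)

    def next_hop(self, dst):
        dst = ipaddress.ip_address(dst)
        # Same subnet: ARP for the destination itself; otherwise ARP for the gateway.
        return dst if dst in self.iface.network else self.gateway

    def mac_for(self, dst, now=None):
        now = time.monotonic() if now is None else now
        hop = self.next_hop(dst)
        entry = self.cache.get(hop)
        if entry and entry[1] > now:
            return entry[0]      # cache hit: no broadcast needed
        mac = self.resolve(hop)  # miss or expired: "send" an ARP request
        self.cache[hop] = (mac, now + self.ttl)
        return mac


# Hypothetical ARP responders on the wire.
arp_table = {"10.1.1.20": "00:50:56:aa:00:20", "10.1.1.1": "00:00:0c:07:ac:01"}
host = HostArp("10.1.1.10/24", "10.1.1.1", lambda ip: arp_table[str(ip)])
print(host.mac_for("10.1.1.20", now=0))  # on-link: the destination's own MAC
print(host.mac_for("192.0.2.5", now=0))  # off-link: the gateway's MAC
```

The cached gateway entry is what matters for VMotion. A live-migrated VM keeps its IP address, mask, and ARP cache, so it keeps sending off-subnet traffic to the same gateway MAC until that entry expires.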

VMotion is VMware's technology for live-migrating running virtual machines, typically servers, between physical hosts. Servers are accessed in many ways depending on the type of server, but for now let's just call the source the initiator and the destination the target. When evaluating these technologies, there are several traffic goals to accomplish, and architects need to view traffic in both directions, egress and ingress:

  1. Maintain connectivity – assuming live migration (hence the MAC / IP address issue)
  2. Optimize exit from the local subnet towards the client
  3. Avoid trombone effect on traffic
  4. Optimize ingress from client towards target

Number 1 is obvious, but the others may not be. Optimizing exit and avoiding trombone effects refer to how the target responds to the client via its default gateway. Since this is a MAC function, and the gateway binding lives in each device's ARP cache, a device that moves to another data center over Layer 2 keeps sending traffic to the same gateway MAC address. If that gateway is still in the original data center, traffic trombones. It crosses the data center interconnect to reach the gateway in the original site before being routed toward the client. Enter Cisco's Overlay Transport Virtualization (OTV). OTV provides MAC routing and ARP traffic controls. It also allows the first-hop redundancy protocol (FHRP) to operate with an active/active default gateway function, so the moved target egresses via its local default gateway. There are many other benefits to this technology; see Cisco's OTV documentation for more information.
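The egress trombone, and the way a localized active/active gateway avoids it, can be shown with a toy model. Several assumptions apply. The gateways in both data centers share one FHRP virtual IP and virtual MAC, as HSRP and VRRP groups do. The site names and MAC value are hypothetical. The sketch does not model how OTV actually filters FHRP traffic between sites.

```python
from dataclasses import dataclass

VIRTUAL_MAC = "00:00:0c:07:ac:01"  # hypothetical FHRP virtual MAC, identical in both sites


@dataclass
class Gateway:
    site: str
    active: bool  # True if this router currently forwards for VIRTUAL_MAC


def egress_sites(vm_site, gateways):
    """Sites a frame addressed to VIRTUAL_MAC crosses before being routed (toy model)."""
    # After the move, the VM's ARP cache still maps the gateway IP to VIRTUAL_MAC,
    # so it never re-ARPs; what matters is which site answers for that MAC.
    for gw in gateways:
        if gw.active and gw.site == vm_site:
            return [vm_site]  # local exit, no interconnect crossing
    for gw in gateways:
        if gw.active:
            return [vm_site, gw.site]  # bridged across the interconnect: the trombone
    raise LookupError("no active gateway for " + VIRTUAL_MAC)


# Stretched L2 with a single active gateway in DC1: a VM moved to DC2 trombones.
single_active = [Gateway("DC1", True), Gateway("DC2", False)]
print(egress_sites("DC2", single_active))  # ['DC2', 'DC1']

# Active/active gateways (FHRP localized per site): the moved VM exits locally.
active_active = [Gateway("DC1", True), Gateway("DC2", True)]
print(egress_sites("DC2", active_active))  # ['DC2']
```

Because both sites answer for the same virtual MAC, the VM's stale ARP entry is harmless. Localization works without touching the VM at all.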

Number 4 is about how the moved target is reached from outside the subnet. This is a function of routing. Routers forward based on the longest matching prefix in the routing table. Summarization, at classful boundaries or at other prefix lengths, is generally used to reduce the size of that table. When the target moves, traffic toward it can take inefficient paths. The network still sees only the summary pointing at the original location, unless a longer (more specific) prefix for the subnet or host appears in the routing table. That more specific prefix can be provided with Route Health Injection (RHI) or other mechanisms that advertise a /32 host route for the target.
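The effect of a host route on longest-prefix matching can be sketched in a few lines of Python. This assumes a toy routing table that maps prefixes to a site name. The addresses and site labels are hypothetical, and real routers perform this lookup in optimized data structures, not a linear scan.

```python
import ipaddress


def lookup(routes, dst):
    """Return the next hop of the longest matching prefix (toy routing table)."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes.items() if dst in net]
    if not matches:
        return None
    # Longest prefix wins: choose the match with the largest prefix length.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


routes = {
    ipaddress.ip_network("10.1.0.0/16"): "DC1",  # summary advertised by DC1 (hypothetical)
    ipaddress.ip_network("0.0.0.0/0"): "internet-edge",
}
print(lookup(routes, "10.1.1.50"))  # DC1: only the summary matches

# After the VM moves, a /32 host route is injected (e.g. via RHI) from DC2.
routes[ipaddress.ip_network("10.1.1.50/32")] = "DC2"
print(lookup(routes, "10.1.1.50"))  # DC2: the /32 beats the /16
print(lookup(routes, "10.1.1.51"))  # DC1: other hosts still follow the summary
```

The trade-off is the same one summarization was meant to avoid. Every host route injected for a moved VM adds an entry to the routing table.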

LISP (Locator/ID Separation Protocol) is another protocol that can work in this environment, and it adds a few more functions. LISP benefits the global routing space as well as the local enterprise. Its basic function is separating the routing locator (RLOC) from the endpoint identifier (EID). The key here is that a device can keep its IP address and move anywhere in the infrastructure without having to readdress or worry about L2 or L3 boundaries. This is done by routing to the locator rather than to the endpoint identifier, and then advertising a more specific prefix for the EID to achieve optimal traffic patterns.
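A simplified sketch of the locator/identifier split follows. It assumes an in-memory mapping database and dictionaries standing in for packets. It is not the LISP wire protocol, which uses Map-Register and Map-Request messages between tunnel routers and a map server/resolver. All addresses are hypothetical documentation values.

```python
import ipaddress


class MappingSystem:
    """Toy EID-to-RLOC mapping database (hypothetical; not real LISP messaging)."""

    def __init__(self):
        self.mappings = {}  # EID prefix -> RLOC (tunnel router address)

    def register(self, eid_prefix, rloc):
        self.mappings[ipaddress.ip_network(eid_prefix)] = rloc

    def resolve(self, eid):
        eid = ipaddress.ip_address(eid)
        hits = [net for net in self.mappings if eid in net]
        if not hits:
            return None
        # The most specific EID prefix wins, just as in the routing case.
        return self.mappings[max(hits, key=lambda net: net.prefixlen)]


def encapsulate(mapping, src_rloc, inner_dst):
    """Toy packet: the inner EID never changes; only the outer RLOC header does."""
    return {"outer_src": src_rloc, "outer_dst": mapping.resolve(inner_dst), "inner_dst": inner_dst}


ms = MappingSystem()
ms.register("10.1.0.0/16", "192.0.2.1")  # DC1's tunnel router owns the EID prefix
print(encapsulate(ms, "198.51.100.7", "10.1.1.50"))  # outer_dst is DC1's RLOC

ms.register("10.1.1.50/32", "203.0.113.1")  # VM moved: DC2 registers a more specific EID
print(encapsulate(ms, "198.51.100.7", "10.1.1.50"))  # outer_dst is now DC2's RLOC
```

The VM's address (the inner destination) is identical before and after the move. Only the locator in the outer header changes, which is why no readdressing is needed.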

All that being said, you may or may not have to worry about L2 for VMotion depending upon your implementation.

If you share my vintage, you know the name Radia Perlman and her famous poem about the spanning tree algorithm, "Algorhyme". It seems appropriate to provide an update, so here goes:

I am fortunate to be around long enough to see
A multi-path more beneficial than a tree
A path whose crucial property
Is all active bandwidth via ECMP
A path which must be sure to enable
Loop free paths and small MAC tables
First the architect needs to see
That MAC routing can happen at L2 and L3
When VMs dance from cloud to cloud
Traffic storms and table overload cannot be allowed
If vendor innovations give you a shrill
Let me introduce you to the working groups addressing LISP and TRILL



  1. Or, instead of going down and extending L2, you go up and decouple the network completely. Decoupling things is the nature of virtualization, and we need to extend that fully to the network. You can have an agile full proxy that responds to the newly live-migrated machine's reverse ARP (which it sends when it arrives at its new home) and lets it think it is fine on whatever wire it lands on. Only the trombone effect, as you described it, would require any bridging, and only for the duration of the connections that were established before the vMotion completed. A smart proxy will encapsulate just the frames associated with established connections and send them to a peer proxy. That allows continuous traffic while maintaining the decoupling we need, so we don't have to extend L2 for anything else. This is not only an isolation technique but also a protection, since only established connection flows work this way. Any ARP poisoning, whether intentional or just a proxy-ARP problem, would still be completely isolated from the WAN. Proxies are VERY forgiving things. If your proxies are really cool, you installed them so they can also do variable-rate compression and de-duplication to assist ESX with the transfer of the VMDK and the host state (all encrypted). That lowers the bandwidth required for vMotion to what most of the world calls a WAN, rather than OC-12 speeds. Most of the VMDK can be de-duplicated, making the whole process much faster. All new connections to the migrated VM can be sent to the right data center or pod through integrated GSLB. These are the same techniques the largest companies in the world use to host multi-data-center solutions today. Any straggling new connections sent to the first data center's proxy by mistake don't need to be bridged. They simply need to be directed to the VM in its new home through a non-bridged, optimized tunnel between the peer proxies, all driven by business logic defined in the BC/DR plan.
Setting the VMs free from the network also means successfully decoupling the client experience from the server. This means we can perform optimizations (say, special TCP tuning for the AT&T iPads out there), content manipulation, and security services too. We have these wonderful proxies working at speeds up to 72 Gbps in production. They won VMware's Innovator of the Year award for 2011. There is a movie on YouTube showing a vMotion on a live host with a continuous download running, all over T3 bandwidth. It uses VMware Orchestrator workflows, something all VMware customers already own, to drive the whole process. It is deployed and working in real life. Better living through proxies!

  2. Thanks for a great overview of the current issues facing us in the world of ever-expanding L2 networks, and for a glimpse of some future solutions. My organisation is pretty switched on to the dangers of MAC explosion, but it strikes me that there is another potential issue: an explosion in the number of ARP entries an L3 border device needs to keep to avoid continually churning its cache and filling the network with ARPs. Are there any solutions to this issue? Alex