I am going to spend the next couple of posts digging through one of the more interesting new technologies we are working on: a standard called Locator/ID Separation Protocol (or LISP). Why should you care? Well, if you are looking at deploying clouds, supporting mobility of endpoints or VMs, or managing a routing architecture of any meaningful size or complexity, I think it will be worth your while to check out LISP.
LISP is a new approach to routing that is designed to address the changes in how we are using our networks. Let’s explore LISP through the lens of one of the biggest challenges facing network architects today: properly tackling mobility, whether it’s mobile endpoints like smartphones, tablets or squirrels, or the mobile workloads that are at the heart of server virtualization and cloud computing. While mobility is probably the “sexiest” use case right now, there are a number of other use cases, like routing architecture scalability and IPv6 migration, which, while less alluring to all but the biggest networking nerds, are no less important.
So, what is my locator and my ID and why do they need to be separated all of a sudden? Let’s use mobile phones as an example since the analogy seems to click for most folks I talk to. Looking at Cisco’s main number +1.408.526.4000, we see the number implicitly contains location info: the “1” means it’s in the US, the “408” means it’s in the San Jose area, and, before number portability, the “526” would tie the number to a specific central office (trivia: in the old days, it tied the number to a specific switch in the CO). When I needed to route a call to +1.916.555.1212, it was relatively trivial to figure out the call needed to go to a specific CO in northern California and send it on its way. This all worked when phones sat on tables or were bolted to walls, but then came mobile phones. Suddenly, although my mobile phone has a 916 area code, I could be anywhere on the planet, and the phone network needed to be able to quickly and efficiently route calls to me while I was on the go. While my phone number still uniquely identified me (or at least my phone), the intrinsic location information was no longer useful for actually finding me.
We have a similar situation with IP addressing. Most of you know that if you look at an IP address like 192.168.1.1, some portion (say “192.168” for simplicity’s sake) represents a specific network and the balance (“1.1”) represents the address of a specific host. Many of you also know that a lot of time and effort goes into efficiently allocating addresses and designing routing architectures. A branch office might have a specific set of subnets allocated to it; similarly, a rack in a data center might be in a specific subnet. Unfortunately, much like the advent of the mobile phone broke the model, the introduction of client mobility and workload mobility blew a hole in the traditional approach to routing.
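To make the network/host split concrete, here is a small Python sketch using the standard `ipaddress` module. The /16 prefix length is an assumption chosen to match the “192.168” plus “1.1” split described above, and `split_address` is a hypothetical helper name, not anything from LISP itself.

```python
import ipaddress


def split_address(cidr):
    """Return (network, host_part) for an address written as 'a.b.c.d/len'."""
    iface = ipaddress.ip_interface(cidr)
    network = iface.network
    # Keep only the bits that the prefix does not cover: the host portion.
    host_bits = int(iface.ip) & int(network.hostmask)
    return network, ipaddress.ip_address(host_bits)


# Assumption: a /16 mask, so "192.168" is the network and "1.1" is the host.
network, host_part = split_address("192.168.1.1/16")
print(network)    # 192.168.0.0/16
print(host_part)  # 0.0.1.1
```

The point to notice is that the network portion is what routers use to decide where to send traffic, so an address that moves to a different site carries the “wrong” location with it.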
In the data center, much of the focus has been around maintaining L2 adjacency to enable virtual workload mobility. Certainly, stretching L2 is a key part of any comprehensive solution and we have our own solutions in this space, namely VXLAN and OTV, but they have their limits and at some point L3 needs to enter into the equation, which is where a technology like LISP comes into play.
If you look at VXLAN as an example, while the technology allows you great freedom on where you can move your VMs, the L3 default gateway (i.e. the connection to the rest of the world) stays pinned to the original location. With VM migration within a data center, this is not really all that big a deal since the route to the outside world is probably the same for the entire data center. If you are looking at moving between data centers, pinning the gateway may make less sense. Say you are in LA and accessing a data center in San Francisco. Now, it being summer, there is a threat of rolling brown-outs, so your workload gets live-migrated to a data center in San Jose. Even though your workload is now in SJ, your path to the outside world will still flow through SF. Minimally, you are looking at less efficient traffic flow (LA (you) > SF (default gateway) > SJ (workload) > SF (default gateway) > LA (you)), but if SF actually goes down, you have totally lost connectivity, which could cause varying degrees of grief. Ideally, you would want your workload-on-the-go to dynamically map to the nearest or optimal default gateway. We have a couple of ways of adding L3 intelligence into this discussion--the recently announced Cloud Services Router 1000V is one solution; however, in this case LISP would probably do the trick.
So, how does this magic all work? At its heart, LISP creates an extended address space that (surprise!) separates the host ID from its location and allows a “map and encap” routing scheme. The “address” looks something like 2001:0102:0304:0506:0708:0900:0a0b:0c0d:2001:0a0b:0c0d:0e0f:1111:2222:3333:4444 (IPv6) or 10.10.1.1.192.168.1.13 (IPv4). The blue portion is called the Endpoint Identifier (EID) and represents the IP address of a node. The red portion is called the Routing Locator (RLOC), which is the IP address of the LISP router for that host. LISP addressing is actually much more flexible than these two examples and the actual header format is a bit more involved, but let’s not complicate things for the purpose of this post--I’ll dig into the specifics with the next post.
As noted earlier, LISP uses a “map-and-encap” scheme. The “map” part uses a pull mechanism very similar to DNS. With DNS, you query a DNS server for a given name (“hey, where is blogs.cisco.com”) and you get back an IP address (“you can find that at 22.214.171.124”). With LISP, you query a LISP mapping server that maintains RLOC/EID bindings for the location of a host or EID (“where can I find host w.x.y.z”) and get back the address of the router or RLOC serving that host (“you need to go chat to router a.b.c.d”). With this information, your local router can now encapsulate and send traffic to the remote router, which can then ultimately deliver it to the destination host. With this mechanism you can now untether both your mobile endpoints and your virtual workloads without stuff breaking. But that is just scratching the surface: I can take the same concept, but move entire subnets at one time. With my DR provider, I could bring up an entire data center at my DR site and simply change the mapping in the LISP mapping server to seamlessly re-route all traffic to the new location. Again, more on the use cases in the third post.
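If it helps to see the idea in code, here is a toy Python sketch of map-and-encap. It is only an illustration under stated assumptions: `MappingSystem`, `Packet`, `map_and_encap` and `decap` are hypothetical names, the addresses are made up, and real LISP involves things this leaves out entirely (the actual request/reply messages, caching of mappings on the routers, and the real encapsulation header).

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    payload: object


class MappingSystem:
    """Toy stand-in for a LISP mapping server: EID prefix -> RLOC."""

    def __init__(self):
        self._bindings = {}

    def register(self, eid_prefix, rloc):
        # Registering the same prefix again simply replaces the old RLOC.
        self._bindings[ipaddress.ip_network(eid_prefix)] = rloc

    def lookup(self, eid):
        """Pull-style query: 'where can I find this EID?' Longest prefix wins."""
        addr = ipaddress.ip_address(eid)
        matches = [prefix for prefix in self._bindings if addr in prefix]
        if not matches:
            return None
        best = max(matches, key=lambda prefix: prefix.prefixlen)
        return self._bindings[best]


def map_and_encap(packet, local_rloc, mapping):
    """Look up the destination EID, then wrap the packet router-to-router."""
    rloc = mapping.lookup(packet.dst)
    if rloc is None:
        raise LookupError(f"no mapping for EID {packet.dst}")
    return Packet(src=local_rloc, dst=rloc, payload=packet)


def decap(outer):
    """The remote router strips the outer header and delivers the original."""
    return outer.payload


# All addresses below are illustrative assumptions.
mapping = MappingSystem()
mapping.register("10.10.1.0/24", "192.0.2.1")         # subnet lives at site A
inner = Packet(src="10.20.0.5", dst="10.10.1.1", payload="hello")

outer = map_and_encap(inner, "198.51.100.7", mapping)
print(outer.dst)                                       # 192.0.2.1

mapping.register("10.10.1.1/32", "203.0.113.9")        # one host moves to site B
print(map_and_encap(inner, "198.51.100.7", mapping).dst)  # 203.0.113.9
```

Note that the inner packet never changes: the host keeps its EID, and only the binding in the mapping system is updated. Re-registering the whole `10.10.1.0/24` prefix with a new RLOC would model the DR scenario, where an entire subnet is repointed at once.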
So, this is the 60-second overview--I’ll dig deeper into the underlying mechanics of LISP in the next post, but for now, let me wrap with a couple of benefits of the LISP approach:
- It’s a network-based solution, so there is no tweaking of hosts needed
- There are minimal config changes on the routers; in fact, it is fairly prosaic in its design and implementation (i.e. it uses a pull mechanism similar to DNS)
- LISP is address space agnostic
UPDATE: You can find Part 2 here: http://blogs.cisco.com/datacenter/why-youll-want-lisp-routing-part-2/