Building a useable Autonomic Networking Infrastructure from the Ground Up
Yep, that’s what we did, and yes we are shipping it today!
As Michael’s blog explained, autonomics are all around us, both in feature implementations (e.g. a routing protocol like OSPF) and in architectural frameworks like GANA. But while the former have created isolated, per-feature domains of autonomicity, the latter have never really resulted in a usable implementation that a network engineer could put to work!
Let’s go back to what we said the vision of Autonomic Networking was going to be, as in the figure below, which I essentially repeated from my DON’T PANIC blog. The observant reader will notice that I changed the term ‘simple management tools’ into ‘SDN/NMS Controller across a simplified northbound interface’. After all, we can’t ignore market trends like SDN.
The vision remains the same whether you use an iPad or a super-duper controller though: you feed a network-wide behavior into the network, as we can model the totality of the network in an abstract, location-independent, network-wide manner. Autonomic Processes turn this network-wide behavior into local state, and might invoke control loops between nodes to do this effectively. This ultimately results in the good-ole legacy network protocols becoming self-managing, without changing the protocols themselves. Genius! But how do we get there in practice? And can customers trust us to do the right thing from day 1?
As Michael’s blog already alluded to, and allow me to quote him: “What is the first, manageable, digestible, implementable, operable step; the bit of autonomic functionality that’s just right, which is important enough to get done, but not too complex to be scary”. We decided that the first thing we needed to do was to bootstrap a Domain Wide Identity, i.e. distribute device identities in a secure, completely zero-touch way. Why, you might ask? Well, once you have a secure identity on a box, and you are certain that all device identities are anchored to the same domain-wide trust anchor, you can do some really cool things, like:
- Automatically create security associations between two nodes, e.g. to set up encrypted connectivity
- Sign control packets and, more importantly, verify the signature at the receiving side as ‘belonging to the same domain’
- Create shared keys between nodes on behalf of network protocols like OSPF or BGP that need them (e.g. for protocol authentication)
- Create shared keys to invoke e.g. encryption
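To make the ‘shared keys on behalf of network protocols’ idea a bit more concrete, here is a minimal sketch. It assumes two nodes have already agreed on a pairwise secret through their security association, and it derives a separate key per protocol from that secret with HMAC-SHA256. The function name `derive_protocol_key`, the labels and the example secret are all hypothetical, and this is not how the shipping implementation necessarily does it.

```python
import hashlib
import hmac


def derive_protocol_key(pairwise_secret: bytes, protocol: str, length: int = 16) -> bytes:
    """Derive a per-protocol key from a secret that two nodes already share.

    Assumption: the pairwise secret comes out of the security association
    between the two nodes. The protocol name is used as a label so that
    OSPF and BGP never end up with the same key.
    """
    digest = hmac.new(pairwise_secret, protocol.encode("utf-8"), hashlib.sha256).digest()
    return digest[:length]


# Hypothetical secret, for illustration only.
secret = b"example-pairwise-secret"

# Both ends run the same derivation and arrive at the same key without
# anyone typing a key into the configuration.
ospf_key_node_a = derive_protocol_key(secret, "ospf")
ospf_key_node_b = derive_protocol_key(secret, "ospf")
bgp_key = derive_protocol_key(secret, "bgp")
```

The point of the label is separation: a key handed to one protocol tells you nothing useful about the key handed to another.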
Of course even that turned out to be quite a challenge: how can you transport these domain identities without relying on USB sticks or pre-staging? So we peeled the onion even further and decided that we needed to find a way of bootstrapping domain-wide end-to-end connectivity, which could help us in distributing the domain identities. This end-to-end connectivity is completely zero-touch, and is encrypted hop by hop. It cannot be un-configured. It cannot be killed by Joe CCIE’s configuration mistakes. It’s always there. How cool is that!
We call it the Autonomic Control Plane or ACP, and the next figure summarizes what it does.
As you can see the ACP is a ‘growing’ organism. A new node with absolutely NO configuration boots at the edge of the network (which we assume is already part of the Autonomic Control Plane) and sits there waiting.
The first thing that happens is Channel Discovery, whose goal is to ‘find’ the Channels across which the Autonomic Protocols will establish adjacencies and connections. In this phase we support Ethernet connectivity (is there anything else?), so the Channels are essentially VLANs. The device that has already joined the ACP (the blue cloud) sends out probes to figure out the topology (Is it an E-LINE? An E-TREE? An E-LAN? Is it Port-Based? Is it VLAN-Based? Is it just a P2P Cable?). Essentially it allows the new device to figure out which VLAN-id to use to do its autonomic stuff.
As Channel Discovery is not secured, the next thing that needs to happen is authentication, and that happens through Autonomic Adjacency Discovery, an IP-based protocol that leverages link-local addresses. Authentication happens by virtue of the new device offering its device identity (through its Unique Device Identifier / UDI or its IEEE 802.1AR Secure UDI / SUDI). The edge device (or the proxy device, as we refer to it) simply transports the UDI/SUDI to a device that performs registrations for the Autonomic Domain (and is therefore called the Registrar, clever huh?!). The Registrar checks its local whitelist, grants the new device access or quarantines it, requests a Domain Certificate to be created, and sends it to the new device through the proxy device.
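The registration flow above can be sketched in a few lines of Python. This is a toy model, not the real implementation: the class and function names, the UDI string format and the ‘certificate’ string are all hypothetical, and the call to the Certificate Authority is replaced by a stand-in. It also assumes that a Registrar without a whitelist admits every device, which the text does not spell out.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class EnrollmentResult:
    udi: str
    admitted: bool
    certificate: Optional[str]  # placeholder for a real Domain Certificate


@dataclass
class Registrar:
    domain: str
    # None means no whitelist is configured (assumed here: admit everyone).
    whitelist: Optional[Set[str]] = None
    quarantined: Set[str] = field(default_factory=set)

    def enroll(self, udi: str) -> EnrollmentResult:
        """Check the whitelist, then admit or quarantine the device."""
        if self.whitelist is not None and udi not in self.whitelist:
            self.quarantined.add(udi)
            return EnrollmentResult(udi, admitted=False, certificate=None)
        return EnrollmentResult(udi, admitted=True,
                                certificate=self._request_certificate(udi))

    def _request_certificate(self, udi: str) -> str:
        # Stand-in for the request to the Certificate Authority.
        return f"cert:{udi}@{self.domain}"


def proxy_relay(registrar: Registrar, udi: str) -> EnrollmentResult:
    """The proxy makes no decision; it only carries the UDI to the Registrar
    and the answer back to the new device."""
    return registrar.enroll(udi)


registrar = Registrar(domain="example.net", whitelist={"PID:ASR901 SN:ABC123"})
known = proxy_relay(registrar, "PID:ASR901 SN:ABC123")
unknown = proxy_relay(registrar, "PID:UNKNOWN SN:ZZZ999")
```

Here `known` is admitted and carries a certificate placeholder, while `unknown` lands in `registrar.quarantined` with no certificate.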
The new device can now join the domain and it does this by generating keys from the Certificate and performing mutual authentication with the edge/proxy device. These keys will also be used to sign any subsequent Autonomic interactions between devices, for increased security.
(Side note: the registrar needs about three lines of configuration, i.e. the domain name, the IP address of a Certificate Authority, and an optional whitelist configuration. It is also not vital for the operation of the ACP, as we only need it for new devices about to be bootstrapped. Multiple registrars can exist in the network for redundancy reasons.)
Once the new device has received the domain identity it also creates a local, routable IP address (based on hints it received from the network and from the Registrar). Based on information it gleans from its neighbour, it joins an IP overlay that follows the physical topology of the network, in other words the ACP.
In the previous drawing we have assumed that the ACP was already partially created between the proxy device and the Registrar(s). If a couple of nodes have booted already and have been cabled up this is essentially what happens as soon as you configure a single registrar on one of those nodes. Immediately all nodes L2 adjacent to the registrar will join the ACP, then the nodes L2 adjacent to those will join, essentially rippling ACP goodness through the network. We have ways of enabling Autonomic Adjacencies to be set up across a ‘dark’ L3 cloud, while Channel Discovery takes care of ‘dark’ L2 clouds. So in essence the ACP is self-forming and zero-touch.
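The ‘rippling’ behaviour is essentially a breadth-first walk outwards from the registrar. The sketch below models it on a made-up topology; the function name `acp_join_order`, the node names and the adjacency map are hypothetical, and it only tracks L2 adjacency, leaving out the ‘dark’ L2/L3 cloud cases.

```python
from collections import deque
from typing import Dict, List


def acp_join_order(adjacency: Dict[str, List[str]], registrar: str) -> Dict[str, int]:
    """Return, per node, the wave in which it joins the ACP.

    Wave 0 is the registrar itself, wave 1 its L2-adjacent neighbours,
    wave 2 their neighbours, and so on. Nodes that are not reachable
    from the registrar never join and are absent from the result.
    """
    wave = {registrar: 0}
    queue = deque([registrar])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in wave:
                wave[neighbour] = wave[node] + 1
                queue.append(neighbour)
    return wave


# Hypothetical topology: R is the registrar, X is not cabled to anything.
topology = {
    "R": ["A", "B"],
    "A": ["R", "C"],
    "B": ["R"],
    "C": ["A"],
    "X": [],
}
waves = acp_join_order(topology, "R")
```

In this example A and B join in the first wave, C in the second, and X stays out until somebody cables it up.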
Another very cool thing about the ACP is that it never goes down and cannot be unconfigured. It will survive a route to Null0! It will survive a AAA misconfiguration. It’s like a Virtual Out Of Band Channel, although it’s in-band. (Hence Virtual! Get It?)
Obviously the end-goal isn’t just the ability to bootstrap identity domain-wide, or the Virtual Out Of Band Channel! The ACP can be used to do lots of cool things such as:
- Being a Transport for Service Discovery, e.g. where is my NMS/Controller, AAA, TFTP or Syslog server?
- Being a Transport for TFTP Downloads
- Being a Transport for other applications running on top, which essentially brings us back to the vision: if we deliver a Messaging Infrastructure and a neat application interface on top, we can see that the ACP will become the foundation of the vision of self-managing networks!
To conclude: how real is this? Very real! This year we are shipping the Autonomic Networking Infrastructure (or ANI, which includes secure, zero-touch device bootstrap, ACP, automated TFTP Download and Syslog/AAA/TFTP Service Discovery) on our SP Access and Pre-Aggregation platforms (ASR901/903/ME-3600/ME-3800), with the aim of significantly lowering the OpEx in Day 0 and Day 1 operations for Carrier Ethernet and IP RAN deployments! No more truck-rolls! No pre-staging! Amazing security!
This is just one of the many use-cases for Autonomic Networking! Autonomic Networking is real, it rocks, and it’s here to stay! If you want to talk to us about it, go ahead! And next week at Cisco Live in Milan, we’ll have much more to tell you, stay tuned!