Preparing to Automate a New Data Center for DevNet Sandbox
You don’t need to look hard to see that DevNet is growing, and as DevNet grows, DevNet Sandbox must grow. DevNet Sandbox gives engineers and developers access to Cisco, partner, and open source infrastructure and software, delivered as a service from “the cloud.” For a variety of reasons, the Software as a Service (SaaS) that is Sandbox is delivered primarily from a private cloud that is fully designed, built, and operated by the Sandbox engineering team. And of course, this means data centers – multiple globally distributed data centers.
Building a brand new data center for Sandbox
As part of our service planning for the future, it was clear that we were going to need to expand our data center capacity, and do so quickly. And that brings us to the current project that I am working on with our team … building a brand new data center for Sandbox. One that will offer increased capacity, reliability, and performance for all of our users.
If you’ve been following me at all, you know that I joined the Sandbox team within DevNet to see how network automation and NetDevOps concepts, technologies, and approaches can be adopted into a production-grade environment. At Cisco Live 2019 in San Diego, I delivered a session, “Sandbox as Code – DevNet Sandbox goes all-in with NetDevOps,” where I ran through the history of Sandbox and how the network and infrastructure evolved to its current state.
The session also described the goals we’d laid out for the future of our infrastructure – its physical and logical architecture, as well as the concepts and tooling we plan to use as part of our automation-driven method for operation. This journey and evolution have been ongoing, and we’ve already seen plenty of improvements and learned many things, but the opportunity to build a new data center infrastructure from the ground up offers us an unexpected chance to “automate all the things” from the very beginning.
Like any project in IT, this data center build has a tight timeline, and many external factors that must be considered. This means that we don’t have unlimited time, funds, or engineering resources to leverage. Our leadership is very supportive of using this opportunity to rapidly jump ahead in our automation journey. I am truly grateful for their support and commitment to innovation, but we still need to be very diligent in choosing what and how we move forward. Sandbox leadership’s phrasing was something like:
“open to innovation and risk up to the point of panic – but not past it”
As we start this journey (and all along it), I want to share how we are tackling the project. And first up…
Proving out a new network design and automation toolchain in “Test”
Our data center network isn’t much more complex than those of many other enterprises I’ve worked with. We have an internal “production” environment, a management network, a DMZ, and our connection to the Internet. The one bit that makes us unique is that we have hundreds of isolated “pod” networks where individual sandbox labs are placed to provide customers secure access to resources. As a logical network design, it works well for us, and there are no significant changes planned. But the physical network underneath is something that we’ve long seen as needing updating (see my Cisco Live presentation for details). Along with changes to the hardware, software, and features in play, the level of network automation and adoption of NetDevOps concepts is going to be significant.
Some of the changes we are looking at include:
- Standardizing on the Nexus 9000 hardware platform and NX-OS 9.x
- Deploying a fully redundant access, distribution, and border topology
- Moving to Virtual ASA Firewalls per pod (from the current context-based model)
- Source of Truth (aka “intent”) driven network configuration and operational tests
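To make the last item a little more concrete, here is a minimal Python sketch of what an intent-driven operational test could look like: the intended state comes from the source of truth, the observed state comes from the device, and the test reports any drift. This is only an illustration under stated assumptions, not our actual tooling. The VLAN names and values and the `find_drift` helper are hypothetical, and in practice the two dictionaries would be populated from tools like Netbox and pyATS/Genie rather than written by hand.

```python
# Hypothetical intended state, as it might be exported from a source of truth.
INTENDED_VLANS = {
    "mgmt": {"vlan_id": 10, "state": "active"},
    "dmz": {"vlan_id": 20, "state": "active"},
}

# Hypothetical observed state, as it might be parsed from a device.
OBSERVED_VLANS = {
    "mgmt": {"vlan_id": 10, "state": "active"},
    "dmz": {"vlan_id": 20, "state": "suspended"},
}


def find_drift(intended, observed):
    """Return readable differences between intended and observed state."""
    problems = []
    for name, want in intended.items():
        have = observed.get(name)
        if have is None:
            # The object exists in the source of truth but not on the device.
            problems.append(f"{name}: missing from device")
            continue
        for key, value in want.items():
            if have.get(key) != value:
                problems.append(
                    f"{name}: {key} is {have.get(key)!r}, expected {value!r}"
                )
    return problems


if __name__ == "__main__":
    for problem in find_drift(INTENDED_VLANS, OBSERVED_VLANS):
        print(problem)
```

The point of the design is that the test itself contains no network-specific values. Change the intent, and both the configuration and the test expectations follow.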
Our new data center’s home will be a cage at a colocation partner of Cisco’s, and the build of the cage, racks, power, etc. will take several weeks to complete before we can begin racking equipment and powering up the network and infrastructure. We cannot wait for the environment to be ready before we begin working on the network configuration templates, automation strategy, “network ready for use” test development, or any of the other elements of NetDevOps design and engineering that will be required; there simply won’t be time. By the time we power up the network, we need to have all the elements nearly 100% worked out, because the project plan has us rolling immediately into standing up systems and applications within a matter of weeks.
What we need is a test environment that simulates the planned production environment as closely as possible – a “sandbox” you might even say…
NetDevOps Inception: DevNet Sandbox provides a sandbox for DevNet Sandbox
It should come as a surprise to no one familiar with me that I immediately thought about Cisco VIRL as a way to build out our test environment. I’ve long been a proponent of using VIRL as part of network automation work, and I quickly reserved an instance of the Multi-IOS Cisco Test Network Sandbox available in our catalog. Within minutes I was able to begin work on a network simulation that matched what our production topology would be in the new data center.
VIRL is a great tool, and offers a lot of options for network simulations, but it is not a physical network and there will always be adjustments you need to make. The goal is to build a test environment that will allow engineers and developers to verify the key parts of the design and automation elements to a degree where the confidence going into production deployment is high. Our simulated network will offer us that with ease. Some of the key areas where the simulation does differ from production are:
- Our production environment leverages an x86 virtualization platform that would be impractical to replicate in VIRL. The virtual switching layer is therefore replaced with L2 IOSv switches. The configuration and management of the production virtual switches is already a well-known element and will be easily addressed in production.
- Rather than build hundreds of sandbox “pods” in the simulation, a single pod is included. The configuration of that pod will be cloned out to all pods in production.
- Actual interfaces used for connections will differ between production and simulation (e.g., Ethernet 1/1 in VIRL may be Ethernet 1/49 in production). This means that all network configuration of interfaces will be determined by role and neighbor device, rather than by interface ID.
- Production has a separate physical management network that is not represented in the simulation. VIRL provides its own management network. This will result in some slight management routing differences that will need to be addressed between the environments.
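The idea of configuring interfaces by role and neighbor, rather than by interface ID, can be sketched in a few lines of Python. This is a minimal illustration and not our actual templates: the device names, the link records, and the `resolve_interface` and `render_uplink` helpers are all hypothetical, and the link data would normally come from the source of truth for each environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One cabled connection, as a source of truth might record it."""
    device: str
    interface: str
    neighbor: str
    role: str


# Hypothetical inventories: same roles and neighbors, different ports.
SIMULATION_LINKS = [
    Link("dist-sw1", "Ethernet1/1", "border-sw1", "uplink"),
    Link("dist-sw1", "Ethernet1/2", "access-sw1", "downlink"),
]
PRODUCTION_LINKS = [
    Link("dist-sw1", "Ethernet1/49", "border-sw1", "uplink"),
    Link("dist-sw1", "Ethernet1/2", "access-sw1", "downlink"),
]


def resolve_interface(links, device, neighbor, role):
    """Find the interface on `device` that faces `neighbor` in the given role."""
    matches = [
        link.interface
        for link in links
        if (link.device, link.neighbor, link.role) == (device, neighbor, role)
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one {role} link from {device} to {neighbor}, "
            f"found {len(matches)}"
        )
    return matches[0]


def render_uplink(links, device, neighbor):
    """Render an interface stanza without hard-coding the interface ID."""
    interface = resolve_interface(links, device, neighbor, "uplink")
    return "\n".join([
        f"interface {interface}",
        f"  description uplink to {neighbor}",
        "  no shutdown",
    ])


if __name__ == "__main__":
    print(render_uplink(SIMULATION_LINKS, "dist-sw1", "border-sw1"))
    print(render_uplink(PRODUCTION_LINKS, "dist-sw1", "border-sw1"))
```

The same `render_uplink` call works in both environments. Only the link inventory changes, which is what lets a template proven in the simulation carry over to the physical network.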
While VIRL provides support for a wide variety of components within topologies, not everything can be included in VIRL today. A key part of our network that isn’t available (yet) is our physical server platform, Cisco UCS. The fabric interconnects are a big part of the network, and UCS Manager manages the VLANs and vNIC configuration within the server environment. Leaving these out of our network configuration and automation testing would be a major oversight. Luckily DevNet Sandbox offers several UCS Sandboxes, including a UCS Management lab that includes 2 UCS emulators and several other tools. I merged an instance of this lab into the running Multi-IOS Sandbox to add the UCS Manager Emulators into our test environment. This is not something currently available in our public labs, but it is definitely an enhancement we will look at for the future.
Show me the Tools! NetDevOps Tools that Is
Having the key elements of our physical infrastructure available in a simulated test environment was a big step, but, as I mentioned, our plan is to put our NetDevOps and automation plans and tools to the test in this new data center. The key tools in our toolchain for network configuration and operations are:
- GitLab – Source Control, Container Registry, and Build Server
- Netbox – Data Center Infrastructure and IP Address Management (DCIM/IPAM) Source of Truth
- Cisco NSO – Network Device and Service Configuration Management
- pyATS/Genie – Network Operational State Validation
Bringing these tools into the test environment was really no trouble at all. Connecting to a reserved sandbox is done through VPN access from your workstation. And since my workstation already has access to our GitLab and Netbox servers, they are available to me when I’m working in the test network. I did need to populate them with data and information about the test network, but I’ll talk more about that in another blog post. As for Cisco NSO and pyATS/Genie, these tools are designed to run easily on any Linux or macOS system, and I already run them right on my laptop. If I needed to (or wanted to) run them within the environment, that would be simple as well. The Multi-IOS Sandbox includes Cisco NSO and pyATS in the standard image, and several sample labs and demos that use this sandbox make use of these tools.
Now comes the fun part!
Now that we have our test environment ready to go, the fun part begins: actually putting together the network configuration templates, automation scripts, services, validation tests, and more that need to be ready to deploy onto the physical network when we rack, cable, and power it up. That date will come very soon, so I’d better get to work on those parts. I’ll definitely be taking time to share how the progress is going, and maybe even give a demo or two of how things are working. Is there something specific you’re interested in seeing work? Or a question on our design and choices? Be sure to let me know in the comments here or over on Twitter @hfpreston or LinkedIn @hpreston.
In the meantime, be sure to go reserve a Sandbox of your own for staging a test network for yourself!