Some Real World Network Automation Lessons Learned
Eight months ago I joined the DevNet Sandbox Engineering and Operations team. For the first time in about eight years, I was back in a role where I had day-to-day responsibilities for a production environment. A big reason for my interest in joining this team was to see what it was like to apply the network automation and cultural transformation ideas that are part of NetDevOps in “the real world.” I’d long explored the concepts in theory, discussions, demos and lab environments. Now it was time to put that theory into practice. You can read more about the motivations and ideas for the change in the blog post I made shortly after taking the new role. This time around I’d like to share some of what I’ve learned so far.
Finding Time to Innovate Is Hard / Inertia is Easy
A common phrase I’ve heard in many automation classes and workshops is something along the lines of “but I could have done that faster through CLI.” This often comes after we finish going through one of the exercises I commonly call the “hello world” of network automation: exercises like adding a new loopback or VLAN to a switch. My answer is always the same: “Yes, you could have, but that isn’t the point.” It’s when you scale up the concepts across larger networks and more complex network services that you’ll see time savings. This idea of “I could have done it faster the old way” exists during a large transformation as well.
There is no doubt that any project that comes your way could be solved with “traditional” methodologies and without automation. Odds are you’ve solved similar projects in the past. The first few times you tackle a project using automation and NetDevOps tools and methodologies it will take longer, sometimes significantly longer. You, your team, and your leadership need to know and be prepared for this fact, building in extra time for the transformation. If you don’t build in the time, I guarantee there will come a point in the project where you will be tempted (or maybe pressured) to just “do it the old way”. But if the time is invested in these early projects, there will come a point when similar projects are completed in a fraction (sometimes a very small fraction) of the time.
I’ve known the above for a long time, and certainly before I made the change to the Sandbox engineering team. What I learned (or at least REALLY learned) once I was on the team was that not every project and task should be immediately given the transformation/automation treatment. If you try, you will overwhelm your resources very quickly, progress will grind to a halt, and the transformation will be at risk of failure and abandonment. Care must be given to projects as they come in to decide whether they are good candidates for automation. At the start, the percentage of projects given “the treatment” should be small. Once you have some successes (and failures) under your belt, a larger and larger percentage of projects can be transformed at one time.
Today’s Requirements Come First
When you run a system that has live users, integrates with other systems, and is part of a product, there will always be demands on your time that may not fit into your transformational project. Despite this fact, and as frustrating as it may be, they must be dealt with immediately and before anything else.
Two examples that hit us in Sandbox include:
* High priority external requests *
One of the key parts of a transformation is managing the time and resources of the team so that time can be invested in the transformation. This often means delaying or saying “no” to new requests on the team. But sometimes a request comes in that has a high priority for the wider company and stakeholders and will take precedence over your team’s transformation. An example of this type of request on our team is new product launches at Cisco.
When a product like Cisco DNA Center is released or updated, ensuring the developer resources are ready at launch time is critical for it to be successful. This means that teams across DevNet are crafting new sites, learning labs, code examples, and Sandboxes, and that work is highly visible all the way up to our senior leadership.
* Fixing bugs and outages *
Not all bugs require immediate attention, but ones that affect the usability of the Sandbox platform do. When something like that is found all other work is put on hold. It can be frustrating to have to stop working on some fun new innovative project to go back to an older project or system, but when other people rely on your work there’s not much else you can do. And this is where I’ll point out that thorough and automated testing can help you discover and fix bugs before other people stumble across them.
Similar to bugs are outages. And yes, even in “the cloud” outages happen… The types of outages we have to deal with in Sandbox include utility-level outages like network carrier and power, infrastructure outages from hardware or software failures, and application outages ranging from databases to web servers. And while not every outage we experience is noticed by our users, it still takes time for our engineers to monitor and resolve them.
One little trick I try to do when a high priority requirement comes up is to look for ways where automation and programmability can be used to deliver that project better and faster.
Fundamentals are Fundamental
I don’t think that anyone would argue that things like DNS, DHCP, AAA, or NTP are not important in a network, but I also know that we don’t always have the most rock-solid and pervasive deployments of these supporting services across our environments. What about accurate documentation about how the network is intended to be configured? Many networks are maintained through a breadcrumb trail of partially accurate network diagrams, questionably accurate spreadsheets, and “suggested” network configuration templates. Oh and standards… be they hardware, software, cabling, configuration, etc. When push comes to shove, over time these standards are often given less respect than the “guideline” status of the Pirate Code.
Our networks are often highly manually curated environments, where the network engineer’s experience and history with the network guide projects and day-to-day activity. However, when transforming to an engineering and operations world where the network configuration and operational health are maintained and ensured through automation, it is an absolute requirement that all these fundamentals are fully in place and respected.
This means that time must be taken to set up this foundation and make using it part of the way your team works. If you don’t, you’ll forever exist in a world where “automation won’t work for us”, and it’ll be your own fault.
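To make “fully in place and respected” a little more concrete, here is a minimal sketch of the kind of check that becomes possible once the intended state is written down. It is only an illustration, not how any particular team does it: the `INTENDED` dictionary, the server addresses, the sample config, and the helper names are all hypothetical, and it assumes IOS-style `ntp server` and `ip name-server` lines in a saved running config.

```python
import ipaddress

# Hypothetical source of truth; in practice this might live in version control.
INTENDED = {
    "ntp_servers": {"10.0.0.10", "10.0.0.11"},
    "dns_servers": {"10.0.0.53"},
}

# Assumed IOS-style config prefixes for each supporting service.
PREFIXES = {
    "ntp_servers": "ntp server ",
    "dns_servers": "ip name-server ",
}


def _is_ip(token):
    """Return True if the token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def parse_servers(config_text, prefix):
    """Collect the IP addresses found on config lines starting with prefix."""
    servers = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Keep only address tokens, skipping keywords such as "prefer".
            servers.update(t for t in line[len(prefix):].split() if _is_ip(t))
    return servers


def find_drift(config_text, intended=INTENDED):
    """Report servers that are missing from, or unexpected in, the config."""
    drift = {}
    for key, prefix in PREFIXES.items():
        actual = parse_servers(config_text, prefix)
        missing = intended[key] - actual
        unexpected = actual - intended[key]
        if missing or unexpected:
            drift[key] = {
                "missing": sorted(missing),
                "unexpected": sorted(unexpected),
            }
    return drift


# Hypothetical running config with one missing and one unexpected NTP server.
SAMPLE_CONFIG = """
hostname sbx-sw1
ip name-server 10.0.0.53
ntp server 10.0.0.10 prefer
ntp server 192.0.2.99
"""

if __name__ == "__main__":
    print(find_drift(SAMPLE_CONFIG))
```

A check like this only has value if the source of truth is trusted, which is the point of the section: without agreed standards there is nothing to compare the network against.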
This is by no means a complete and comprehensive list of what I’ve learned so far, but something else I’ve learned is that finding time to write blog posts is tougher when you have a network to operate, so I’ll leave it at this list for now. I am going to endeavor to pull my head out of the code more often to take a breath and share what’s been going on, and what new things I’ve learned. In the meantime, I’d love to hear what you’ve learned in your own transformation projects. Let me know in the comments, or over on social at Twitter @hfpreston or LinkedIn.
And of course, be sure to check out the great resources here on DevNet that can help you!
- DevNet Sandbox
- Curated list of learning content: StartNow
- Unleash the capabilities of the new network with DevNet Certifications
- Webinar Series: NetDevOps Live!
- Sandbox as Code: DevNet Sandbox goes all-in with NetDevOps (Cisco Live On-Demand Video)