This blog is co-authored by Ioannis Theodoridis and Katerina Dardoufa.
The text posted in this article is not in any way representative or binding for their organization. Some details are purposefully omitted for confidentiality reasons.
Ioannis is a network and systems engineer specializing in fault, performance, and service level monitoring for the Network and IT services. He works in the Network Department, in the IT Division of the Bank of Greece, as a member of the NOC team. He holds the CCNP Enterprise and DevNet Associate and Specialist Certifications (DEVASC, EN-AUTO). He is a member of the DevNet 500 and the DevNet Class of 2020.
Katerina is a network engineer, working as part of the NOC team in the IT Division of the Bank of Greece. Besides pure networking aspects (design/implementation), she has very strong knowledge of fault and performance monitoring, and is extremely passionate about troubleshooting weird security and networking problems. She holds the CCNP, CCDP and CCNA Security certifications and is an active DevNet member.
Our last blog post (“Using Automation for a DataCenter Network Core Migration”) was one year ago. This year, while we were attending the Cisco Live Europe conference in Barcelona, we were asked to write a follow-up post describing what we have been doing since. So much has happened since last year.
“Make time to save time”
You need to invest time in automation to eventually save time with it. One of our most time-consuming tasks was maintaining our open source based network performance monitoring infrastructure. We had to replace our monitoring toolset entirely with a set of automation-capable tools that would be ready for the next step in our monitoring, whenever that became possible: Grafana, InfluxDB, Telegraf. To minimize maintenance tasks, the whole platform would run on Docker.
There were a lot of first steps there, but the result was rewarding and automation friendly, while at the same time offering the advantages of Infrastructure as Code. In addition, the data itself was now available for processing and manipulation. If you want to read about how it was done, you can read these two blog posts: part 1 and part 2. Also, here is how we used that flexibility during lockdown to create a VPN users graph report with Python code.
“If you build it, they will come.”
At that time, we also discovered Netbox. My idea, and the centerpiece of one of our bigger tasks, was that a monitoring infrastructure has certain points of entry for information, where a new device is picked up automatically. For Cisco network equipment and our own network, those come down to two specific points:
- Cisco Prime Infrastructure
- Our NMS system
We decided we would write code to automatically and continuously compare our active network inventory (NMS/Cisco PI) with our intended one (Netbox). This would allow us to:
- Automatically onboard devices to Netbox if they are not there,
- Identify differences in the current network status compared to the original network design and get alerts about it.
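The comparison described above can be sketched in a few lines of Python. This is a minimal illustration only, not our production code: the `Device` fields, the hostnames, and the sample data are hypothetical, and in practice the two dictionaries would be filled from the NMS/Cisco PI and Netbox APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Minimal device record; the fields are illustrative assumptions."""
    hostname: str
    ip: str
    model: str


def compare_inventories(active, intended):
    """Compare the active inventory against the intended one.

    Both arguments map hostname -> Device. Returns a tuple of
    (hostnames to onboard, {hostname: {field: (intended, active)}}).
    """
    # Devices seen on the network but missing from the intended inventory.
    to_onboard = sorted(set(active) - set(intended))

    # Devices present on both sides whose attributes disagree.
    drift = {}
    for name in sorted(set(active) & set(intended)):
        changes = {
            field: (getattr(intended[name], field), getattr(active[name], field))
            for field in ("ip", "model")
            if getattr(intended[name], field) != getattr(active[name], field)
        }
        if changes:
            drift[name] = changes
    return to_onboard, drift


# Hypothetical sample data standing in for NMS/Cisco PI and Netbox exports.
active = {
    "br-ath-01": Device("br-ath-01", "10.1.1.1", "ISR4331"),
    "br-skg-01": Device("br-skg-01", "10.2.1.1", "ISR4321"),
}
intended = {
    "br-ath-01": Device("br-ath-01", "10.1.1.1", "ISR4351"),
}

to_onboard, drift = compare_inventories(active, intended)
print("Onboard to Netbox:", to_onboard)
print("Differences to alert on:", drift)
```

Keeping the comparison as a pure function, separate from the API calls that collect the data, makes it easy to run continuously and to test without touching the network.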
Back to the beginning
Changes and Triggers
The announcement of the DevNet Certifications, made in June 2019 during Cisco Live US, caused the network world to shift into a new phase. I started studying the areas this change defined as “Enterprise Automation”. That included Model Driven Programmability and Telemetry, NETCONF/RESTCONF/YANG, YAML/JSON/XML, REST APIs, DevOps principles and tools, etc. Cisco DevNet offered learning labs/tracks and a significant Sandbox Infrastructure, which made the trip fascinating.
Cisco Live Europe Barcelona 2020 was full throttle on DevNet. So many sessions in parallel and so many workshops. I really enjoyed meeting the DevNet and PyATS teams. Talking with Katerina about her thoughts and impressions, I was happy to learn she was pleased with how aware we were of current developments in network automation, and also with our skill set and the efforts we are making to integrate it into administration and monitoring.
From Individual Growth to Working as a Team
The DevNet experience up to that point had been about the journey of the individual. It’s a powerful message, pushing you toward a big culture change. It is meant to set you on a path where you rediscover your abilities and your relationship to basic development skills, which you can reapply to learn new and creative ways to leverage your network, your network applications and your software.
Motivation and energy
Katerina picked up speed, drawing energy from a clear view of her own goals and abilities and from the vision we shared. Responding to all my challenges for new ideas without hesitation, and defining her own goals, she developed the following:
- A Python script that gets a list of devices with IP, hostname, type, series, model, etc. from Prime Infrastructure using the REST API, so that we could use it to populate Netbox with those devices. You can find it on GitHub under DeviceInfo_from_PI, and it has been accepted and published in the Cisco DevNet Code Exchange.
- A different Python script that gets a full list of IP addresses from every device managed in Cisco Prime Infrastructure, so we could later fill them in for each device already integrated in Netbox. It’s also available on GitHub under CollectIP and published in the Cisco DevNet Code Exchange.
- A Nornir script, written after studying the documentation, our own efforts and the excellent videos created by IPvZero, that connects in parallel to all our branch routers, searches the routing table for a specific route, and checks whether the provider router is on its main uplink or the backup one. The result arrives by email within seconds. The script is available on GitHub under Nornir_Check_Route.
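To give a flavor of the first script in the list, here is a hedged sketch of how a device listing could be flattened into simple records ready for Netbox. It is not the code from DeviceInfo_from_PI. The payload shape (`queryResponse`, `entity`, `devicesDTO`) and the field names are assumptions about what a Prime Infrastructure style response looks like, and the sample payload is made up so that the example runs without a live server.

```python
def flatten_pi_devices(payload):
    """Turn a Prime Infrastructure style device listing into flat records.

    The nesting queryResponse -> entity -> devicesDTO and the field names
    are assumptions made for this illustration.
    """
    entities = payload.get("queryResponse", {}).get("entity", [])
    records = []
    for entity in entities:
        dto = entity.get("devicesDTO", {})
        records.append({
            "hostname": dto.get("deviceName", ""),
            "ip": dto.get("ipAddress", ""),
            "type": dto.get("deviceType", ""),
        })
    return records


# Hypothetical payload; a real script would fetch it over the REST API.
sample_payload = {
    "queryResponse": {
        "entity": [
            {"devicesDTO": {"deviceName": "br-ath-01",
                            "ipAddress": "10.1.1.1",
                            "deviceType": "Cisco 4331 ISR"}},
            {"devicesDTO": {"deviceName": "br-skg-01",
                            "ipAddress": "10.2.1.1"}},
        ]
    }
}

devices = flatten_pi_devices(sample_payload)
for device in devices:
    print(device)
```

Using `.get()` with defaults means a device with a missing field still produces a record instead of stopping the whole run, which matters when the inventory is large and uneven.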
I asked Katerina, “what would you say if someone asked: What drives you personally to automate, make scripts, attack the next target, and carry on?” She responded:
“I have always considered myself an old-school network engineer… I am very fond of routers/switches/protocols and the like. Thing is, that I was seeing network programmability showing up everywhere. I was starting to feel left behind. Luckily my friend and colleague kept bombarding me with information on the subject, so to be frank he pushed me, mentored me and got me involved. Since then I have come to appreciate automation and scripting. It needs some effort to get things started, but the speed and the accuracy of the end result is worth it.
What I really enjoy about automation and what motivates me (besides the constant “nagging and pushing” of my workmate) is figuring out how to use it to address use cases in our own work environment. The thrill of doing something new, the thrill of troubleshooting and trying to figure out what went wrong, until you achieve your goal! These are things that I, as a network engineer, really enjoy!!!! On the other side, I have been using enterprise network tools for quite a while and I was a bit reluctant to give them up for other frameworks and automation. What I realized is that via using automation and scripts I wasn’t giving anything up. Instead, I was able to make even better use of these tools and in the long run this has come to save me time to deal with the other network stuff I like!”
As a team, we are more than our sum
One would say “Great! He/She is on board! We are a team!” Well, no. While that is great news, it doesn’t make you a development team.
As network engineers, we can’t split the CLI. One person takes over at a time and issues commands. We can discuss configuration and design all we want and make decisions together, but in the end only one person at a time will configure a given device. Also, there are limited ways to get to the desired result, and when you do, that is a state.
Development is different. The same project, even the same script, can be cut into tiny pieces that can be assigned to each team member. They can develop code in parallel, as long as they agree on certain principles and specifications. That doesn’t come easily to network engineers, as we are more accustomed to doing things in our own corner, so we tend to develop automation projects in the same way.
Last February, I decided to take the DevNet Associate exam. The exam requires you to study content meant to give you a good taste of the tools and methodologies you need as a developer, to be a functional and successful member of a development team.
Here’s a short story to illustrate this point:
I wanted to turn the whole network status into a status variable with PyATS and watch over it with an alerting tool, like Nagios for example.
We developed a Nagios plugin in Python that gets a “golden” configuration for a network device and stores it. When run again, it checks the current running configuration against the golden one. If there are differences, it raises a critical alert! If not, all is OK! We run Genie commands in pure Python instead of the CLI, and pass a Python script as an argument to the PyATS Docker container. The PyATS team helped us with some directions.
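The check itself can be illustrated without any network access. The sketch below is not the plugin we published: it swaps Genie for Python's standard `difflib` module and compares two plain-text configurations. The function name and the sample configurations are hypothetical. The return codes follow the usual Nagios plugin convention, where 0 means OK and 2 means CRITICAL.

```python
import difflib

# Nagios plugin return codes used in this sketch.
OK, CRITICAL = 0, 2


def check_against_golden(golden, running):
    """Compare a running configuration to the stored golden one.

    Returns (nagios_code, message). Lines prefixed "- " exist only in the
    golden configuration, lines prefixed "+ " only in the running one.
    """
    changed = [
        line
        for line in difflib.ndiff(golden.splitlines(), running.splitlines())
        if line.startswith(("- ", "+ "))
    ]
    if changed:
        message = "CRITICAL - {} line(s) differ: {}".format(
            len(changed), " | ".join(changed))
        return CRITICAL, message
    return OK, "OK - running configuration matches golden"


# Hypothetical configurations standing in for the stored and current state.
golden = "hostname r1\ninterface Gi0/0\n description uplink\n"
running = "hostname r1\ninterface Gi0/0\n description backup\n"

code, message = check_against_golden(golden, running)
print(message)
# A real plugin would now end with sys.exit(code) so Nagios sees the state.
```

Returning the code and message from a function, rather than exiting inside it, keeps the comparison easy to test and leaves the process exit to the thin wrapper that Nagios calls.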
As I was preparing to write the code myself, my partner beat me to it: “hey I wrote the code for PyATS operations, here it is, it works!”
My first reaction was shock. I was glad of course, but also thought “hey, I was going to do that!” And then it hit me. This is what the team is about. It’s not about “Hero culture.” We were co-developing the same code, bringing it all together. It was raw and uncut of course. But that is the real challenge. As a team, we are more than our sum. With the right amount of time and resources, we can practically build anything (or at least anything we need) when we work together. The real challenge we need to rise to as a development team is how to take advantage of our full potential.
The code and documentation for the Nagios plugin are published on GitHub here and on the Cisco DevNet Code Exchange here. Right before that, based on the original code Katerina created to get the list of devices from Prime Infrastructure, I created a Python script that can log in to Prime, get the list of devices and create PyATS testbeds per device location, so you can use them for your PyATS test scenarios. If it’s managed by Prime, you can get a testbed with it. The code is published on GitHub here and in the Cisco DevNet Code Exchange here.
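The idea of one testbed per location can be sketched as follows. This is an illustration, not the published script. The input records, the `location` field, and the default `os` value are assumptions. The output dictionaries mirror the general layout of a PyATS testbed file (devices, each with an `os`, a `type` and a `cli` connection), credentials are deliberately left out, and each dictionary could be written to a file with a YAML library.

```python
from collections import defaultdict


def build_testbeds(devices, os_name="iosxe"):
    """Group devices by location and build one testbed dict per location.

    `devices` is a list of dicts with hostname, ip, type and location keys
    (hypothetical field names). Credentials are intentionally omitted.
    """
    testbeds = defaultdict(lambda: {"devices": {}})
    for dev in devices:
        testbeds[dev["location"]]["devices"][dev["hostname"]] = {
            "os": os_name,
            "type": dev["type"],
            "connections": {"cli": {"protocol": "ssh", "ip": dev["ip"]}},
        }
    return dict(testbeds)


# Hypothetical device list, as it might come back from Prime Infrastructure.
inventory = [
    {"hostname": "br-ath-01", "ip": "10.1.1.1", "type": "router", "location": "Athens"},
    {"hostname": "sw-ath-01", "ip": "10.1.1.2", "type": "switch", "location": "Athens"},
    {"hostname": "br-skg-01", "ip": "10.2.1.1", "type": "router", "location": "Thessaloniki"},
]

testbeds = build_testbeds(inventory)
for location, testbed in testbeds.items():
    print(location, sorted(testbed["devices"]))
```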
Epilogue – Future plans
This is an anniversary post, one year after our first one on the Cisco developer blog. Last year it was about individual motive and inspiration. This year it’s about team potential and growth. If you are new to this, you are not alone: the beacons are lit, the community is growing. Read fellow champion Daniel Dib’s post about “Getting Over My Fear of Network Automation” to help yourself find your own motive and inspiration.
If you have travelled this path already and have found the light, it is time to go back, bring some more people with you, and build a team. The experience will be rewarding for all of you.
In the coming period we plan to fully leverage Netbox’s potential through Python scripts, so we can keep track of device changes in the network. We also want to offer custom Python scripts, launched from the Netbox UI, to our collaborating teams (e.g. Helpdesk), adding functions not already available in our set of traditional monitoring tools. In addition, we plan to use Cisco Prime Infrastructure as a dynamic inventory of active network devices for both PyATS operations and Nornir tasks, and to leverage our NMS API to get programmatic access to our network alerts engine.
Finally, we intend to give Streaming Telemetry a spin as soon as time allows, as the Netconf plugin for Telegraf is almost out! To help our team grow, we plan to adopt tools and methods for working together efficiently. Maybe deploy GitLab internally, as a start, and create a common developer environment based on Docker containers, to get out of our WSL boxes and closer to production. We’ll see you all next year 😉