In collaboration with Ragupathi Rajavel, Information Security Engineer 

Only occasionally do you get to say, “I’ve worked on a solution that everyone interacts with.” For the Network Access team at Cisco, that solution is the Identity Services Engine (ISE). ISE has been everywhere for some time now, enabling authorized access for our users across Wired, Wireless, and Virtual Private Network (VPN) topologies.

Our deployment is made up of 20+ nodes across seven data centers and currently remains on-premises, as depicted below.

[Figure: global deployment of 20+ nodes across seven data centers]

The deployment is also supported by load balancers, with Policy Service Nodes (PSNs) representing most of our compute footprint.

In the first half of 2022, we began evaluating an upgrade to ISE 3.1. During staging, we quickly recognized the richer API available for automation and configuration management. We also planned for ISE 3.1 to underpin our expanding Zero Trust efforts. These efforts include trusted device, passwordless authentication, Rapid Threat Containment (RTC), and bringing our EAP-TLS efforts closer together with our Meraki Mobile Device Manager (MDM). ISE 3.1 also lets us address MAC address randomization where it affects policy conditions. Despite the large scope, staging showed that ISE 3.1's out-of-the-box functionality was ready to go.

In our deployment operations, we have shifted toward zero touch for everything: node maintenance, node replacement, and so on. With the ISE 3.1 release, we learned that we could apply the same “lift and shift” methodology we use operationally to upgrades as well.

Upgrade process

For the upgrade, we learned that we had to re-specify our Virtual Machines (VMs) in advance: ISE 3.1 required more storage for our PSNs, which meant they needed to be imaged directly. This was not a major concern, since we could automate it with Terraform and other tools. We rolled out PSNs iteratively across our data centers. At one point, someone asked me how the upgrade was going, and I had no idea. That was a hallmark of how well it was going.

The upgrade process was seamless. Following the high-level process described here, we reinstated PSNs and their settings through automation. We prevented data loss by taking backups and restoring them to the new deployment. We cut traffic over in phases: hub by hub, and service by service (Wired, Wireless, VPN).

In a matter of days, our team performed a “lift and shift” of all our nodes to ISE 3.1. This ISE upgrade resembled a public cloud migration: on a per-PSN basis, we automated new compute for the target nodes and brought up ISE 3.1, and once those nodes were online and operating, we freed up the previously used resources. We rolled everything out service by service. First came Wired/Wireless, which we monitored with the relevant Service-Level Indicators (SLIs). We then added Home/Remote access, with similar monitoring and situational awareness, and finally specific Partner or Business Function access.
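The per-service cutover can be sketched as a loop that moves one hub at a time and halts the moment an SLI falls below target. This is a minimal illustration, not our actual tooling: the `Phase` type, the `cut_over` and `measure` callbacks, the hub names, and the 99.5% authentication-success threshold are all hypothetical assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical SLO: minimum authentication success ratio per service and hub.
# The 99.5% figure is an illustrative assumption, not a value from our deployment.
SLI_THRESHOLD = 0.995


@dataclass
class Phase:
    service: str      # e.g. "Wired/Wireless", "Home/Remote", "Partner"
    hubs: list[str]   # hypothetical hub identifiers, cut over in this order


def success_ratio(passed: int, total: int) -> float:
    """Fraction of authentications that succeeded; treat no traffic as healthy."""
    return passed / total if total else 1.0


def run_rollout(
    phases: list[Phase],
    cut_over: Callable[[str, str], None],
    measure: Callable[[str, str], tuple[int, int]],
) -> tuple[list[tuple[str, str]], Optional[tuple[str, str, float]]]:
    """Cut over service by service, hub by hub; halt at the first SLI breach.

    Returns the (service, hub) pairs completed and, if the rollout halted,
    the (service, hub, ratio) that breached the threshold.
    """
    completed: list[tuple[str, str]] = []
    for phase in phases:
        for hub in phase.hubs:
            cut_over(phase.service, hub)  # hypothetical: move this hub's traffic to new PSNs
            passed, total = measure(phase.service, hub)
            ratio = success_ratio(passed, total)
            if ratio < SLI_THRESHOLD:
                return completed, (phase.service, hub, ratio)
            completed.append((phase.service, hub))
    return completed, None
```

Passing the cutover and measurement steps in as callbacks keeps the ordering and gating logic separate from whatever load-balancer or monitoring system actually performs them; listing the phases as Wired/Wireless, then Home/Remote, then Partner reproduces the order described above.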

As a bonus, we applied rolling patch upgrades throughout, and today patching is a seamless process. We started with ISE 3.1 P1, then rolled out P3 to fix remaining issues and, above all, to stay current. As our confidence grew, we widened the scope to global by also moving our TACACS+ (T+) clusters and our Guest-serving clusters (not mentioned earlier) to ISE 3.1.

The biggest lesson we learned was the value of upgrading. Often. This is now a process we can repeat and keep refining, which makes us more agile and our business more resilient. The upgrade itself, along with our ability to maintain service throughout, showed that the risk of failure has diminished.

Business continuity

While our on-premises facilities primarily reduce control-plane latency for our end users, the ISE data centers also back each other up.

From a network element perspective, PSNs are defined redundantly everywhere to avoid catastrophic failure. During the “lift and shift,” each replacement PSN presented the same IP address to our load balancers as the node it replaced, so service remained uninterrupted. Throughout the upgrade, not a single network element needed to roll over to a secondary data center.
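The per-PSN swap behind the load balancers can be modeled as a rolling replacement in which each new node inherits its predecessor's IP address, so the pool members the load balancer sees stay the same. This is a simplified simulation under assumptions of our own: a single pool, a hypothetical `Member` record, and a hypothetical `min_in_service` floor. It does not call any real load-balancer or ISE API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Member:
    """One PSN behind a load-balancer pool (hypothetical model)."""
    ip: str
    version: str
    in_service: bool = True


def rolling_replace(pool: list[Member], new_version: str, min_in_service: int) -> list[int]:
    """Replace pool members one at a time, reusing each member's IP address.

    Returns how many members stayed in service while each node was swapped.
    Raises RuntimeError (and restores the drained node) if a swap would drop
    the pool below min_in_service.
    """
    in_service_during_swap: list[int] = []
    for i, old in enumerate(pool):
        old.in_service = False                 # drain the old node first
        active = sum(m.in_service for m in pool)
        if active < min_in_service:
            old.in_service = True              # put it back; do not proceed
            raise RuntimeError(f"draining {old.ip} would leave only {active} PSNs in service")
        # Hypothetical: new compute is built (e.g. by Terraform) and given the
        # same IP, so the load balancer sees the same pool member come back.
        pool[i] = Member(ip=old.ip, version=new_version, in_service=True)
        in_service_during_swap.append(active)
    return in_service_during_swap
```

Reusing the IP is what keeps the swap invisible to network devices: they keep pointing at the same load-balancer pool, while the floor check makes the "one PSN at a time" assumption explicit rather than implied.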

In the end, we upgraded to ISE 3.1 without a single data-center failover, let alone end users noticing. Once our parallel infrastructure was in place and had taken over, we removed the previous compute to free up resources for other critical IT projects.

Going forward

What changes will the next three to five years bring? As with most things in “Future State Network Architecture,” no one can know for certain, so agility is key.

Running ISE 3.1, and staying current on it, will let us pivot more easily toward that future state. We are now unlocking more ISE 3.1 features, exploring the new API feature set, and looking forward to what comes next for ISE from an integration perspective.

Inside Cisco IT