The ransomware threat has never been greater than it is today. Financial institutions process more digital transactions for more customers today than at any point in human history. The wealth that can be exploited through disruption in any large financial market is significant.
Ransomware and malware have been key areas of concern for regulators over the past 24 months, and recent updates to Federal Financial Institutions Examination Council (FFIEC) guidance and to PCI DSS 4.0 both now address ransomware specifically.
2024 is on track to be another record-breaking year on the exponential growth curve of security vulnerabilities. The number of public CVEs this year is estimated to be more than double what it was 7 years ago, which in turn was double what it was 7 years before that.
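To put that doubling pattern in annual terms, here is a minimal sketch of the arithmetic. It assumes exactly one doubling per 7-year period, which is a lower bound, since the estimate above is "more than double". The function name is illustrative only.

```python
def annual_growth_rate(factor: float, years: float) -> float:
    """Compound annual growth rate that produces `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


# Assumption: exactly 2x every 7 years (the text says "more than double").
rate = annual_growth_rate(2.0, 7.0)

# Two consecutive 7-year doublings compound to 4x over 14 years.
fourteen_year_multiple = (1.0 + rate) ** 14

print(f"implied annual growth: {rate:.1%}")               # implied annual growth: 10.4%
print(f"growth over 14 years: {fourteen_year_multiple:.1f}x")  # growth over 14 years: 4.0x
```

In other words, even at the conservative end, the volume of public CVEs that a patching program has to absorb grows by roughly a tenth every year.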
Against this rising level of risk, financial institutions are being held to a higher standard in addressing security vulnerabilities. On top of this, there is a greater need to upgrade and patch software to address public vulnerabilities. Financial institutions are stuck between an unstoppable force and an immovable object.
Thankfully, in the past few years the in-service software upgrade (ISSU) features in the NX-OS product family got a major uplift. While stateful switchover and ISSU on dual-supervisor systems have long been available, patching the single-supervisor top-of-rack switches in the Nexus product line depended on network design to make ISSU work. Specifically, a network tuned to converge quickly around failed nodes can register false positives during ISSU, because ISSU requires the control plane to restart. Thus fast convergence and ISSU used to be mutually exclusive for single-supervisor systems.
The newest features use advances in technology to create a containerized “redundant supervisor” where the failover of the control plane can happen in less than a second.
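The tension between fast convergence and a control plane restart can be illustrated with a deliberately simplified model: a neighbor declares a node down once it has gone a full detection window (hello interval times multiplier) without hearing from it. The sketch below is not NX-OS behavior or configuration. All timer and outage values are hypothetical and chosen only for illustration, and real protocols add mechanisms such as graceful restart that this model ignores.

```python
def detection_window_ms(hello_interval_ms: int, multiplier: int) -> int:
    """Time a neighbor waits without hellos before declaring the node down."""
    return hello_interval_ms * multiplier


def triggers_false_positive(outage_ms: int, hello_interval_ms: int, multiplier: int) -> bool:
    """True if a control plane outage outlasts the neighbor's detection window."""
    return outage_ms >= detection_window_ms(hello_interval_ms, multiplier)


# Hypothetical timer profiles: (hello interval in ms, multiplier).
AGGRESSIVE = (300, 3)       # 900 ms detection window
CONSERVATIVE = (10_000, 3)  # 30 s detection window

# Hypothetical control plane outages during an upgrade.
FULL_RESTART_MS = 60_000    # assumed long restart of a single control plane
SUBSECOND_FAILOVER_MS = 800 # assumed "less than a second" failover

for name, outage in [("full restart", FULL_RESTART_MS),
                     ("sub-second failover", SUBSECOND_FAILOVER_MS)]:
    for profile, timers in [("aggressive", AGGRESSIVE), ("conservative", CONSERVATIVE)]:
        flagged = triggers_false_positive(outage, *timers)
        print(f"{name:>20} with {profile:>12} timers -> false positive: {flagged}")
```

Under these assumed numbers, only the sub-second failover stays inside even the aggressive detection window, which is the property that lets fast convergence and ISSU coexist.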
Recently, I had the opportunity to scale test the latest features in a lab for a Fortune 50 customer that wanted to explore scale parameters previously unheard of: a VXLAN fabric with 1,300 VTEPs (1,100 active in the forwarding plane), 90K MAC and 90K IPv4 entries, more than 200 VRFs, more than 2,000 VLANs, and more than 128K IPv4 LPM routes, all active in the data plane of the device. The network used optimized routing timers and carried live overlay L3 traffic in a full mesh between 50 hosts across a multisite environment. The purpose of the lab was to explore extreme values to determine how devices operate and what features work at that level. Following our testing, I can confirm that enhanced ISSU (eISSU) works great at this scale with active traffic.
Because the intent of the lab was to explore scale and test features, we performed an ISSU on this platform in the scale environment. As advertised, the upgrade worked flawlessly every time (we did it multiple times), across major releases (10.4 -> 10.5). The only impact observed was to our SSH session, which does not fail over by design. What one person calls SSH failover, another calls session hijacking; they are the same thing, and thankfully, the session does not fail over.
There were zero drops in either the Spirent full-mesh flows or the ICMP packets. The upgrade took about 8 minutes in total (creating the second supervisor, synchronization, prep work, and sanity checks), with the failover itself happening very fast.
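For readers who want to run a similar ICMP check in their own lab, one simple approach is to look for gaps in the sequence numbers of the ping replies captured during the upgrade. The sketch below assumes Linux-style ping output containing `icmp_seq=N`; the function name and the sample text are made up for illustration and are not output from the lab described here.

```python
import re

# Matches the sequence number in Linux-style ping reply lines.
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def missing_sequences(ping_output: str) -> list[int]:
    """Return sequence numbers absent between the first and last reply seen.

    Limitation: losses before the first reply or after the last one
    cannot be detected from the replies alone.
    """
    seen = {int(m.group(1)) for m in SEQ_RE.finditer(ping_output)}
    if not seen:
        return []
    expected = set(range(min(seen), max(seen) + 1))
    return sorted(expected - seen)


# Made-up sample with one reply (sequence 3) missing.
sample = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.41 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.39 ms
64 bytes from 192.0.2.1: icmp_seq=4 ttl=64 time=0.40 ms
"""

print(missing_sequences(sample))  # [3]
```

An empty list for a capture that spans the whole upgrade window corresponds to the "zero drops" result; a traffic generator's per-flow loss counters give the equivalent check for the full-mesh flows.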
Under scale and load testing, the enhanced ISSU feature worked as designed, with sub-second control plane and management plane switchover, and no packet or control plane drops across a major software upgrade.
I am pleased to say that these new features are exactly what is needed to support financial institutions today.
To learn more about how this can be applied in your environment, please reach out to your account team.