In the space of a few weeks in March and April 2020, Cisco’s entire workforce began working from home. For most, that meant relying even more on remote collaboration technologies to get things done and stay in touch with co-workers, customers, and families. The role of our VPN (Virtual Private Network) changed significantly, from providing backup access during regional events (snowstorms, hurricanes, earthquakes, building fires, etc.) to providing primary access for tens of thousands of employees, contractors, and partners around the world.

We in Cisco IT meet regularly with customers to share how we’re addressing various business challenges. Lately we have received many inquiries about how we scaled our VPN infrastructure. Here are some answers.

Who has VPN access at Cisco?

All employees, contractors, and partners who use in-building Cisco IT services have VPN access. Our users install Cisco AnyConnect Secure Mobility Client on their laptops and mobile devices. Full-time teleworkers have a Cisco Virtual Office setup, which includes a hardware-based VPN service.

How has the role of VPN changed?

Before the outbreak, our VPN saw around 100,000 unique monthly users, with an average of 500,000 connections per week. Most VPNs are designed for the occasional days when users want to work from home or need to finish up a project at night. Cisco has a strong remote-work culture, and our VPN infrastructure has grown to support it. In major sites, if power goes down in one building, the services in other buildings can support all users. And if an entire site goes down, we have failover sites. For example, Richardson, Texas, is a backup site for San Jose, California, and RTP, North Carolina. But during a global event, when all employees use the VPN every day, sites that would ordinarily provide failover are themselves running at record capacity. We knew that to keep the business running smoothly during the pandemic and maintain redundancy, we needed to scale up our VPN infrastructure.

How do you secure VPN access?

In addition to using the Cisco AnyConnect client, we use Cisco Umbrella to provide DNS-layer security in the cloud, maintain policy and access control lists (ACLs) on our firewalls, require multi-factor authentication with Cisco Duo Security, and enforce access policies using Cisco Identity Services Engine. We push mobile app updates, which have the latest security fixes, with Cisco Meraki Systems Manager.

How do you know which sites need more VPN capacity?

For each site, we look at the maximum number of concurrent sessions daily. We also compare the number of distinct devices connecting over the previous month with the maximum number of IP addresses available in the region. We find this information using third-party and internal network monitoring tools. In this way we determined that our Richardson site had 30,000 IP addresses in reserve, while RTP and Sydney were each about 500 IP addresses short.
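
The address comparison described above can be sketched in a few lines of Python. This is only an illustration, not our monitoring tooling: the `SiteUsage` type, the function names, and the device counts and pool sizes are all hypothetical, chosen so that the resulting headroom matches the figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class SiteUsage:
    name: str
    distinct_devices_last_month: int  # devices seen connecting over the previous month
    ip_pool_size: int                 # addresses available to VPN clients in the region


def ip_headroom(site: SiteUsage) -> int:
    """Positive result: addresses in reserve. Negative result: the site is short."""
    return site.ip_pool_size - site.distinct_devices_last_month


def sites_short_of_addresses(sites):
    """Return (site name, shortfall) for every site with fewer addresses than devices."""
    return [(s.name, -ip_headroom(s)) for s in sites if ip_headroom(s) < 0]


# Hypothetical inputs: only the resulting headroom values come from the text.
sites = [
    SiteUsage("Richardson", distinct_devices_last_month=10_000, ip_pool_size=40_000),
    SiteUsage("RTP", distinct_devices_last_month=16_500, ip_pool_size=16_000),
    SiteUsage("Sydney", distinct_devices_last_month=4_500, ip_pool_size=4_000),
]

for name, shortfall in sites_short_of_addresses(sites):
    print(f"{name}: short by {shortfall} addresses")
```

A real report would also track the daily maximum of concurrent sessions per site, which this sketch leaves out.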

How are you scaling up VPN?

We’re adding IP addresses, VPN hubs, and firewalls where needed. We monitor IP address usage every day using an automated script. We’re currently adding addresses to the locations that are nearing device capacity.

We’ve expedited plans to build new VPN hubs in London and Ottawa. Just as the Richardson hub is the failover site for San Jose and RTP, London will be the failover site for our European locations. If VPN hub capacity in Amsterdam is exhausted, users there will have a better experience connecting to the London hub than to the Richardson hub, roughly 5,000 miles away. Our Canadian users will connect to the Ottawa hub.

What about service provider circuit capacity?

We have peering connections with service providers in different parts of the world. If you’re a Cisco employee with a Comcast circuit, you’ll have the best experience if you connect over the Comcast circuit all the way to the VPN hub. But if that circuit is maxed out, our VPN concentrator redirects traffic across another provider network, such as AT&T or Spectrum. That slows application response, so we’re evaluating where additional service provider circuits would improve the user experience.

We’re also looking at where we need to add capacity to existing service provider circuits. Until now we’ve contracted for a committed information rate (CIR) a bit higher than typical usage, with an option to burst when needed. In larger sites, for example, we have a 10-Gbps pipe, pay for 2-Gbps CIR—and can burst to the full 10 Gbps. Although bursting is costly, in some cases it costs less per month than paying for a larger CIR. To see what’s most economical, we consider the number of circuits that are bursting and for how long.
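
To make the trade-off concrete, here is a rough sketch of the comparison. The billing model is an assumption, not our contract terms: a flat monthly price per committed Gbps plus a charge per Gbps-hour above the CIR. Many providers bill bursts differently (for example, on 95th-percentile usage), and every price and usage figure below is invented for illustration.

```python
def monthly_cost(cir_gbps, hourly_usage_gbps, cir_price_per_gbps, burst_price_per_gbps_hour):
    """Committed charge plus a charge for every Gbps-hour used above the CIR."""
    committed = cir_gbps * cir_price_per_gbps
    burst_gbps_hours = sum(max(0.0, used - cir_gbps) for used in hourly_usage_gbps)
    return committed + burst_gbps_hours * burst_price_per_gbps_hour


def cheapest_cir(candidates, hourly_usage_gbps, cir_price_per_gbps, burst_price_per_gbps_hour):
    """Pick the candidate CIR with the lowest total monthly cost for this usage profile."""
    return min(
        candidates,
        key=lambda cir: monthly_cost(
            cir, hourly_usage_gbps, cir_price_per_gbps, burst_price_per_gbps_hour
        ),
    )


# Hypothetical 720-hour month: mostly 1.5 Gbps, with 20 hours at 4 Gbps.
occasional_bursts = [1.5] * 700 + [4.0] * 20
# Hypothetical month in which the circuit runs at 4 Gbps for 400 hours.
sustained_bursts = [1.5] * 320 + [4.0] * 400

print(cheapest_cir([2, 4], occasional_bursts, 1000, 5))  # 2: short bursts cost less than a larger CIR
print(cheapest_cir([2, 4], sustained_bursts, 1000, 5))   # 4: long bursts make the larger CIR cheaper
```

Under these assumed prices, the answer flips once bursting lasts long enough, which is why the duration of bursting matters as much as the number of circuits doing it.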

Tip: Check with your service provider to be sure your pipes are burstable. We discovered that one of our Chinese service providers limits our rate. Knowing this changed our plans for the other circuits.

What do you do if the Internet is slow?

This is one we can’t control. Users in China complained about the experience for the first several days they worked from home. There was nothing we could do other than wait for the service provider to speed up the network, which they eventually did.

Are you using split tunneling to keep Internet-bound traffic off the network?

Background: With split tunneling, you configure the VPN client to direct traffic destined for the company network (data center-based applications, etc.) over the VPN while directing Internet traffic directly to the Internet. Bypassing the VPN for certain Internet-bound traffic improves the user experience and reduces load on firewalls and the LAN.

Cisco IT and our Trust and Security team decided that split tunneling all Internet traffic would expose us to too much risk. A quick visit to Facebook or Twitter to break up the work day can infect a laptop with malware that spreads across the company. To minimize risk, we use split tunneling only for about a dozen cloud services that pass stringent security criteria, including good data hygiene and compatibility with Duo Security. These include Cisco TV, Office 365, Apple and Microsoft Updates, and Box. These dozen cloud services produce about one-third of our Internet traffic.
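
The policy amounts to an allowlist: traffic to an approved service goes straight to the Internet, and everything else stays in the tunnel. The Python sketch below illustrates only that decision logic. It is not how the AnyConnect client is configured, and the domain names are placeholders rather than the real service domains.

```python
# Hypothetical allowlist. The approved services are named in the text,
# but their domains are not, so placeholders are used here.
SPLIT_TUNNEL_DOMAINS = {
    "approved-video.example",
    "approved-storage.example",
}


def route_for(hostname: str) -> str:
    """Return "direct" for allowlisted services and "vpn" for everything else."""
    host = hostname.lower().rstrip(".")
    for domain in SPLIT_TUNNEL_DOMAINS:
        # Match the domain itself or any subdomain, but not look-alike names.
        if host == domain or host.endswith("." + domain):
            return "direct"
    return "vpn"


print(route_for("cdn.approved-video.example"))  # direct
print(route_for("social.example"))              # vpn
```

Defaulting to the tunnel means a new or unknown destination is never exempted by accident. A service bypasses the VPN only after it has been explicitly added to the list.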

Does split tunneling improve the application experience?

Yes, if the application produces a lot of traffic. Some of the applications we’ve selected generate traffic constantly, like Office 365. Others produce traffic spikes, like Cisco TV and iOS and Windows updates on Monday mornings. In early March 2020, when our CEO held his first Covid-19 Q&A, Cisco TV generated 57% of the traffic on our ISP links.

With needs changing so fast, how are you managing the VPN?

We formed a 20-member Tiger Team to scale the VPN and manage change. With IT staff members from North America and Asia, the team can work around the clock. All team members meet twice a day to review VPN status reports comparing the peak number of concurrent IP addresses at each site against capacity. We check traffic loads hourly. Once site utilization reaches 60%, we start planning next steps.
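
The 60% trigger is simple enough to automate. Here is a minimal sketch of such a check, assuming a status report that maps each hub to its peak concurrent addresses and its capacity. The report layout, the hub names, and the numbers are hypothetical.

```python
PLANNING_THRESHOLD = 0.60  # utilization at which planning starts, per the text


def utilization(peak_concurrent: int, capacity: int) -> float:
    """Fraction of a hub's address capacity used at its peak."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return peak_concurrent / capacity


def sites_needing_planning(report, threshold=PLANNING_THRESHOLD):
    """Return (site, utilization) pairs at or above the threshold, busiest first."""
    flagged = []
    for site, (peak, capacity) in report.items():
        used = utilization(peak, capacity)
        if used >= threshold:
            flagged.append((site, round(used, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical status report: site -> (peak concurrent addresses, capacity).
report = {
    "hub-a": (9_000, 15_000),
    "hub-b": (12_000, 16_000),
    "hub-c": (2_000, 10_000),
}

for site, used in sites_needing_planning(report):
    print(f"{site}: {used:.0%} of capacity, start planning next steps")
```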

What tips do you have for change management?

Before offices shut down, we asked business leaders to send their teams an email we’d written urging them to reset passwords, update business apps, and try logging on to the VPN before the last day in the office. (Users can’t log in unless they have the latest security patches.) For employees with a Cisco Virtual Office setup, we sent out a reminder to keep it powered on all the time instead of powering down at night so that it receives security patches.

Cisco business leaders and regional IT teams also shared tips for working from home with their teams, including:

  • Check PC health using our internal tool.
  • Use the latest version of AnyConnect.
  • Make sure Cisco Duo Security (our multi-factor authentication application) is working correctly by visiting the site we have set up for that purpose.
  • Remember that a VPN connection isn’t needed for software-as-a-service (SaaS), like Webex Teams, Webex Meetings, or Office 365. If the VPN experience is slow, turn VPN off to work in these applications.
  • Schedule application upgrades and backups outside of normal work hours.

Do you expect VPN traffic to continue growing?

As of mid-March 2020, all 60,000 employees authorized to use the Cisco network were connecting via VPN. Another 15,000 use Cisco Virtual Office. All employees and partners in China are already working from home, so we’re not expecting to add any more VPN capacity there. In San Jose, we saw a huge spike in VPN traffic on March 9 when the executive team directed everyone to work from home, and we expect volume to continue to rise, likely to 15,000 concurrent devices. Similarly, RTP will likely peak at around 16,000 devices.

Any other questions about how we’re scaling to support teleworking? Ask them here.