In collaboration with Venkat Bongoni and Chuck Churchill

Many IT organizations are evolving to a hybrid cloud paradigm by using on-premises, private, public, and/or third-party cloud services to run their IT services and applications. According to ResearchandMarkets.com, this trend will continue, with the cloud market predicted to reach $262B USD by 2027.* For Cisco IT, the evolution towards hybrid cloud is manifest in the following ways:

  • Cisco IT’s vast infrastructure and application landscape spans across multiple global private data centers
  • Application teams also make use of multiple Cloud Service Providers (CSPs), leading to the development of a common, curated provisioning and management system to provide a level of Day 1/Day 2 uniformity across those CSPs
  • Cisco makes heavy use of SaaS platforms
  • These applications and their related infrastructure and network components are monitored by more than 50 tools across several operational domains

Layers of complexity add challenges

The increase in applications, data, and tools deployed or migrated across the hybrid cloud landscape creates layers of complexity that ultimately obscure monitoring, visibility, and observability for large enterprises. While multiple monitoring, visibility, and observability tools are necessary, they add complexity and challenges to IT Operations. Using AIOps as part of a broad Cisco Full-Stack Observability (FSO) solution stack can significantly decrease that complexity and deliver higher service and application availability.

Cisco IT uses multiple tools and platforms from Cisco, third parties, and open source to provide monitoring, visibility, and observability into applications, networks, infrastructure, and SaaS platforms across the hybrid cloud landscape.  These tools and platforms provide Metrics, Events, Logs, and Tracing (MELT) telemetry:

  • Monitoring is focused on domain-specific availability and capacity;
  • Visibility and observability build upon monitoring by adding a focus on performance and experience, using a subset of MELT telemetry (within domain boundaries);
  • Cisco Full-Stack Observability builds upon visibility and observability by providing cross-functional domain correlation, enriched with business context, full MELT telemetry plus security, and insight-driven actions.

Using multiple domain-specific monitoring, visibility, and observability tools provides insight into each domain yet creates the following challenges:

  • Increased time to resolve an issue at a business level and restore service during an incident
  • Increased operational management complexity and an ever-increasing total cost of ownership
  • Multiple disjointed sources of truth
  • Lack of a single pane of glass, with little to no correlation to business KPIs
  • The inability to use machine learning and artificial intelligence across operational domains to perform predictive analytics

How Cisco IT is reducing complexity with correlated data from multiple domains

While Cisco IT strives to reduce complexity by reducing the number of monitoring, visibility, and observability tools, it will always be faced with an ever-evolving set of domain solutions. To respond to this challenge, Cisco IT built an internal platform that brings the MELT telemetry from these tools and platforms together to be consumed by several Cisco IT personas: IaaS/PaaS Domain teams, the IT Enterprise Operations Center (EOC) team, Application DevOps teams, and IT Service Managers and executives. This over-the-top solution ingests the monitoring, visibility, and observability telemetry and enriches it with relationship and topology data from Cisco IT’s ServiceNow-based Enterprise Service Platform to provide insights on the correlated data.
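To make the enrichment step concrete, here is a minimal sketch of how an incoming event might be tagged with the business services that depend on the affected component. This is an illustration only, not Cisco IT's implementation: the `Event` fields, the `TOPOLOGY` mapping, the component and service names, and the `enrich` function are all hypothetical, and a real deployment would read relationship data from the service platform rather than from an in-memory dictionary.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """A simplified telemetry event (hypothetical schema)."""
    source_tool: str
    component: str
    severity: str
    message: str
    business_services: List[str] = field(default_factory=list)


# Hypothetical topology data: component -> dependent business services.
TOPOLOGY: Dict[str, List[str]] = {
    "db-cluster-01": ["Order Management"],
    "edge-router-07": ["Order Management", "Customer Portal"],
}


def enrich(event: Event, topology: Dict[str, List[str]]) -> Event:
    """Return a copy of the event tagged with its dependent business services.

    Components missing from the topology get an empty service list.
    """
    services = list(topology.get(event.component, []))
    return replace(event, business_services=services)


raw = Event("network-monitor", "edge-router-07", "critical", "link down")
enriched = enrich(raw, TOPOLOGY)
print(enriched.business_services)
```

With the business context attached, a downstream console can route the same network event to every team whose service depends on that component.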

The results

By evolving from monitoring to visibility towards Cisco FSO, and integrating AIOps technology and techniques, Cisco IT has achieved the following over a period of 3+ years:

  • Over 99% noise reduction leading to improved event prioritization
  • Seventy percent (70%) reduction in critical business impacting hours
  • Fifty-nine percent (59%) reduction in Major and Critical incident count
  • Improved service and application availability from 99.799% to 99.996%
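
To put the availability figures in perspective, the short calculation below converts each percentage into expected downtime. It assumes the percentages are measured over a full 365-day year, which the figures above do not state, so treat the output as a rough illustration rather than a reported result. Under that assumption, 99.799% corresponds to roughly 17.6 hours of downtime per year, and 99.996% to about 21 minutes.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day measurement window


def downtime_hours_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into expected downtime hours per year."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


before = downtime_hours_per_year(99.799)
after = downtime_hours_per_year(99.996)

print(f"Before: {before:.1f} hours per year")
print(f"After:  {after * 60:.0f} minutes per year")
```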

Delivering insights to IT personas using machine learning

The transformation layer of Cisco IT’s AIOps platform enriches events with dependency information that is presented in the visualization layers for use by the various Cisco IT personas. This allows us to build persona-based visualization frameworks designed to present correlated and actionable alerts after reducing noise using machine learning. For example, we associate issues back to a business service so the Enterprise Operations Center (EOC) team can engage the right business, IT, and domain teams through collaboration channels to act. Using the command center console (Figure 1), the EOC team monitors critical business services and related infrastructure components. With contextual information and dependencies at hand, triaging an incident becomes much simpler. This data is also used to report business availability to service executives in real time (Figure 2).
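
As a rough illustration of associating issues back to a business service, the sketch below groups enriched events per service and collapses repeats into a single alert with a count. Note that this is simple rule-based deduplication standing in for the machine learning noise reduction described above, not the platform's actual method. The event fields, component names, and the `correlate` function are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def correlate(events: List[dict]) -> Dict[str, List[dict]]:
    """Group events by business service, collapsing exact repeats.

    Two events are treated as duplicates when they share the same
    component and message (a deliberately naive rule).
    """
    grouped: Dict[str, dict] = defaultdict(dict)
    for ev in events:
        key = (ev["component"], ev["message"])
        for service in ev["business_services"]:
            alert = grouped[service].setdefault(
                key,
                {"component": ev["component"], "message": ev["message"], "count": 0},
            )
            alert["count"] += 1
    return {service: list(alerts.values()) for service, alerts in grouped.items()}


# Hypothetical enriched events: three repeats of one issue plus one other issue.
link_down = {
    "component": "edge-router-07",
    "message": "link down",
    "business_services": ["Customer Portal"],
}
slow_db = {
    "component": "db-cluster-01",
    "message": "high latency",
    "business_services": ["Customer Portal", "Order Management"],
}
alerts = correlate([link_down, link_down, link_down, slow_db])

for service, items in alerts.items():
    print(service, items)
```

Here four raw events become two alerts for one service and one alert for the other, which is the kind of per-service view a command center console could present.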



Watch our on-demand webinar to learn about more Cisco Full-Stack Observability innovations:

Extend your observability with the Cisco Full-Stack Observability Platform




Anusha Nataraj

Technical Program Manager

Cisco IT