Cisco Crosswork – Part 4: Health Insights
In February, Cisco announced its latest innovation – Cisco Crosswork Network Automation – a new network automation portfolio for Service Providers. Read Jonathan Davidson’s blog for an overview to understand our comprehensive approach in enabling a closed-loop, mass-scale automation solution. Follow this multi-part blog series to learn more about each new solution in the Cisco Crosswork portfolio.
In this blog series, we have been providing more detail on the five new pillars of the Cisco Crosswork automation solution. So far, you have learned about Cisco Crosswork Change Automation, Cisco Crosswork Network Insights and Cisco Crosswork Situation Manager. Today, let’s take a closer look at Cisco Crosswork Health Insights.
Intent-based networking promises to improve network availability and agility, which are key considerations as operators digitize their operations. Cisco Crosswork Health Insights (CCHI) can improve network availability by providing device and network health monitoring tools that reduce the mean time to know (MTTK) and mean time to repair (MTTR) for network issues. CCHI simplifies and abstracts the collection and analysis of time series data and helps rectify network faults through the key value drivers described below.
- Zero-touch telemetry reduces the operational and network management overhead of collecting and cleansing data, allowing operators to focus on their business goals. As part of zero-touch telemetry, devices are automatically provisioned with telemetry configuration, and the corresponding tables and schemas are created in a Time Series Database (TSDB).
- By using a common collector to collect network device data over SNMP, CLI, and model-driven telemetry, and making it available as modelled data described in YANG, duplicate data collection is avoided, optimizing the load on both the devices and the network.
- The Recommendation Engine analyzes network device hardware, software, and configuration, and employs a pre-trained model built from data mining to produce KPI recommendations relevant to each monitoring use case.
- KPIs cover a wide range of statistics from CPU, memory, disk, layer 1/2/3 network counters, to per protocol, LPTS and ASIC statistics.
- Health Insights builds dynamic detection and analytics modules that allow operators to monitor and see alerts on network events based on user-defined logic (KPI).
- Key Performance Indicator (KPI) alerting logic can be:
- Simple static thresholds (TCA), e.g. CPU load going above 90 percent.
- Moving average, standard deviation, percentile based, etc., e.g. CPU load rising above the mean and staying there for five minutes.
- Streaming jobs which provide real-time alerts or batch jobs which run periodically.
- Customized for threshold values and visualization dashboards.
- Custom, operator-created KPIs based on business logic (easily scripted with a domain-specific language).
- TCAs can be exported/integrated with other systems via HTTP, Slack and socket interfaces.
- KPIs can be associated with dashboards, which provide real-time and historical views of the raw data and corresponding TCAs.
- KPIs also provide purpose-built dashboards that go beyond raw data and present valuable information in infographic-style charts and graphs, useful for triaging and root-causing complex issues.
- Health Insights KPIs can be associated with Cisco Crosswork Change Automation (CCCA) playbooks (or webhooks), which can be executed either manually or via auto-remediation. A remediation workflow can be used to fix the issue or to collect more data from the network devices. By proactively remediating the situation, instead of resorting to ad hoc debugging and unscheduled downtime, operators can save time and money while providing better QoE to their customers.
- Health Insights correlates alerts and anomalies on the topology of the network, allowing easy visualization of the impact of events.
- Health Insights can integrate with Cisco Crosswork Situation Manager to correlate events across networks, services, and applications.
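To make the alerting styles listed above more concrete, here is a minimal Python sketch of a static threshold check and a "stays above the moving average" check. It is only an illustration of the general techniques, not the Health Insights implementation or its domain-specific language; the names `static_threshold_alert` and `SustainedAboveMean`, and the `window` and `hold` parameters, are hypothetical.

```python
from collections import deque
from statistics import mean


def static_threshold_alert(value: float, threshold: float = 90.0) -> bool:
    """Static threshold (TCA-style) check, e.g. CPU load above 90 percent."""
    return value > threshold


class SustainedAboveMean:
    """Alert when samples stay above the rolling mean for `hold` samples in a row.

    `window` and `hold` are counted in samples (hypothetical parameters):
    with one sample per minute, hold=5 approximates "five minutes".
    """

    def __init__(self, window: int = 60, hold: int = 5) -> None:
        self.history = deque(maxlen=window)  # recent samples used as the baseline
        self.hold = hold
        self.streak = 0  # consecutive samples seen above the baseline

    def update(self, value: float) -> bool:
        # Compare against the mean of earlier samples only, then record the sample.
        baseline = mean(self.history) if self.history else value
        self.streak = self.streak + 1 if value > baseline else 0
        self.history.append(value)
        return self.streak >= self.hold


if __name__ == "__main__":
    detector = SustainedAboveMean(window=60, hold=5)
    cpu_samples = [10.0] * 10 + [50.0] * 5  # one sample per minute (made-up data)
    for minute, cpu in enumerate(cpu_samples):
        if detector.update(cpu):
            print(f"minute {minute}: CPU {cpu}% has stayed above its moving average")
```

A streaming job would call `update` on each new sample as it arrives, while a batch job could replay a stored series through the same object.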
Let’s look at an example of Smart Alerts and Remediation using an out-of-the-box KPI called black hole detection. Operators may already be monitoring other KPIs, such as CEF drops or interface error drops, which provide alerts on known issues. Black hole detection helps isolate problems that are not yet known, or silent drops in the ASIC(s) that aren’t yet monitored. The transmitted and received packets/bytes across all interfaces on a device are used to calculate a loss ratio. Because interface statistics aren’t gathered atomically from the devices, we apply a low-pass filter such as standard deviation to the loss ratio to determine whether a deviation is a real anomaly. Optionally, we can schedule a CCCA node cost-out playbook when a black hole event is detected, which helps reduce the service impact of the event.
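The black hole check described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the shipped KPI: the loss ratio formula `(rx - tx) / rx`, the function names, and the `k`, `min_sigma`, and `min_samples` parameters are all hypothetical, and the sketch ignores traffic that legitimately originates or terminates on the device.

```python
from statistics import mean, pstdev


def loss_ratio(interface_deltas: list[tuple[int, int]]) -> float:
    """Device-wide loss ratio from per-interface (rx_delta, tx_delta) counters.

    Assumed formula: the fraction of received traffic that was never
    transmitted back out on any interface during the sampling interval.
    """
    rx_total = sum(rx for rx, _ in interface_deltas)
    tx_total = sum(tx for _, tx in interface_deltas)
    if rx_total <= 0:
        return 0.0
    return max(0.0, (rx_total - tx_total) / rx_total)


def is_black_hole(history: list[float], current: float, k: float = 3.0,
                  min_sigma: float = 0.005, min_samples: int = 5) -> bool:
    """Flag `current` only if it exceeds the historical mean by k standard deviations.

    Counters are not sampled atomically, so the ratio jitters from interval
    to interval; the standard-deviation band absorbs that noise. `min_sigma`
    keeps a very quiet history from making the band unrealistically tight.
    """
    if len(history) < min_samples:
        return False  # not enough data to establish a baseline yet
    sigma = max(pstdev(history), min_sigma)
    return current > mean(history) + k * sigma


if __name__ == "__main__":
    baseline = [0.010, 0.012, 0.009, 0.011, 0.010, 0.008]  # made-up history
    sample = loss_ratio([(600, 500), (400, 300)])  # 1000 received, 800 transmitted
    print(f"loss ratio {sample:.3f}, black hole suspected: {is_black_hole(baseline, sample)}")
```

When `is_black_hole` returns true, an alert handler could invoke whatever remediation is associated with the KPI, such as the node cost-out playbook mentioned above.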
We are very excited about the transformations we are seeing inside of our Service Provider customers and how the Cisco Crosswork network automation solution can help them accelerate their journey to a fully self-healing infrastructure. Please leave comments or questions below so we can continue the conversation – and stay tuned for the final blog in this series.