
Model-Driven Telemetry

Telemetry is used to subscribe to meaningful data and measurements from remote devices and transport it to a receiving system for monitoring and analysis. This data allows network administrators to proactively visualize their network, and helps them automate tasks and forecast capacity requirements. When we talk about streaming telemetry, we typically refer to getting operational data from network protocols and packet statistics in real-time. In addition, we may also consider the monitoring of system events as part of network telemetry.

“Time may change me, but I can’t trace time”, as David Bowie famously noted. If we were to look back at the trajectory of telemetry over time, there have been a lot of changes, but it isn’t straightforward to map out a clear evolution. We’ve had various flavors of monitoring and automation on Cisco switches that are sometimes used in conjunction with the protocols we have today. A few years ago, while operating the NOC at Cisco Live, we visualized data from our switches using SNMP, syslog and NetFlow. A fun exercise in automation was having our Catalyst 6500 switch send network statistics directly to Twitter using EEM and Tcl scripts! These protocols allowed us to get granular data and break down client traffic during the event. Today’s data centers need an efficient way to gather large amounts of data from highly scaled networks, and to be able to rapidly automate changes with open APIs. This is offered through model-driven telemetry and automation using protocols such as NETCONF, RESTCONF and gNMI. The use of YANG-based data models with these protocols provides consistency, flexibility and a programmatic framework on our switches. In addition, the same protocol can be leveraged for both telemetry and network configuration, which simplifies operations while adding versatility.

OpenConfig on Nexus Switches

Nexus 9000 Series switches offer different options as data sources for model-based telemetry, including Data Management Engine (DME), device YANG and OpenConfig YANG. While DME and device YANG are specific to Nexus switches, the OpenConfig model is a vendor-neutral specification that allows the same model to be used across multiple platforms and operating systems. This makes it a powerful choice for network programmability. Moreover, the same OpenConfig models can be used whether a customer decides to utilize a standards-based protocol such as NETCONF or an open specification such as gNMI.

In this article, we highlight two new functionalities which are now available with Cisco NX-OS 9.3(5).

     1. NETCONF Event Notifications with OpenConfig

NETCONF Event Notifications are now supported with OpenConfig, and can be used to subscribe to system events. Customers who are migrating from legacy SNMP-based monitoring tools can now achieve functionality similar to SNMP traps with NETCONF notifications, which use YANG-based models.

     2. gNMI support with OpenConfig

This release also introduced gNMI support with OpenConfig which enables dial-in telemetry and network automation.

We will go over an example using NETCONF and gNMI to subscribe to the total number of BGP prefixes on a system. While the former method subscribes to event notifications, the latter subscribes to dial-in telemetry on the switch. Both methods leverage the same OpenConfig data model.

NETCONF Event Notifications with OpenConfig

In terms of NETCONF support, Nexus 9000 switches support various operations with device YANG and OpenConfig models. NETCONF Event Notifications is a new mechanism available on Nexus 9000 Series switches by which a NETCONF client can subscribe to system events from a NETCONF server (switch) and continue to receive them asynchronously as long as those events are generated, without having to do periodic polling. This is similar to functionality offered by Syslog or SNMP traps, but the events are generated upon changes in the YANG model tree (device YANG or OpenConfig model tree). NETCONF notifications were first supported on NX-OS 9.3(1) with device YANG models. The latest NX-OS release 9.3(5) adds support for OpenConfig models with NETCONF notifications.

Notifications are formatted in an XML payload similar to the NETCONF <get> response. The events are generated for both configurational and operational changes in the switch. Users can monitor the NETCONF server for changes under a particular portion of a tree using a filter.

Switch Configuration

To support NETCONF event notifications with OpenConfig, the Nexus 9000 Series switch needs to be running a minimum version of 9.3(5). In addition to this, the corresponding OpenConfig RPM package needs to be installed on the switch. The only configuration required is “feature netconf”. Optionally, the idle timeout for a session and the maximum number of simultaneous client sessions can also be configured.

Subscriptions

Multiple NETCONF clients can subscribe to event notifications using the “create-subscription” protocol operation. Once the subscription is successful, notifications are delivered to clients for as long as the session is active. The events that trigger notifications can be configuration changes (for example, changing the MTU of an interface) or operational changes (for example, an interface going up or down, or a change in the number of routing prefixes). A successful subscription is answered with an <ok> response, and a failed subscription returns an <operation-failed> response that includes the reason for failure.

NETCONF Event Notifications

Only one subscription is allowed per session. If the subscription is at container level or list level, the client receives notifications for any changes to the child elements as well. For subscriptions at the leaf level, the client receives notifications for any changes to that leaf element. Here is an example of a “create-subscription” request through a Python script that subscribes to notifications on total BGP prefixes in the openconfig-network-instance model. The xpath used is “/network-instances/network-instance/protocols/protocol/bgp/global/state/total-prefixes”.

divya@Ubuntu-host:~/python/netconf$ python3 netconf-notifications-bgp.py 
Session id is :  1771883408

create-subscription request is :  
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
    <stream>NETCONF</stream>
    <filter xmlns:ns1="urn:ietf:params:xml:ns:netconf:base:1.0" type="subtree">
         <network-instances xmlns="http://openconfig.net/yang/network-instance">
            <network-instance>
                <protocols>
                    <protocol>
                        <bgp>
                            <global>
                                <state>
                                   <total-prefixes> </total-prefixes>
                                </state>
                            </global>
                        </bgp>
                    </protocol>
                </protocols>
            </network-instance>
        </network-instances>
    </filter>
</create-subscription>


Response for <create-subscription> : 
<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:fb58104c-7d3c-40af-bfab-eb6d1fc1f223">
    <ok/>
</rpc-reply>
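For readers who want to generate a request like the one above for other paths, here is a minimal sketch that builds the same shape of subtree filter from a slash-separated path. It uses only the Python standard library; the helper names (xpath_to_subtree, build_create_subscription) are hypothetical and are not taken from the script referenced in this article, and the path handling assumes no list keys or predicates.

```python
import xml.etree.ElementTree as ET

NOTIF_NS = "urn:ietf:params:xml:ns:netconf:notification:1.0"
OC_NETINST_NS = "http://openconfig.net/yang/network-instance"
XPATH = ("/network-instances/network-instance/protocols/protocol"
         "/bgp/global/state/total-prefixes")


def xpath_to_subtree(xpath, namespace):
    """Nest one element per path segment (hypothetical helper, no predicates)."""
    names = [part for part in xpath.split("/") if part]
    if not names:
        raise ValueError("empty path")
    # Declare the model namespace as the default namespace on the top element.
    root = ET.Element(names[0], {"xmlns": namespace})
    node = root
    for name in names[1:]:
        node = ET.SubElement(node, name)
    return root


def build_create_subscription(xpath, namespace, stream="NETCONF"):
    """Return a <create-subscription> request as a string."""
    sub = ET.Element("create-subscription", {"xmlns": NOTIF_NS})
    ET.SubElement(sub, "stream").text = stream
    flt = ET.SubElement(sub, "filter", {"type": "subtree"})
    flt.append(xpath_to_subtree(xpath, namespace))
    return ET.tostring(sub, encoding="unicode")


request_xml = build_create_subscription(XPATH, OC_NETINST_NS)
print(request_xml)
```

The resulting string could then be handed to whichever NETCONF client you use; how the client accepts a filter is outside the scope of this sketch.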

Receiving a notification

A notification consists of the eventTime, operation and event as seen below for the example of total prefixes in BGP (in this case, the switch returns 154 prefixes).

<?xml version="1.0" encoding="UTF-8"?>
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
    <eventTime>2020-09-21T02:27:01.973+00:00</eventTime>
    <operation>modified</operation>
    <event>
        <network-instances xmlns="http://openconfig.net/yang/network-instance">
            <network-instance>
                <name>default</name>
                <protocols>
                    <protocol>
                        <identifier xmlns:oc-pol-types="http://openconfig.net/yang/policy-types">oc-pol-types:BGP</identifier>
                        <name>bgp</name>
                        <bgp>
                            <global>
                                <state>
                                    <total-prefixes>154</total-prefixes>
                                </state>
                            </global>
                        </bgp>
                    </protocol>
                </protocols>
            </network-instance>
        </network-instances>
    </event>
</notification>
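Once a notification like the one above arrives, a collector usually wants just a few fields from it. The following is a minimal sketch, using only the Python standard library, that pulls the event time, operation, network instance and prefix count out of such a payload. The function name parse_bgp_notification is hypothetical, and the sample payload is a trimmed copy of the notification shown above.

```python
import xml.etree.ElementTree as ET

NS = {
    "nc": "urn:ietf:params:xml:ns:netconf:notification:1.0",
    "oc": "http://openconfig.net/yang/network-instance",
}

SAMPLE_NOTIFICATION = """<?xml version="1.0" encoding="UTF-8"?>
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
    <eventTime>2020-09-21T02:27:01.973+00:00</eventTime>
    <operation>modified</operation>
    <event>
        <network-instances xmlns="http://openconfig.net/yang/network-instance">
            <network-instance>
                <name>default</name>
                <protocols><protocol><name>bgp</name><bgp><global><state>
                    <total-prefixes>154</total-prefixes>
                </state></global></bgp></protocol></protocols>
            </network-instance>
        </network-instances>
    </event>
</notification>"""


def parse_bgp_notification(xml_text):
    """Extract the fields of interest from a total-prefixes notification."""
    root = ET.fromstring(xml_text.strip())
    return {
        "event_time": root.findtext("nc:eventTime", namespaces=NS),
        "operation": root.findtext("nc:operation", namespaces=NS),
        "network_instance": root.findtext(
            ".//oc:network-instance/oc:name", namespaces=NS),
        "total_prefixes": int(root.findtext(
            ".//oc:total-prefixes", namespaces=NS)),
    }


event = parse_bgp_notification(SAMPLE_NOTIFICATION)
print(event["event_time"], event["operation"], event["total_prefixes"])
```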

Capabilities Exchange

NETCONF notification support is advertised in the capability exchange during the NETCONF handshake. The server informs the client that it supports both NETCONF notifications and interleave (the ability to interleave other NETCONF operations within a notification subscription).
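As a rough illustration of what a client can do with this exchange, the sketch below checks a list of advertised capability URIs for the two capabilities defined in RFC 5277. The function name and the sample capability list are made up for the example; in practice the list would come from your NETCONF client after the handshake.

```python
# Capability URIs defined in RFC 5277.
NOTIFICATION_CAP = "urn:ietf:params:netconf:capability:notification:1.0"
INTERLEAVE_CAP = "urn:ietf:params:netconf:capability:interleave:1.0"


def notification_support(server_capabilities):
    """Report whether the server advertised notification and interleave support."""
    caps = set(server_capabilities)
    return {
        "notification": NOTIFICATION_CAP in caps,
        "interleave": INTERLEAVE_CAP in caps,
    }


# Illustrative capability list, not captured from a real switch.
sample_caps = [
    "urn:ietf:params:netconf:base:1.1",
    NOTIFICATION_CAP,
    INTERLEAVE_CAP,
]
print(notification_support(sample_caps))
```

A client that finds interleave missing would typically open a second session for other NETCONF operations rather than reuse the subscription session.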

Event Stream Discovery

A NETCONF client can discover the available event streams using the <get> operation as shown in this example. At this time, NX-OS only supports the NETCONF stream, although RFC 5277 allows subscription to SYSLOG, SNMP and NETCONF streams.
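To make stream discovery a little more concrete, here is a small sketch based on the <streams> container that RFC 5277 defines. It shows the subtree filter a client would place in the <get> request and a helper that lists stream names from a reply. The reply below is an illustrative sample written for this example, not output captured from a switch, and stream_names is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NETMOD_NS = "urn:ietf:params:xml:ns:netmod:notification"

# Subtree filter for a <get> request that asks for the list of event streams.
STREAMS_FILTER = '<netconf xmlns="%s"><streams/></netconf>' % NETMOD_NS

# Illustrative reply (assumed content, shaped after the RFC 5277 model).
SAMPLE_REPLY = """
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101">
  <data>
    <netconf xmlns="urn:ietf:params:xml:ns:netmod:notification">
      <streams>
        <stream>
          <name>NETCONF</name>
          <description>default NETCONF event stream</description>
        </stream>
      </streams>
    </netconf>
  </data>
</rpc-reply>
"""


def stream_names(reply_xml):
    """Return the name of every <stream> entry found in a <get> reply."""
    root = ET.fromstring(reply_xml.strip())
    stream_tag = "{%s}stream" % NETMOD_NS
    name_tag = "{%s}name" % NETMOD_NS
    return [stream.findtext(name_tag) for stream in root.iter(stream_tag)]


print(stream_names(SAMPLE_REPLY))
```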

NETCONF Client

In my examples above, I’ve used ncclient as the NETCONF client tool to write scripts in Python to subscribe to NETCONF event notifications. The scripts can be found on GitHub here, with the BGP example provided in the script netconf-notifications-bgp.py.

gNMI Subscription with OpenConfig

In an earlier blog post, I described how gNMI support with OpenConfig is a powerful open specification used to subscribe to telemetry data on switches. It works with a dial-in telemetry concept, where the client subscribes to telemetry data on the switch and the switch sends the data either periodically or upon a change. With the use of Telegraf (and the TICK stack) it is possible to have a collector, a time-series database and a GUI to visualize telemetry data.

The “SubscribeRequest” is used for applications such as telemetry or for larger queries of data. In the case of telemetry with gNMI, the subscriber (client) specifies a frequency of delivery which could be:

ONCE: The data is returned immediately and only once for all specified paths.

POLL: The data is returned from the device when polled with the current values for all specified paths.

STREAM: The data is returned continuously. This could either be in the SAMPLE mode (where data is returned periodically in accordance with a sample interval) or in the ON_CHANGE mode where the data is returned if there is a change in values.
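The difference between the two STREAM modes is easiest to see with a toy model. The sketch below is not gNMI code; it simply takes a list of (timestamp, value) readings and shows which of them would be sent under SAMPLE with a given interval versus ON_CHANGE. The names StreamMode and updates_to_send are hypothetical.

```python
from enum import Enum


class StreamMode(Enum):
    SAMPLE = "SAMPLE"
    ON_CHANGE = "ON_CHANGE"


def updates_to_send(readings, mode, sample_interval=10):
    """Pick which (timestamp, value) readings a subscriber would receive.

    readings must be sorted by timestamp (seconds). This is a toy model of
    the two STREAM modes, not an implementation of the gNMI protocol.
    """
    sent = []
    next_due = None      # SAMPLE: time at which the next update is due
    last_value = None    # ON_CHANGE: last value that was sent
    for timestamp, value in readings:
        if mode is StreamMode.SAMPLE:
            if next_due is None or timestamp >= next_due:
                sent.append((timestamp, value))
                next_due = timestamp + sample_interval
        else:
            if not sent or value != last_value:
                sent.append((timestamp, value))
                last_value = value
    return sent


readings = [(0, 175), (5, 175), (10, 175), (15, 200), (20, 200)]
print(updates_to_send(readings, StreamMode.SAMPLE))     # periodic updates
print(updates_to_send(readings, StreamMode.ON_CHANGE))  # only when the value changes
```

With these readings, SAMPLE delivers three updates at 0, 10 and 20 seconds, while ON_CHANGE delivers only the initial value and the change at 15 seconds.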

The details of all the requests and responses can be found in the gNMI specification. The switch configuration and setup for gNMI telemetry is detailed here.

In some instances, there can be two gNMI sessions established between client and server, one for streaming telemetry via gNMI Subscribe and the other for network configuration using gNMI Get and Set methods. The “SubscribeRequest” specifies what data is being requested, and the “SubscribeResponse” includes a notification message providing an update value for a subscribed data entity. A “sync_response” indicates that all the data values corresponding to the specified paths have been transmitted at least once.

Custom-Built Telemetry Collector

The cisco-gnmi Python library can be used to subscribe to telemetry data from a switch. The library-based CLI wrapper can be used to perform the gNMI “SubscribeRequest” operation in various modes to stream telemetry data. In addition, the library can be used in a Python script for telemetry subscription. In the example script gnmi-subscribe-bgp.py, a client subscribes to telemetry data from the switch and displays it on a simple collector we set up. The data being subscribed to in this case is the total number of BGP prefixes that the switch has received from its BGP neighbors, specified by the following xpath:

xpath = "/network-instances/network-instance/protocols/protocol/bgp/global/state/total-prefixes"

The xpath above is used to access the data using the OpenConfig model openconfig-network-instance. The example script is set up to stream this value every 10 seconds in the SAMPLE mode, but can also be set up with the ON_CHANGE mode. Once we receive the data and format it in JSON, we feed this telemetry data to a custom-built collector. As seen in the example script, with just a few lines of code, we are able to build our own database and graph the telemetry data from it. The custom-built collector shown below uses the rrdtool Python module to create a time-series round robin database. We use the values stored in the database to generate a graph in PNG format that plots the telemetry data. The graph is rendered in an HTML page using lighttpd. We stream telemetry data from the Nexus 9000 Series switch and see that the total number of BGP prefixes goes from 175 to 200 as we inject more routes.
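To illustrate the round robin idea behind such a collector without depending on rrdtool, here is a minimal standard-library sketch. It is a stand-in for the concept, not the collector from the example script: a fixed number of slots holds the most recent samples, older samples are overwritten, and the contents can be dumped as JSON for graphing. The class name RoundRobinSeries and the field names are hypothetical.

```python
import json
from collections import deque


class RoundRobinSeries:
    """Fixed-size time series: once full, the oldest sample is dropped."""

    def __init__(self, slots):
        self._points = deque(maxlen=slots)

    def add(self, timestamp, value):
        self._points.append((timestamp, value))

    def points(self):
        return list(self._points)

    def to_json(self):
        # One JSON object per stored sample, oldest first.
        return json.dumps(
            [{"time": t, "total_prefixes": v} for t, v in self._points])


# Samples arriving every 10 seconds, as in SAMPLE mode with a 10 s interval.
series = RoundRobinSeries(slots=3)
for timestamp, prefixes in [(0, 175), (10, 175), (20, 200), (30, 200)]:
    series.add(timestamp, prefixes)

print(series.points())   # the sample at t=0 has been overwritten
print(series.to_json())
```

A real round robin database also consolidates samples into fixed time steps; this sketch keeps only the overwrite-the-oldest behavior.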

Custom-built telemetry collector

For more information on gNMI and OpenConfig on Nexus 9000 Series switches, see my whitepaper here.

Conclusions

Nexus 9000 Series switches have a strong foundation in model-based telemetry, and we’ve shared a few highlights in this article. As we take network programmability even further using network insights and intent-based networking, we begin to automate the ineffable!

References

  1. Data Center Telemetry and Network Automation Using gNMI and OpenConfig White Paper
  2. Hot off the press: Introducing OpenConfig Telemetry on NX-OS with gNMI and Telegraf!
  3. Network Automation and the Ingenuity of Data Models
  4. Network Programmability with YANG: The Structure of Network Automation with YANG, NETCONF, RESTCONF, and gNMI [Addison-Wesley ISBN-13: 978-0135180396]

 



Authors

Divya Rao

Technical Marketing Engineer

Data Center Networking