Cisco recently announced the availability of NX-OS 9.2(2), which adds support for SAN Analytics on Cisco MDS 9700 Series switches with 64G Modules. This release begins the next phase in the architectural evolution of SAN Analytics.
In this blog, we compare the SAN Analytics architecture of the Cisco MDS 32G and 64G platforms at a high level and look at some of the new innovations in Cisco MDS 64G SAN Analytics.
But first, let’s cover the methodologies used for performance monitoring. Utilization, Saturation and Errors (USE) is a generic methodology for effective performance monitoring of any system; the USE metrics identify a system's performance bottlenecks. For a storage system, we can add Latency as a fourth element to create LUSE. Full visibility into the LUSE metrics of a storage infrastructure is critical for performance monitoring and troubleshooting.
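To make the four LUSE categories concrete, here is a minimal Python sketch that derives one illustrative value per category from raw counters collected over a sampling interval. The `IntervalCounters` fields and the `luse()` helper are hypothetical and are not SAN Analytics metric names; using average outstanding I/Os as the saturation signal is one common proxy, assumed here.

```python
from dataclasses import dataclass


@dataclass
class IntervalCounters:
    """Hypothetical raw counters for one port or flow over one sampling interval."""
    interval_s: float           # length of the sampling interval, seconds
    bytes_moved: int            # bytes read plus bytes written in the interval
    link_capacity_Bps: float    # assumed usable link bandwidth, bytes per second
    completed_ios: int          # I/Os that completed in the interval
    total_ect_us: float         # sum of exchange completion times, microseconds
    avg_outstanding_ios: float  # mean number of I/Os in flight
    errors: int                 # e.g. aborts and failed I/Os


def luse(c: IntervalCounters) -> dict:
    """Return one illustrative value for each LUSE category."""
    return {
        # Latency: mean exchange completion time per completed I/O
        "latency_us": c.total_ect_us / c.completed_ios if c.completed_ios else 0.0,
        # Utilization: fraction of the link bandwidth that was used
        "utilization": c.bytes_moved / (c.link_capacity_Bps * c.interval_s),
        # Saturation: queued work, approximated by average outstanding I/Os
        "saturation": c.avg_outstanding_ios,
        # Errors: error events per second
        "errors_per_s": c.errors / c.interval_s,
    }


sample = IntervalCounters(10, 16_000_000_000, 3_200_000_000, 1000, 250_000.0, 4.0, 5)
print(luse(sample))
```

For example, moving 16 GB in 10 s over a link assumed to carry 3.2 GB/s gives a utilization of 0.5.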
SAN Analytics and SAN Insights have been advanced features of Cisco MDS 32G switches since NX-OS 8.3(2):
- SAN Analytics is an advanced feature of Cisco MDS switches that collects storage I/O metrics from the switches, independent of host and storage systems. Over 70 metrics are collected per-port and per-flow (ITL/ITN) and streamed out. Each metric can be classified into one of the ‘LUSE’ categories.
- SAN Insights is a capability of Cisco Nexus Dashboard Fabric Controller (formerly DCNM) SAN that receives the metrics stream from SAN Analytics. It provides visualization and analysis of fabric-wide I/O metrics using the ‘LUSE’ framework.
Cisco MDS 32G SAN Analytics
Access Control Lists (ACLs) enforce access control on every frame switched by the ASIC. An ACL entry is matched by extracting certain fields from the frame header; on a match, the action corresponding to that entry is taken. On an F_Port, FC Hard Zoning entries are programmed as ingress ACLs, derived from the Zoning configuration, that match on the frame's SID and DID with an action to “forward” the frame to its destination.
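The first-match lookup described above can be sketched in Python as follows. This is a simplified software model, not how the ASIC stores or searches entries: `AclEntry`, `ingress_acl_for_fport()` and `lookup()` are hypothetical names, FCIDs are plain integers, and dropping frames that match no entry is an assumed default for hard zoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    sid: int     # source FCID the entry matches
    did: int     # destination FCID the entry matches
    action: str  # action taken on a match, e.g. "forward"


def ingress_acl_for_fport(port_fcid: int, zone_members: list[int]) -> list[AclEntry]:
    """Hypothetical helper: entries for frames entering an F_Port from the device
    with FCID port_fcid, allowing it to reach every other member of its zone."""
    return [AclEntry(port_fcid, did, "forward") for did in zone_members if did != port_fcid]


def lookup(acl: list[AclEntry], sid: int, did: int) -> str:
    """Match the frame's SID/DID against the entries; return the first match's action."""
    for entry in acl:
        if entry.sid == sid and entry.did == did:
            return entry.action
    return "drop"  # assumed default: frames outside the zone are not forwarded


acl = ingress_acl_for_fport(0x010100, [0x010100, 0x020200, 0x030300])
print(lookup(acl, 0x010100, 0x020200))  # forward
print(lookup(acl, 0x010100, 0x040400))  # drop
```

Note that each F_Port only needs entries whose SID is its own attached device; the reverse direction is covered by the ingress ACL on the peer's F_Port.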
On Cisco MDS 32G switches, the I/O metrics are computed by capturing FC frame headers in the data path using an ACL-based ‘Tap’ programmed in the ASIC in the ingress and egress directions of analytics-enabled ports. These Tap ACLs match the frames of interest for Analytics, namely CMD_IU, the first DATA_IU, XRDY_IU, RSP_IU and ABTS. A copy of each frame matching a Tap ACL is forwarded to an on-board NPU connected to the 32G ASIC.
When SAN Analytics is enabled on a port, the ACLs are programmed according to port type and direction, as shown in Figure 1 below:
- F_Port Ingress: Analytics Tap ACLs + Zoning ACLs
- F_Port Egress, E_Port Ingress, E_Port Egress: Analytics Tap ACLs only
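The Tap matching and the per-port programming rules listed above can be modeled as below. This is a hedged sketch: the frame-type strings mirror the IU names in the text, but `tap_matches()` and `acls_for()` are hypothetical helpers, and the caller is assumed to already know whether a DATA_IU is the first of its exchange (the real ASIC derives this from frame header fields).

```python
# Frame types the Tap ACLs copy to the NPU (DATA_IU is handled separately).
TAPPED_IUS = {"CMD_IU", "XRDY_IU", "RSP_IU", "ABTS"}


def tap_matches(iu: str, first_data: bool = False) -> bool:
    """Return True if a copy of this frame should be forwarded to the NPU."""
    if iu == "DATA_IU":
        return first_data  # only the first DATA_IU of an exchange is of interest
    return iu in TAPPED_IUS


def acls_for(port_type: str, direction: str, analytics_enabled: bool = True) -> list[str]:
    """ACL groups programmed on a port of type "F" or "E" in a given direction."""
    acls = []
    if analytics_enabled:
        acls.append("analytics_tap")  # Taps on every analytics-enabled port/direction
    if port_type == "F" and direction == "ingress":
        acls.append("zoning")         # hard zoning entries sit on F_Port ingress
    return acls


for port_type in ("F", "E"):
    for direction in ("ingress", "egress"):
        print(f"{port_type}_Port {direction}: {acls_for(port_type, direction)}")
```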
The Cisco MDS 32G NPU runs a software Analytics Engine that examines the frame headers it receives, maintains state for each I/O, and matches I/O responses to their requests. The Engine computes several metrics for completed I/Os and maintains them in a per-flow database. This NPU database is persistent and accumulates all metrics over time. The NPU periodically ships the metrics database to the Switch Supervisor, where flow metrics from all Modules are aggregated and streamed out.
Because the Cisco MDS 32G Analytics Engine runs in software on the NPU, it can be modified to accommodate custom metrics (e.g., NVMe Flush command metrics) or future storage command sets (e.g., NVMe-KV), provided the required ACL Taps are in place.
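A toy version of the request/response matching done by the Analytics Engine might look like the following. It assumes frames arrive as `(flow, exchange_id, iu, timestamp)` values and computes only two example metrics: exchange completion time (CMD_IU to RSP_IU) and data access latency (CMD_IU to the first DATA_IU or XRDY_IU). Discarding state on ABTS is also an assumption. `AnalyticsEngine` and `FlowStats` are hypothetical names, and the real engine computes many more metrics.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Per-flow aggregates; persistent, accumulating over time."""
    ios: int = 0
    total_ect: float = 0.0          # sum of exchange completion times
    min_ect: float = float("inf")
    max_ect: float = 0.0
    total_dal: float = 0.0          # sum of data access latencies

    def add(self, ect: float, dal: float) -> None:
        self.ios += 1
        self.total_ect += ect
        self.min_ect = min(self.min_ect, ect)
        self.max_ect = max(self.max_ect, ect)
        self.total_dal += dal


class AnalyticsEngine:
    def __init__(self) -> None:
        self.pending: dict = {}  # (flow, exchange_id) -> {"cmd": t, "first": t or None}
        self.db: dict = {}       # flow -> FlowStats

    def on_frame(self, flow, xid, iu: str, t: float) -> None:
        key = (flow, xid)
        if iu == "CMD_IU":
            # New I/O: remember when the request was seen
            self.pending[key] = {"cmd": t, "first": None}
        elif iu in ("DATA_IU", "XRDY_IU"):
            # First DATA (read) or XRDY (write) marks the data access latency
            state = self.pending.get(key)
            if state is not None and state["first"] is None:
                state["first"] = t
        elif iu == "RSP_IU":
            # Match the response to its request and record a completed I/O
            state = self.pending.pop(key, None)
            if state is None:
                return  # response for an I/O whose command was never seen
            first = state["first"] if state["first"] is not None else t
            self.db.setdefault(flow, FlowStats()).add(t - state["cmd"], first - state["cmd"])
        elif iu == "ABTS":
            self.pending.pop(key, None)  # aborted I/O: discard its state


flow = ("0x010100", "0x020200", 0)  # hypothetical (initiator, target, LUN)
engine = AnalyticsEngine()
for xid, iu, t in [(1, "CMD_IU", 0.0), (1, "DATA_IU", 100.0), (1, "RSP_IU", 300.0)]:
    engine.on_frame(flow, xid, iu, t)
print(engine.db[flow])
```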
Cisco MDS 64G SAN Analytics
On Cisco MDS 64G switches, the Analytics Engine moves into the ASIC, gaining hardware acceleration. The Cisco MDS 64G Module has two 64G ASICs, and each ASIC has six hardware Analytics Engines (one for every four ports). These Analytics Engines can compute I/O metrics at line rate on all ports simultaneously, with the capacity to analyze upwards of 1 billion IOPS per Module. The hardware Analytics Engines have built-in Taps and do not need ACL-based Taps to be programmed.
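The hardware figures above imply a few derived numbers. The arithmetic below is only a back-of-the-envelope sketch: the 48-port count is inferred from two ASICs × six engines × four ports, and the per-engine and per-port IOPS are averages obtained by splitting the stated 1 billion IOPS per Module evenly, not published per-port limits.

```python
ASICS_PER_MODULE = 2
ENGINES_PER_ASIC = 6
PORTS_PER_ENGINE = 4
IOPS_PER_MODULE = 1_000_000_000  # "upwards of 1 billion", used here as a floor

engines_per_module = ASICS_PER_MODULE * ENGINES_PER_ASIC  # 12 engines
ports_per_module = engines_per_module * PORTS_PER_ENGINE  # 48 ports (derived)
iops_per_engine = IOPS_PER_MODULE / engines_per_module    # even split per engine
iops_per_port = IOPS_PER_MODULE / ports_per_module        # even split per port

print(f"{engines_per_module} engines, {ports_per_module} ports")
print(f"~{iops_per_engine / 1e6:.1f}M IOPS per engine, ~{iops_per_port / 1e6:.1f}M IOPS per port")
```

That works out to roughly 83.3 million IOPS per engine and 20.8 million IOPS per port.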
The metrics computed by the hardware Analytics Engines are stored in a database inside the ASIC and periodically flushed to the NPU. The NPU runs a lightweight software process on top of DPDK (Data Plane Development Kit, an open-source, high-performance packet processing framework) that collects and accumulates the metrics pushed periodically from the hardware Analytics Engines. Even though the NPU no longer runs an Analytics Engine, it maintains the persistent per-flow metrics database and remains a critical element of the solution. Shipping metrics from the NPU database to the Supervisor works exactly as in the Cisco MDS 32G architecture. The Cisco MDS 64G hardware Analytics Engine does not preclude enabling an NPU software Analytics Engine in a future software release for flexibility and programmability.
A comparison of the Cisco MDS 32G and MDS 64G architectures is shown in Figure 2 below:
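The NPU's accumulate-then-ship role can be sketched as a merge of per-flow records. The sketch assumes that each ASIC flush carries only the metrics gathered since the previous flush (so they can simply be added) and that flow keys include the port, so records from different Modules never collide. `Metrics`, `NpuCollector` and `supervisor_aggregate()` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    ios: int = 0
    total_ect: float = 0.0
    min_ect: float = float("inf")
    max_ect: float = 0.0

    def merge(self, other: "Metrics") -> None:
        """Fold another record in: sums add, minima and maxima combine."""
        self.ios += other.ios
        self.total_ect += other.total_ect
        self.min_ect = min(self.min_ect, other.min_ect)
        self.max_ect = max(self.max_ect, other.max_ect)


class NpuCollector:
    """Persistent per-flow database on one Module's NPU."""

    def __init__(self) -> None:
        self.db: dict[tuple, Metrics] = {}

    def on_asic_flush(self, snapshot: dict[tuple, Metrics]) -> None:
        # Assumes the snapshot holds only metrics since the previous flush.
        for key, m in snapshot.items():
            self.db.setdefault(key, Metrics()).merge(m)


def supervisor_aggregate(npus: list[NpuCollector]) -> dict[tuple, Metrics]:
    """Combine the databases shipped by every Module's NPU before streaming."""
    combined: dict[tuple, Metrics] = {}
    for npu in npus:
        for key, m in npu.db.items():
            combined.setdefault(key, Metrics()).merge(m)
    return combined


npu = NpuCollector()
key = ("fc1/1", ("0x010100", "0x020200", 0))  # hypothetical (port, ITL) key
npu.on_asic_flush({key: Metrics(2, 300.0, 100.0, 200.0)})
npu.on_asic_flush({key: Metrics(1, 50.0, 50.0, 50.0)})
print(npu.db[key])  # Metrics(ios=3, total_ect=350.0, min_ect=50.0, max_ect=200.0)
```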
The Cisco MDS 64G hardware Analytics Engine computes some additional metrics for deeper I/O visibility:
- Multi-sequence write I/Os are large writes involving multiple XRDY sequences. The write exchange completion time for these writes includes delays introduced by the Host (Rx XRDYn to Tx first DATAn+1) and by the Storage (Rx last DATAn-1 to Tx XRDYn). Tracking these delays enables better analysis and more accurate pinpointing of large-write performance issues. The Analytics Engine separately tracks:
- Avg/Min/Max host write delay
- Avg/Min/Max storage write delay
- The total busy time metric tracks, per flow, the total time during which at least one I/O was outstanding. This metric helps characterize the ‘busyness’ of a flow relative to other flows.
By default, the hardware Analytics Engine tracks SCSI and NVMe I/O metrics at ITL/ITN granularity. However, it can also be programmed to track metrics at other flow granularities: IT, ITL-VMID, ITN-NVMeConnectionID or ITN-NVMeConnectionID-VMID. This provides flexibility in choosing the granularity of metrics and I/O visibility.
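The write-delay and busy-time metrics above can be illustrated with a short Python sketch. It works on time-ordered `(timestamp, iu)` events for one write exchange as seen at the switch, which only approximates the Rx/Tx times at the host and storage. Two interpretations are assumptions: the first XRDY_IU (which follows the command rather than a data sequence) contributes no storage write delay, and busy time is computed as the length of the union of per-I/O (command, response) intervals. `write_delays()`, `summarize()` and `busy_time()` are hypothetical helpers.

```python
def write_delays(events):
    """events: time-ordered (timestamp, iu) pairs for ONE write exchange, where
    iu is "CMD_IU", "XRDY_IU", "DATA_IU" (one per frame) or "RSP_IU".
    Returns (host_delays, storage_delays)."""
    host, storage = [], []
    open_xrdy = None   # time of an XRDY still waiting for its first DATA frame
    last_data = None   # time of the most recent DATA frame
    for t, iu in events:
        if iu == "XRDY_IU":
            if last_data is not None:
                storage.append(t - last_data)  # last DATA of prior sequence -> XRDY
            open_xrdy = t
        elif iu == "DATA_IU":
            if open_xrdy is not None:
                host.append(t - open_xrdy)     # XRDY -> first DATA it solicited
                open_xrdy = None
            last_data = t
    return host, storage


def summarize(delays):
    """(avg, min, max) as tracked per flow; None when there are no samples."""
    if not delays:
        return None
    return (sum(delays) / len(delays), min(delays), max(delays))


def busy_time(intervals):
    """Total time at least one I/O was outstanding: the length of the union
    of (start, end) intervals, e.g. (CMD time, RSP time) for each I/O."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous busy period
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)       # overlapping I/O extends the period
    if cur_end is not None:
        total += cur_end - cur_start
    return total


events = [(0, "CMD_IU"), (10, "XRDY_IU"), (15, "DATA_IU"), (16, "DATA_IU"),
          (30, "XRDY_IU"), (38, "DATA_IU"), (39, "DATA_IU"), (50, "RSP_IU")]
host, storage = write_delays(events)
print("host:", summarize(host), "storage:", summarize(storage))
print("busy:", busy_time([(0, 10), (5, 20), (30, 40)]))
```

In this trace the host takes 5 and 8 time units to start each data sequence, while the storage takes 14 units to issue the second XRDY.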
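Choosing a flow granularity amounts to choosing which header fields form the flow key. The sketch below shows that idea in Python; `IoHeader`, `GRANULARITY_FIELDS`, `flow_key()` and `count_ios()` are hypothetical, and field names such as `nsid` (NVMe namespace ID) and `conn_id` are illustrative rather than the ASIC's actual field layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IoHeader:
    initiator: str                 # initiator FCID
    target: str                    # target FCID
    lun: Optional[int] = None      # SCSI LUN (ITL flows)
    nsid: Optional[int] = None     # NVMe namespace ID (ITN flows)
    conn_id: Optional[int] = None  # NVMe connection ID
    vmid: Optional[int] = None     # virtual machine identifier


# Fields that make up the flow key at each granularity.
GRANULARITY_FIELDS = {
    "IT": ("initiator", "target"),
    "ITL": ("initiator", "target", "lun"),
    "ITN": ("initiator", "target", "nsid"),
    "ITL-VMID": ("initiator", "target", "lun", "vmid"),
    "ITN-NVMeConnectionID": ("initiator", "target", "nsid", "conn_id"),
    "ITN-NVMeConnectionID-VMID": ("initiator", "target", "nsid", "conn_id", "vmid"),
}


def flow_key(hdr: IoHeader, granularity: str) -> tuple:
    """Build the flow key for one I/O at the chosen granularity."""
    return tuple(getattr(hdr, name) for name in GRANULARITY_FIELDS[granularity])


def count_ios(headers, granularity: str) -> dict:
    """Count I/Os per flow; coarser keys fold more I/Os into each flow."""
    counts: dict = {}
    for hdr in headers:
        key = flow_key(hdr, granularity)
        counts[key] = counts.get(key, 0) + 1
    return counts


ios = [IoHeader("0x010100", "0x020200", lun=1, vmid=7),
       IoHeader("0x010100", "0x020200", lun=1, vmid=8),
       IoHeader("0x010100", "0x020200", lun=2, vmid=7)]
for g in ("IT", "ITL", "ITL-VMID"):
    print(g, len(count_ios(ios, g)), "flow(s)")
```

With three I/Os from one initiator to one target, using (LUN 1, VM 7), (LUN 1, VM 8) and (LUN 2, VM 7), IT granularity yields one flow, ITL yields two and ITL-VMID yields three.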
The 1GbE analytics port on the Cisco MDS 64G Module can stream per-flow metrics directly (without involving the Supervisor) in an ASIC-native or standard GPB/gRPC format. This can serve future use cases that need visibility into micro telemetry events, which would require high-frequency telemetry streaming.
Cisco MDS SAN Analytics and SAN Insights are a key solution for monitoring and troubleshooting performance problems in the MDS FC SAN using ‘LUSE’ or any equivalent methodology. The Cisco MDS 64G platforms (operating at any speed) now come with a hardware Analytics Engine that can compute I/O metrics at line rate on all ports. The Cisco MDS architecture is the industry’s most flexible, programmable, scalable, and future-proof SAN solution, requiring no forklift upgrade of the chassis and no rip-and-replace to adopt the latest SAN innovations.
To learn more, visit the Cisco SAN Analytics and SAN Telemetry Streaming Solution Overview.
Unlock the full value of your Storage Area Networking solution
Cisco MDS 9000 is built to meet today’s demands while accommodating future innovation. The Cisco MDS architecture is the industry’s most flexible, programmable, scalable, and future-proof SAN solution, supporting Multi-Generation and Multi-Speed interoperability of existing 16G and 32G and new 64G line cards in existing chassis for graceful migration and adoption of the latest SAN innovations.