Ethernet Transport for NVMe over Fabric: Future of Storage Networking
The last time I wrote about NVMe™ was to introduce it as one of the new capabilities of our MDS 32G portfolio, enabling a seamless interface between flash storage arrays and Fibre Channel fabrics. Today I’m happy to be writing about the next wave of storage networking: Ethernet transport for NVMe over Fabrics (NVMe-oF™), enabled through the Cisco Nexus 9000 family of switches, which are powered by Cisco Cloud Scale ASIC technology.
While Cisco takes a technology-agnostic stance toward supporting NVMe, there is no question that the flexibility in Ethernet switch intelligence can make a huge difference when designing for low-latency, high-performance storage traffic over Ethernet. Most merchant-silicon switches on the market may support the minimum requirements, but Cisco Nexus 9000 with the Cloud Scale ASIC takes this a few steps further.
In addition to the DCB Ethernet enhancements (including PFC/lossless traffic), Cisco Nexus 9000 with the Cloud Scale ASIC brings features for congestion avoidance, effective burst handling, and queue management for NVMe over Fabrics traffic. For example, Approximate Fair Drop (AFD) adds flow-size awareness and fairness to the early-drop congestion avoidance mechanism. Dynamic Packet Prioritization (DPP) prioritizes small flows over large flows, so that small flows are transmitted promptly without suffering packet loss. This algorithm-based approach to buffering and scheduling addresses the real-world congestion problems caused by traffic bursts more efficiently.
Using Ethernet to Run NVMe
Extending NVMe over Ethernet and Fibre Channel is of huge interest to enterprises with a large installed base of centralized storage accessible over a network. NVMe has also been standardized to extend its advantages over various fabrics to external and centralized storage, also known as NVMe over Fabrics. The NVMe-oF specification defines a common framework for NVMe to work over a variety of transports, including Ethernet-, InfiniBand-, and Fibre Channel-based networks.
While admins can, in theory, enable Ethernet-based NVMe-oF on just about any Ethernet switch, the practical limitations of doing so become readily apparent on a switch that is not designed to handle this type of traffic efficiently.
The fact is, enabling low-latency, high-performance storage traffic over Ethernet is not a new concept. For example, RDMA over Converged Ethernet (RoCE) is a network protocol developed by the InfiniBand Trade Association that enables remote direct memory access (RDMA) over an Ethernet network (the standard is already published). Internet Wide Area RDMA Protocol (iWARP) is a TCP-based standard developed by the Internet Engineering Task Force. While neither RoCE nor iWARP has seen wide adoption yet for storage traffic, NVMe has already spurred intense interest.
Cisco N9K Advantages
Depending upon the transport, flexibility in Ethernet switch intelligence can make a huge difference. RoCE (both versions 1 and 2) works with Ethernet switches that support Data Center Bridging (DCB), provide lossless characteristics, and support priority flow control (PFC). iWARP and NVMe/TCP (being finalized in the NVM Express, Inc. technical working group as this is being written) work best with end-to-end networking intelligence above and beyond the vanilla TCP connection.
Storage traffic depends upon application requirements, and its exact nature is hard to predict. It contains a mix of small (mice) and very large (elephant) flows and may be bursty. Burst handling and queue management need adequate buffers along with intelligent buffer management algorithms to reduce congestion buildup.
Cisco Nexus 9000 with the Cloud Scale ASIC is designed to handle this type of traffic efficiently. DCB Ethernet enhancements (including PFC/lossless traffic) provide zero packet loss, enabling high-performance storage for flash environments.
The intelligent buffer management capabilities are built into Cisco Cloud Scale ASICs for hardware-accelerated performance. The main functions include approximate fair dropping (AFD) and dynamic packet prioritization (DPP).
- AFD focuses on preserving buffer space to absorb mice flows, particularly microbursts, which are aggregated mice flows, by limiting the buffer use of aggressive elephant flows. It also aims to enforce bandwidth allocation fairness among elephant flows.
- DPP provides the capability of separating mice flows and elephant flows into two different queues so that buffer space can be allocated to them independently, and different queue scheduling can be applied to them. For example, mice flows can be mapped to a low-latency queue (LLQ), and elephant flows can be sent to a weighted fair queue.
AFD and DPP can be deployed separately or jointly. They complement each other to deliver the best application performance.
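As a rough illustration of the two mechanisms above, the following Python sketch models DPP as a per-flow packet counter and AFD as a drop probability that grows with how far a flow exceeds its fair rate. This is a toy model, not the Cloud Scale ASIC implementation: the class and function names, the `mice_packets` threshold, the queue labels, and the `1 - fair_rate / flow_rate` formula (borrowed from the general approximate-fair-dropping idea) are all assumptions made for illustration.

```python
from collections import defaultdict


class DppClassifier:
    """Toy DPP model (hypothetical): the first `mice_packets` packets of
    every flow go to a priority queue, later packets to the default queue."""

    def __init__(self, mice_packets=3):
        self.mice_packets = mice_packets      # assumed threshold, not a Cisco default
        self.packets_seen = defaultdict(int)  # flow id -> packets counted so far

    def classify(self, flow_id):
        self.packets_seen[flow_id] += 1
        if self.packets_seen[flow_id] <= self.mice_packets:
            return "priority"   # short flows finish entirely in this queue
        return "default"        # the tail of a long (elephant) flow lands here


def afd_drop_probability(flow_rate, fair_rate):
    """Toy AFD model (assumed formula): flows at or below the fair rate are
    never dropped; faster flows are dropped in proportion to their excess."""
    if flow_rate <= fair_rate:
        return 0.0
    return 1.0 - fair_rate / flow_rate


if __name__ == "__main__":
    dpp = DppClassifier(mice_packets=2)
    print([dpp.classify("elephant") for _ in range(4)])
    print(dpp.classify("mouse"))
    print(afd_drop_probability(flow_rate=40.0, fair_rate=10.0))
```

With this formula, a flow sending at four times the fair rate sees a drop probability of 0.75, so its accepted rate falls back to the fair rate, while a flow below the fair rate is left untouched.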
Switches in a data center are usually distributed, so traffic from the transmitter has to pass through multiple switches (multi-hop) before reaching the receiver. This means that if any intermediate switch experiences congestion, performance can be degraded. The Explicit Congestion Notification (ECN) feature provides a method for an intermediate switch to notify the end hosts of impending network congestion. The benefit of this feature is reduced delay and packet loss in data transmissions.
Explicit Congestion Notification: ECN is an extension to weighted random early detection (WRED) that marks packets instead of dropping them when the average queue length exceeds a specific threshold value. When configured with the WRED ECN feature, routers and end hosts use this marking as a signal that the network is congested and that senders should slow down. With Fast ECN (available in new Nexus 9000 Series switches), ECN marking is improved: all packets that contribute to congestion and queue buildup are marked by WRED ECN, including those at the beginning of the queue.
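The marking decision described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Nexus implementation: the function name, the threshold parameters, the linear probability ramp, and the choice to signal every packet once the maximum threshold is reached are illustrative, and real platforms differ in how they treat that last case.

```python
import random


def wred_ecn_action(avg_queue, min_th, max_th, max_prob, ecn_capable, rand=None):
    """Return "forward", "mark", or "drop" for one arriving packet.

    avg_queue    average queue length (same units as the thresholds)
    min_th       below this, packets are always forwarded untouched
    max_th       at or above this, every packet is marked or dropped
    max_prob     marking/dropping probability reached just below max_th
    ecn_capable  True if the packet carries an ECN-capable (ECT) codepoint
    rand         number in [0, 1); injectable so the sketch is testable
    """
    if rand is None:
        rand = random.random()
    if avg_queue < min_th:
        return "forward"
    if avg_queue >= max_th:
        probability = 1.0   # assumption: always signal congestion here
    else:
        # Linear ramp between the two thresholds, as in classic WRED.
        probability = max_prob * (avg_queue - min_th) / (max_th - min_th)
    if rand >= probability:
        return "forward"
    # Congestion signal: mark instead of drop when the packet supports ECN.
    return "mark" if ecn_capable else "drop"
```

For example, with thresholds of 20 and 80 and `max_prob=0.1`, an average queue of 50 gives a signalling probability of 0.05; an ECN-capable packet selected at that point is marked and still delivered, whereas a packet without ECN support would be dropped.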
The breadth of our NVMe portfolio makes Cisco the right choice for most storage networking customers looking to modernize their infrastructure. Cisco offers a wide range of NVMe-oF solutions, including NVMe over Fibre Channel using MDS 9000 Series switches as well as a full complement of Ethernet-based NVMe over Fabrics using Nexus 9000 Series switches and UCS servers (future).
Cisco is also a board member of NVM Express, Inc. and a member of the FCIA board, which means that we have been involved with NVMe and FC-NVMe (the official Fibre Channel standard for NVMe-oF implementations) since their inception. Further, Cisco has been steadily introducing new solutions to this technology portfolio over the last several years.
For example, we announced support for NVMe storage on the Cisco UCS C-Series Rack Servers and B-Series Blade Servers in October 2016. Multicore CPUs on the Cisco UCS servers and HyperFlex (the industry’s first all-NVMe hyperconverged appliances) can utilize the full potential of NVMe-capable solid-state drives connected via PCIe.
In April 2017, support for NVMe over Fibre Channel was extended to Cisco UCS C-Series Rack Servers, MDS 9000 Series switches, and Cisco Nexus® 5000 Series Switches. In November 2017, support was extended to the Nexus 93180YC-FX on Fibre Channel ports. Cisco Nexus 9000 switches support Ethernet-based NVMe over Fabrics and are the best switches in the market to scale any IP/Ethernet-based storage environment. Cisco has participated in UNH-IOL interoperability testing, passed all the tests, and is on the integrators list for NVMe-oF: https://www.iol.unh.edu/registry/nvmeof
For more info