One interesting feature of usNIC, the ultra-low-latency Ethernet solution for the Cisco UCS VIC, is that it is based on SR-IOV.
What is SR-IOV, and why is it relevant in the HPC world?
SR-IOV (Single Root I/O Virtualization) is commonly used in the server virtualization world. Its most commonly described purpose there is to allow a device partition, called a VF (Virtual Function), to be mapped into a guest operating system’s address space. This gives the guest higher I/O performance and lower CPU utilization than the traditional alternative: software-emulated devices implemented in the hypervisor.
Compared to the days before hypervisors came along, that use of SR-IOV lets virtualized systems regain some of the performance lost to hypervisor software intervention in the I/O data path. But why should I care about SR-IOV when my network-latency-bound HPC applications run on common operating systems on bare metal servers?
SR-IOV is a PCI-SIG extension to the PCIe specification, and it is much more generic than its commonly described hypervisor use suggests:
- An SR-IOV compliant device doesn’t require a hypervisor in order to be used. For instance, support for SR-IOV devices and drivers has long been available in the stock Linux kernel.
- A VF device doesn’t require virtual machine monitor software to interpose in order to participate in I/O. Instead, the VF is a first-class agent on the PCIe bus and, as such, it can originate or claim PCIe transactions autonomously.
- A VF device doesn’t require a VM in order to be used. Instead, a VF device can be presented to any address space in the system; for example, to an ordinary Unix-style process running in user space.
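As an illustration of the first point, the stock Linux PCI core exposes SR-IOV controls through two sysfs attributes of an SR-IOV capable PF, `sriov_totalvfs` and `sriov_numvfs`, so VFs can be created on a bare metal host with no hypervisor involved. The following is a minimal sketch assuming a PF driver that supports this generic interface (not every driver does, and usNIC VFs may well be provisioned differently); `set_num_vfs` is a hypothetical helper and `eth1` is only an example name. It needs root on a real system; the `sysfs_root` parameter exists only so the logic can be tried against a fake directory tree.

```python
from pathlib import Path


def set_num_vfs(iface: str, num_vfs: int, sysfs_root: str = "/sys") -> int:
    """Ask the PF behind network interface `iface` to expose `num_vfs` VFs.

    Hypothetical helper built on the generic sysfs attributes that the Linux
    PCI core provides for SR-IOV capable devices. Returns the VF count that
    sysfs reports after the change.
    """
    # /sys/class/net/<iface>/device is a symlink to the PF's PCI device directory.
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(
            f"{iface}: requested {num_vfs} VFs, device supports at most {total}"
        )

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current == num_vfs:
        return current  # nothing to do
    # The kernel refuses to go directly from one non-zero VF count to another
    # (it returns EBUSY), so disable the existing VFs first.
    if current != 0 and num_vfs != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))
    return int(numvfs.read_text().strip())
```

On a real host this would be called as, e.g., `set_num_vfs("eth1", 4)`. Once created, each VF shows up as its own PCI function (the PF's sysfs directory links to them as `virtfn0`, `virtfn1`, ...), ready to be bound to whichever driver, in the kernel or in user space, is appropriate.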
OK, but what do these generic provisions of SR-IOV mean for my HPC applications?
They mean that (in reverse order):
- An application program can use a user space VF device driver instance to manipulate I/O hardware resources, such as transmit and receive packet queues. Multiple application processes can each use a different VF, each through its own device driver instance and its own I/O hardware resources. The potential benefits, in terms of reduced IPC latency over the network and increased I/O parallelism among processes, are significant.
- An IOMMU (I/O Memory Management Unit) context can be associated with both the VF and the address space of the user space process using it. This allows the user space device driver instance and the VF I/O hardware, such as its DMA engines, to interact directly, and in a secure (contained) way, within the address space of that process.
- An SR-IOV device driver can be thought of as a pair of drivers: one for the PF (Physical Function, a.k.a. the “root device”) and one for its child VFs. The advantage of this split has to do with software sustainability. One of the goals of the SR-IOV one-level-of-hierarchy device model was to allow incremental migration toward SR-IOV. For example, generic SR-IOV support (from a PCIe perspective) was added to the Linux kernel incrementally over a long period of time. Today, it is perfectly in line with the SR-IOV model to reuse that generic kernel support for a PF driver instance running in the kernel, while the VF driver instances, which perform the actual I/O, run in user space. In other words, network-latency-bound applications can simply link against a user space library that provides the optimized VF I/O driver functionality.
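The containment argument in the IOMMU point can be sketched with a deliberately simplified model. This is a toy Python simulation, not driver code: `IommuDomain`, `VirtualFunction`, `map`, `receive` and `dma_write` are hypothetical names, and a real IOMMU translates page-granular I/O virtual addresses (IOVAs) in hardware, programmed by the kernel (on Linux, for instance, through VFIO or a vendor-specific kernel module). What it shows is that a VF's DMA can only reach memory explicitly mapped into its own domain, so the same IOVA can safely mean different buffers for different processes, and a stray DMA faults instead of corrupting another process's memory.

```python
from dataclasses import dataclass, field


class IommuFault(Exception):
    """Raised when a device DMA targets an IOVA range its domain does not map."""


@dataclass
class IommuDomain:
    """Toy per-VF IOMMU context: maps IOVA ranges onto slices of process buffers."""
    mappings: list = field(default_factory=list)  # (iova, length, buffer, offset)

    def map(self, iova: int, buffer: bytearray, offset: int, length: int) -> None:
        # Only memory the process explicitly hands over becomes reachable.
        if offset < 0 or offset + length > len(buffer):
            raise ValueError("mapping exceeds the process buffer")
        self.mappings.append((iova, length, buffer, offset))

    def _translate(self, iova: int, length: int):
        # The whole access must fall inside a single mapped range.
        for base, size, buf, off in self.mappings:
            if base <= iova and iova + length <= base + size:
                return buf, off + (iova - base)
        raise IommuFault(f"DMA to unmapped IOVA range [{iova:#x}, {iova + length:#x})")

    def dma_write(self, iova: int, data: bytes) -> None:
        buf, start = self._translate(iova, len(data))
        buf[start:start + len(data)] = data


@dataclass
class VirtualFunction:
    """Toy VF whose DMA engine can only see memory through its own domain."""
    name: str
    domain: IommuDomain = field(default_factory=IommuDomain)

    def receive(self, iova: int, payload: bytes) -> None:
        # Models the VF writing a received packet into the owning
        # process's receive buffer.
        self.domain.dma_write(iova, payload)
```

Keeping one domain per VF, and hence per process, is what lets the user space driver and the hardware exchange data directly without the kernel mediating each transfer.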
OK, but what about the presence of the VFs on the network?
Well, although in the common hypervisor case each VF device has its own presence on the network (e.g., its own MAC address), the SR-IOV device model doesn’t require a VF device to have any presence or identity on the network. A VF device can be anonymous, or invisible, on the network.
One possible arrangement, which we use in the Cisco usNIC solution, is for the PF to have its own presence on the network (e.g., “eth1 with MAC address 11:22:33:44:55:66 and IP address 10.0.0.1”) while the child VFs inherit (or share) the network interface associated with the PF. Note that this scheme requires neither bonding/teaming nor any particular intervention from the kernel networking stack.
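The arrangement above, where VFs carry no network identity of their own, can be modeled as an identity lookup that falls back to the parent PF. The sketch below is again a hypothetical toy model (`PciFunction`, `NetIdentity` and `count_endpoints` are illustrative names, not part of any usNIC API) and reuses the example MAC and IP addresses from the text. It deliberately leaves out how received frames are steered to the right VF, which is device-specific and not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NetIdentity:
    """What the network sees: one MAC address and one IP address."""
    mac: str
    ip: str


@dataclass
class PciFunction:
    """Toy PCI function: a PF carries a network identity; a VF need not."""
    name: str
    identity: Optional[NetIdentity] = None
    parent: Optional["PciFunction"] = None

    def network_identity(self) -> NetIdentity:
        # An anonymous VF resolves its identity through its parent PF.
        if self.identity is not None:
            return self.identity
        if self.parent is None:
            raise LookupError(f"{self.name} has no network identity")
        return self.parent.network_identity()


def count_endpoints(functions: Iterable[PciFunction]) -> int:
    """Number of distinct Ethernet/IP endpoints visible on the wire."""
    return len({f.network_identity() for f in functions})
```

However many VFs are handed out to processes, the network still sees a single Ethernet/IP endpoint, which is the non-proliferation of endpoints that the summary refers to.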
In summary, SR-IOV contributes to at least three key elements of sustainability in a modern approach to user space networking for HPC applications deployed on common servers, operating systems and networks:
- Seamless integration with the IOMMU (present in modern chipsets, x86 and otherwise)
- Enabling user space I/O drivers
- The non-proliferation of Ethernet or IP network endpoints