SC’11 Cisco booth demo: Open MPI over Linux VFIO
Linux VFIO (Virtual Function IO) is an emerging technology that allows direct access to PCI devices from userspace. Although primarily designed as a hypervisor-bypass technology for virtualization uses, it can also be used in an HPC context.
Think of it this way: hypervisor bypass is somewhat similar to operating system (OS) bypass. And OS bypass is a characteristic sought in many HPC low-latency networks these days.
Drop by the Cisco SC’11 booth (#1317) where we’ll be showing a technology preview demo of Open MPI utilizing Linux VFIO over the Cisco “Palo” family of first-generation hardware virtualized NICs (specifically, the P81E PCI form factor). VFIO + hardware virtualized NICs allow benefits such as:
- Low half-round-trip (HRT) ping-pong latencies over Ethernet via direct access to L2 from userspace (4.88us)
- Hardware steerage of inbound and outbound traffic to individual MPI processes
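For readers unfamiliar with the metric: a ping-pong test bounces a small message between two endpoints many times, and the half-round-trip latency is the average round-trip time divided by two. Here is a minimal sketch of that measurement, assuming a local socket pair and a thread as stand-ins for two MPI processes. It is not the demo code, the function names are hypothetical, and it goes through the kernel, so it will not produce numbers comparable to the figure above.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo(sock, iterations, size):
    """'Pong' side: send every received message straight back."""
    for _ in range(iterations):
        sock.sendall(recv_exact(sock, size))


def half_round_trip_us(iterations=1000, size=1):
    """Return the average half-round-trip latency in microseconds."""
    ping, pong = socket.socketpair()
    worker = threading.Thread(target=echo, args=(pong, iterations, size))
    worker.start()

    message = b"x" * size
    start = time.perf_counter()
    for _ in range(iterations):
        ping.sendall(message)        # ping
        recv_exact(ping, size)       # wait for the pong
    elapsed = time.perf_counter() - start

    worker.join()
    ping.close()
    pong.close()

    # One iteration is a full round trip; halve it to get HRT.
    return elapsed / iterations / 2 * 1e6


if __name__ == "__main__":
    print(f"HRT latency: {half_round_trip_us():.2f} us")
```

A real MPI benchmark would do the same thing with a blocking send and receive on each rank, usually after a few warm-up iterations that are excluded from the timing.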
Let’s dive into these technologies a bit and explain how they benefit MPI.
The Cisco Palo NICs are incredibly cool for multiple reasons; the HPC-relevant reasons include:
- Palo can present itself as up to 128 “virtual” PCI devices to the server
- The switching to these 128 devices is done in hardware (not software!)
If you pair the concept of hardware-virtualized NICs with VFIO, not only can you access each of these virtual NICs from Linux userspace (e.g., MPI processes), you can give each MPI process a unique L2 address and have hardware control inbound and outbound steering, flow control, buffering, etc.
In the Cisco SC’11 booth, we’re showing a development demo of Open MPI utilizing technology built upon Linux VFIO to do exactly that.
Specifically: each Open MPI process has direct access to read and write L2 Ethernet frames from Linux userspace, offloading all the checksums, routing, etc. to the hardware.
This is essentially OS bypass.
Not only does this tremendously cut down on latency (by avoiding the entire TCP and/or UDP stacks), it also offloads to hardware the steering of traffic to individual MPI processes.
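To make "reading and writing L2 Ethernet frames" a bit more concrete, here is a rough sketch of what such a frame looks like in memory: a 14-byte header (destination address, source address, EtherType) followed by the payload. Everything in it is an assumption for illustration: the function names, the MAC addresses (locally administered values), and the EtherType (0x88B5, one of the IEEE local experimental values). It says nothing about the wire format the demo actually uses, and it does not touch VFIO or any hardware.

```python
import struct

# Ethernet II header: 6-byte destination, 6-byte source, 2-byte EtherType.
ETH_HEADER = struct.Struct("!6s6sH")
MIN_PAYLOAD = 46  # minimum payload of an untagged frame (the FCS is left to hardware)

# Hypothetical choice for this sketch: an IEEE local experimental EtherType.
EXPERIMENTAL_ETHERTYPE = 0x88B5


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to its 6-byte form."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst_mac, src_mac, payload, ethertype=EXPERIMENTAL_ETHERTYPE):
    """Build a frame without the trailing FCS, padded to the minimum size."""
    header = ETH_HEADER.pack(mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)
    return header + payload.ljust(MIN_PAYLOAD, b"\x00")


def parse_frame(frame):
    """Split a frame into (dst, src, ethertype, payload-with-padding)."""
    dst, src, ethertype = ETH_HEADER.unpack_from(frame)
    return dst, src, ethertype, frame[ETH_HEADER.size:]


if __name__ == "__main__":
    # Pretend each MPI process owns one locally administered L2 address.
    rank0 = "02:00:00:00:00:00"
    rank1 = "02:00:00:00:00:01"
    frame = build_frame(rank1, rank0, b"hello from rank 0")
    dst, src, ethertype, payload = parse_frame(frame)
    print(len(frame), dst.hex(":"), src.hex(":"), hex(ethertype))
```

Because every process has its own destination address, the NIC can deliver an inbound frame to the right process by looking at the first six bytes alone, which is the kind of steering the hardware takes over.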
Stop by the Cisco booth to see an early development version of this Open MPI port in action.
Finally, note that Palo is only Cisco’s first-generation hardware virtualized NIC. Stay tuned for even better performance with our second-generation NIC…