HPC over UDP
A few months ago, I posted an entry entitled “HPC in L3”. My only point for that entry was to remove the “HPC in L3? That’s a terrible idea!” knee-jerk reaction that we old-timer HPC types have.
I mention this because we released a free software update a few days ago for the Cisco usNIC product that enables usNIC traffic to flow across UDP (vs. raw L2 frames). Woo hoo!
That’s right, sports fans: another free software update to make usNIC even better than ever. Especially across 40Gb interfaces!
Here are some common questions we get about the UDP usNIC transport:
- Why did you do this? We updated usNIC to use UDP for all of the reasons that I described in my previous blog entry. For example, one of our most common customer issues is that they have a fragmented IP address space in their data center. Two racks in an HPC cluster may be on entirely different IP subnets (e.g., 10.10.10.x/24 and 10.10.20.x/24). Raw L2 frames cannot cross the router between those subnets, but UDP packets can.
- Do usNIC and the Linux IP stack share a common physical interface? Yes, both usNIC traffic and Linux IP stack traffic flow across the same physical interfaces. Put differently: usNIC is simply an ultra-low-latency way to get to the wire from userspace (by bypassing the Linux IP stack).
- Do usNIC and the Linux IP stack share a common IP address? Yes. Think of it this way: you configure a usNIC device by configuring its corresponding Linux Ethernet interface.
- How does receive-side ingress steering work? The UDP port number is used to determine whether an incoming packet is for a usNIC receive queue or the Linux IP stack.
- Won’t you get UDP port number collisions? No. The usNIC stack reserves UDP port numbers to ensure that the same port number is never in use by both usNIC and the Linux IP stack.
- Can I run normal IP traffic over an interface used by usNIC? Yes. The ethX interface corresponding to a VIC interface can also have a usnic_Y interface, which provides an additional ultra-low-latency path to userspace queues. Normal IP traffic keeps flowing over ethX as usual.
- Can I QoS Linux IP and/or usNIC traffic? Yes. usNIC traffic is IP traffic, and will obey all the same IP policies as the rest of your network traffic, such as VLANs, QoS, routing tables, etc.
- Are you sure? Yes. 100% sure.
For example, the usNIC support in Open MPI now examines things like the Linux IP routing tables to determine which interfaces can talk to which peers (which gets really fun/interesting in multi-rail scenarios!).
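To make the routing-table point a bit more concrete, here is a minimal Python sketch of a longest-prefix-match route lookup, which is the kind of question a routing table answers ("which interface, if any, gets me to this peer?"). This is not Open MPI's actual code. The table contents, the interface names `eth4` and `eth5`, the gateway address, and the helper `lookup_route` are all hypothetical; only the two /24 rack subnets come from the example above.

```python
import ipaddress

# Hypothetical routing table for a host in the 10.10.10.x rack.
# Each entry: (destination network, outgoing interface, gateway or None).
# Interface names and the gateway address are made up for illustration.
ROUTES = [
    (ipaddress.ip_network("10.10.10.0/24"), "eth4", None),          # same rack, directly connected
    (ipaddress.ip_network("10.10.20.0/24"), "eth4", "10.10.10.1"),  # other rack, via a router
    (ipaddress.ip_network("10.10.11.0/24"), "eth5", None),          # second rail, directly connected
]


def lookup_route(peer, routes=ROUTES):
    """Return (interface, gateway) for the most specific matching route.

    Returns None if no route matches, i.e. the peer is unreachable
    according to this table.
    """
    addr = ipaddress.ip_address(peer)
    matches = [route for route in routes if addr in route[0]]
    if not matches:
        return None
    # Longest prefix wins, as in a real IP routing table.
    _net, ifname, gateway = max(matches, key=lambda route: route[0].prefixlen)
    return ifname, gateway
```

With this table, a peer in the same rack is reached directly on `eth4`, a peer in the other rack is reached on `eth4` through the gateway, and a peer that matches no route comes back as `None`, meaning that interface pairing is not usable.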
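The receive-side steering and port-collision answers above can also be pictured with a small toy model in Python. This is only a conceptual sketch and not how the VIC hardware or the usNIC drivers are implemented: the class `PortSteering`, its method names, and the port numbers used with it are all hypothetical.

```python
class PortSteering:
    """Toy model of one UDP port space shared by usNIC and the Linux IP stack."""

    def __init__(self):
        self.usnic_ports = {}      # UDP port -> usNIC receive queue id
        self.kernel_ports = set()  # UDP ports bound by ordinary Linux sockets

    def _check_free(self, port):
        # A port may be owned by usNIC or by the Linux IP stack, never both.
        if port in self.usnic_ports or port in self.kernel_ports:
            raise ValueError(f"UDP port {port} is already in use")

    def reserve_usnic(self, port, queue_id):
        """Reserve a UDP port for a usNIC receive queue."""
        self._check_free(port)
        self.usnic_ports[port] = queue_id

    def bind_kernel(self, port):
        """Record a UDP port bound by a normal Linux socket."""
        self._check_free(port)
        self.kernel_ports.add(port)

    def steer(self, dst_port):
        """Decide where an incoming packet goes, based only on its UDP port."""
        if dst_port in self.usnic_ports:
            return ("usnic", self.usnic_ports[dst_port])
        return ("linux", None)
```

The point of the model is simply that because both sides draw from one reserved port space, the destination port alone is enough to pick between a usNIC receive queue and the regular Linux stack.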
Cisco usNIC customers can upgrade to this new transport by navigating to the “Support” tab on cisco.com and searching Downloads for C240 (or C220; both will get you to the same place). Select the “Unified Computing System (UCS) Drivers” link. Starting with version 2.0.3, the UDP-enabled usNIC drivers can be found in the Linux/Network/Cisco/12x5x tree.
Running the usNIC installer script will update three things:
- The enic Linux kernel driver. This is the normal Ethernet driver for the Cisco VIC.
- The usnic_verbs Linux kernel driver. This is the kernel side of usNIC.
- The libusnic_verbs Linux userspace driver. This is the userspace side of usNIC.
It also installs Open MPI 1.8.2 under /opt/cisco/openmpi, but you could also download that from upstream — it’s exactly the same.
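After running the installer, one quick sanity check is to confirm that the two kernel drivers listed above are actually loaded. Here is a small Python sketch that parses `/proc/modules`-style text. It assumes the loaded module names match the driver names given above (`enic` and `usnic_verbs`); the helper names `loaded_modules`, `missing_usnic_modules`, and `check_host` are my own and not part of any Cisco tool.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line of /proc/modules starts with the module name.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_usnic_modules(proc_modules_text, required=("enic", "usnic_verbs")):
    """Return the required module names that are not loaded.

    The default names are an assumption based on the driver names above.
    """
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


def check_host(path="/proc/modules"):
    """Read the module list on a Linux host and report what is missing."""
    with open(path) as f:
        return missing_usnic_modules(f.read())
```

On a Linux machine, calling `check_host()` returns an empty list if both modules show up in `/proc/modules`, and otherwise the names that are missing.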
All this being said, note that Open MPI 1.8.2 actually supports both the UDP and L2 versions of usNIC: it figures out what you have at run time and automatically adjusts itself accordingly. This decouples the upgrade cycle: you can upgrade Open MPI whenever you want, and you can upgrade the usNIC Linux drivers whenever you want.