Half-round-trip ping-pong latency may be the first metric that everyone looks at with MPI in HPC, but bandwidth is one of the next metrics examined.

40Gbps Ethernet has been available for switch-to-switch links for quite a while, and 40Gbps NICs are starting to make their way down to the host.

How does MPI perform with a 40Gbps NIC?

The graph below shows the latency and bandwidth results of a NetPIPE 3.7.1 run with Open MPI 1.8.1 between a pair of Cisco UCS C240 M3 servers connected back-to-back. Each server has Intel Xeon E5-2690 (v1) “Sandy Bridge” chips at 2.9GHz and a 2x40Gbps Cisco 1285 VIC using the Cisco low-latency usNIC driver stack.

You can see that the half-round-trip (HRT) ping-pong latency starts at 1.93us, and the bandwidth reaches 37.23Gbps.
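To put those two numbers in context, here is a minimal sketch of the arithmetic behind them. It is not NetPIPE's actual code; the function names are made up for illustration, and the 3.86us round-trip time is simply twice the reported 1.93us, assumed here so the example has an input.

```python
# Illustrative helpers only; names and inputs are assumptions, not NetPIPE internals.

def half_round_trip_us(round_trip_seconds: float) -> float:
    """Half of a ping-pong round trip, in microseconds."""
    return round_trip_seconds / 2.0 * 1e6


def bandwidth_gbps(message_bytes: int, one_way_seconds: float) -> float:
    """Bits delivered per second for one message, in Gbps (1 Gbps = 1e9 bits/s)."""
    return message_bytes * 8 / one_way_seconds / 1e9


def line_rate_fraction(measured_gbps: float, nominal_gbps: float) -> float:
    """Fraction of the nominal link speed that was actually achieved."""
    return measured_gbps / nominal_gbps


if __name__ == "__main__":
    # Assumed round trip of 3.86us, i.e. 2 x the reported 1.93us latency.
    print(f"HRT latency: {half_round_trip_us(3.86e-6):.2f}us")
    # Reported peak bandwidth against the 40Gbps nominal port speed.
    print(f"Line-rate fraction: {line_rate_fraction(37.23, 40.0):.1%}")
```

By this arithmetic, 37.23Gbps is roughly 93% of the nominal 40Gbps of a single port.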