MPI over 40Gb Ethernet

May 10, 2014

Half-round-trip ping-pong latency may be the first metric everyone looks at with MPI in HPC, but bandwidth is usually the next metric examined.

40Gbps Ethernet has been available for switch-to-switch links for quite a while, and 40Gbps NICs are starting to make their way down to the host.

How does MPI perform with a 40Gbps NIC?

The graph below shows the latency and bandwidth results of a NetPIPE 3.7.1 run with Open MPI 1.8.1 over a pair of Cisco UCS C240 M3 servers (with Intel Xeon E5-2690 (v1) “Sandy Bridge” chips at 2.9GHz), each with a 2x40Gbps Cisco 1285 VIC using the Cisco low-latency usNIC driver stack, connected back-to-back.
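For reference, a run like this is typically launched with mpirun on two hosts. The host names below are placeholders, and the MCA BTL list reflects Open MPI 1.8's usNIC support; your build and NetPIPE binary location may differ, so treat this as a sketch rather than the exact command used for the graph:

```shell
# Hypothetical invocation: NPmpi is NetPIPE's MPI driver, built against
# Open MPI 1.8.x. "node1"/"node2" are placeholder host names.
# Selecting the usnic BTL (plus shared memory and self) forces traffic
# over the low-latency usNIC stack rather than TCP.
mpirun -np 2 --host node1,node2 \
    --mca btl usnic,sm,self \
    ./NPmpi
```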

[Graph: NetPIPE 3.7.1 latency and bandwidth results; full-size PNG and scalable PDF versions were linked in the original post]

You can see that the half-round-trip ping-pong latency starts at 1.93us, and the bandwidth reaches 37.23Gbps.




  1. How’s the MPI-3 RMA perf?

    • To be honest, I haven’t benchmarked it.

      Who uses that RMA stuff, anyway? 😉

  2. What’s the MTU? (to compute the actual max data rate)

    • For this graph, the MTU on both interfaces was 9000.

      I should also mention that the underlying transport that the usNIC stack uses is UDP (via operating system bypass — not via the Linux UDP stack).
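Since the question was about computing the actual max data rate: given a 9000-byte MTU and a UDP transport, a back-of-the-envelope ceiling can be sketched from standard Ethernet/IPv4/UDP framing overheads. The overhead constants below are generic protocol numbers, not figures from the post, and the result ignores any usNIC- or MPI-level framing:

```python
# Sketch: estimate maximum UDP goodput on a 40Gbps link with MTU 9000.
# All overhead constants are standard Ethernet/IPv4/UDP values.

LINK_RATE_GBPS = 40.0
MTU = 9000            # IP packet size in bytes
IP_HEADER = 20        # IPv4 header, no options
UDP_HEADER = 8
ETH_HEADER = 14       # dst MAC + src MAC + EtherType
ETH_FCS = 4           # frame check sequence
ETH_PREAMBLE = 8      # preamble + start-of-frame delimiter
ETH_IPG = 12          # inter-packet gap, in byte times

# Usable UDP payload carried by each maximum-size frame:
payload = MTU - IP_HEADER - UDP_HEADER

# Bytes actually consumed on the wire per frame:
wire_bytes = MTU + ETH_HEADER + ETH_FCS + ETH_PREAMBLE + ETH_IPG

efficiency = payload / wire_bytes
max_goodput = LINK_RATE_GBPS * efficiency

print(f"payload per frame: {payload} bytes")
print(f"wire efficiency:   {efficiency:.4f}")
print(f"max UDP goodput:   {max_goodput:.2f} Gbps")
```

With these numbers the theoretical ceiling comes out just under 39.71Gbps, so the measured 37.23Gbps is within a few percent of what the wire allows.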