NUMA

The “vader” shared memory transport in Open MPI: Now featuring 3 flavors of zero copy!

3 min read

Today’s blog post is by Nathan Hjelm, a Research Scientist at Los Alamos National Laboratory and a core developer on the Open MPI project. The latest version of the “vader” shared memory Byte Transfer Layer (BTL) in the upcoming Open MPI v1.8.4 release brings better small-message latency and improved support for “zero-copy” transfers. NOTE: “zero copy” […]

Process affinity: Hop on the bus, Gus!

7 min read

Today’s blog post is written by Joshua Ladd, Open MPI developer and HPC Algorithms Engineer at Mellanox Technologies. At some point in the process of pondering this blog post I noticed that my subconscious had, much to my annoyance, registered a snippet of the chorus to Paul Simon’s timeless classic “50 Ways to Leave Your […]

Open MPI: Binding to core by default

2 min read

After years of discussion, the upcoming release of Open MPI 1.7.4 will change how processes are laid out (“mapped”) and bound by default. Here are the specifics: if the number of processes is <= 2, processes will be mapped by core; if the number of processes is > 2, processes will be mapped by socket. Processes […]
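
The excerpt above is truncated, but the new defaults are easy to observe in practice. As a minimal, hedged sketch (not taken from the original post, and using a hypothetical executable ./my_app), the mpirun options below report where each process ends up and let you override the defaults when needed:

    # Show how each rank is mapped/bound under the new defaults
    mpirun -np 2  --report-bindings ./my_app    # <= 2 processes: mapped by core
    mpirun -np 16 --report-bindings ./my_app    # >  2 processes: mapped by socket

    # Override the defaults explicitly
    mpirun -np 16 --map-by core --bind-to core ./my_app
    mpirun -np 16 --bind-to none ./my_app       # disable binding altogether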

How many network links do you have for MPI traffic?

2 min read

If you’re a bargain basement HPC user, you might well scoff at the idea of having more than one network interface for your MPI traffic. “I’ve got (insert your favorite high bandwidth network name here)! That’s plenty to serve all my cores! Why would I need more than that?” I can think of (at least) […]

Latency Analogies (part 2)

2 min read

In a prior blog post, I talked about latency analogies. I compared levels of latency to your home, your neighborhood, a far-away neighborhood, and another city. I talked about these localities in terms of communication. Let’s extend that analogy to talk about data locality.

Latency Analogies

1 min read

Multiple readers have told me that it is difficult for them to understand and/or visualize the effects of latency on their HPC applications, particularly in modern NUMA (non-uniform memory access) and NUNA (non-uniform network access) environments. Let’s break down the different levels of latency in typical modern server and network computing environments.

Process and memory affinity: why do you care?

3 min read

I’ve written about NUMA effects and process affinity on this blog lots of times in the past. It’s a complex topic that has a lot of real-world effects on your MPI and HPC applications. If you’re not using processor and memory affinity, you’re likely experiencing performance degradation without even realizing it. In short: If you’re not booting […]
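
The excerpt is cut off, but the underlying advice is easy to check. As a rough sketch (not from the original post), the standard numactl and hwloc command-line tools can show whether a process is actually bound to specific cores and NUMA memory:

    # Report the CPU and NUMA-node (memory) binding of the current process
    numactl --show

    # Report the CPU binding of the current process via hwloc
    hwloc-bind --get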