After years of discussion, the upcoming release of Open MPI 1.7.4 will change how processes are laid out (“mapped”) and bound by default.  Here are the specifics:

  • If the number of processes is <= 2, processes will be mapped by core
  • If the number of processes is > 2, processes will be mapped by socket
  • Processes will be bound to core
  • MPI_COMM_WORLD ranks will be assigned by slot
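The rule above is small enough to write down as a function. This is only a sketch of the decision as described in this post, not Open MPI's actual code; the function name `default_layout` and the dictionary keys are made up for illustration.

```python
def default_layout(num_procs):
    """Return the default policies described above for a job of num_procs.

    Hypothetical helper: it mirrors the rule stated in this post, not the
    real Open MPI implementation.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    # <= 2 processes: map by core; > 2 processes: map by socket.
    map_by = "core" if num_procs <= 2 else "socket"
    # Binding and ranking do not depend on the process count.
    return {"map_by": map_by, "bind_to": "core", "rank_by": "slot"}


for n in (1, 2, 3, 16):
    print(n, default_layout(n))
```

The only thing that varies with job size is the mapping policy; binding to core and ranking by slot apply in both cases.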

These are only the defaults; users can, of course, change them via mpirun CLI options, environment variables, etc.
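As a rough illustration of overriding the defaults from the command line, here is a small sketch that assembles an mpirun invocation. The option names (`--map-by`, `--bind-to`, `--rank-by`, `--report-bindings`) are the ones I'd expect in the 1.7 series, but treat them as an assumption and check `mpirun --help` for your installed version. The helper `mpirun_command` and the executable `./my_app` are hypothetical.

```python
import shlex


def mpirun_command(num_procs, executable, map_by="socket",
                   bind_to="core", rank_by="slot", report=True):
    """Build an mpirun argument list with explicit layout policies.

    Hypothetical helper; the flag spellings are assumed, so verify them
    against the mpirun(1) man page of your Open MPI version.
    """
    argv = ["mpirun", "-np", str(num_procs),
            "--map-by", map_by,
            "--bind-to", bind_to,
            "--rank-by", rank_by]
    if report:
        # Asks mpirun to print where each process ended up bound.
        argv.append("--report-bindings")
    argv.append(executable)
    return argv


# Force map-by-core for a 4-process job instead of the new default.
cmd = mpirun_command(4, "./my_app", map_by="core")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command rather than running it keeps the sketch usable on a machine without Open MPI installed; pass the list to `subprocess.run` if you want to launch it.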

Why did we (finally) make this change?

Two main reasons:

  1. Invoking processor affinity by default is definitely helpful to many kinds of applications — especially benchmarks (yes, I’m being quite blunt here).
  2. Other MPI implementations bind by default, and then use that to bash Open MPI’s “out of the box” performance.

Enabling processor affinity is beneficial because OSes (including modern Linux) still don’t do a great job of keeping processes in place, even in steady state.  That is: there is typically a small-but-noticeable performance difference between binding MPI processes and not binding them.  Indeed, in some cases, the difference is large.

The catch, however, is that only the MPI application knows what mapping, MPI_COMM_WORLD rank ordering, and binding patterns are best for it.

There definitely seems to be a spectrum of possibilities and applications:

  • Doing nothing is definitely not harmful to any application, but it can leave performance on the table, so to speak
  • 2-process MPI jobs tend to be benchmarks, and the (bind-to-core, map-by-core) pattern tends to be good for them
  • >2-process MPI jobs tend to be real applications, and can benefit from the greater memory bandwidth availability of (bind-to-core + map-by-socket)
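To make the difference between the two mapping patterns concrete, here is a toy model of where processes land on a single node. It assumes a hypothetical node with 2 sockets and 4 cores per socket, ignores hardware threads and oversubscription, and says nothing about how MPI_COMM_WORLD ranks are then assigned. The function names are made up for this sketch.

```python
def place(num_procs, sockets=2, cores_per_socket=4, map_by="socket"):
    """Return a (socket, core) pair for each process, in mapping order.

    Toy model only: one node, no hardware threads, no oversubscription.
    """
    if num_procs > sockets * cores_per_socket:
        raise ValueError("this sketch does not model oversubscription")
    placement = []
    for i in range(num_procs):
        if map_by == "core":
            # Fill every core of socket 0 before moving on to socket 1.
            socket, core = divmod(i, cores_per_socket)
        elif map_by == "socket":
            # Round-robin across sockets, next free core on each.
            core, socket = divmod(i, sockets)
        else:
            raise ValueError("map_by must be 'core' or 'socket'")
        placement.append((socket, core))
    return placement


def procs_per_socket(placement, sockets=2):
    """Count how many processes were placed on each socket."""
    counts = [0] * sockets
    for socket, _core in placement:
        counts[socket] += 1
    return counts


for policy in ("core", "socket"):
    layout = place(4, map_by=policy)
    print(policy, layout, procs_per_socket(layout))
```

In this model, a 4-process job mapped by core puts all four processes on socket 0, while mapping by socket puts two on each. Spreading processes out like that is what gives each one a larger share of its socket's memory bandwidth.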

Are these patterns right for all apps?  Definitely not.

For example, these patterns are definitely not good for apps that only use half the cores in a server because of memory bandwidth constraints.  However, we’re told that users who need this pattern already set their own mapping, binding, and ordering patterns via mpirun CLI options.  So these new defaults won’t affect them at all.

That being said, the two default patterns we’re using tend to leave less performance on the table than doing nothing.  So we’ll see how they work out in the real world.

Let us know what you think.

  • Do these patterns work for you?  Why or why not?
  • Do you even care?  I.e., do you already set your own mapping, binding, and/or ordering?


Authors

Jeff Squyres

The MPI Guy

UCS Platform Software