It’s the latency, stupid

May 28, 2010 - 1 Comment

…except when it isn’t.

Most people throw around latency and bandwidth numbers as the most important metrics for a given MPI implementation.  “MPI implementation X is terrible because MPI implementation Y’s latency is 5% lower!”

Ahh… the fervor of youth (and marketing).  If only the world were so black and white.  But it’s not.  The world is grey.  I can think of 20 metrics and implementation features off the top of my head that matter to real-world users and applications.

This list is by no means complete, and it’s in no particular order.  I also specifically mention “servers” in the items a few times, so let’s restrict this specific list to cluster-based MPI implementations — even though most of the concepts apply to all MPI implementations.

  1. How well does MPI implementation X (hereinafter “MIX”) manage its resources? 
    • Does it trade space for performance — or vice versa? 
    • Does it scale to the number of servers / MPI processes that you need to run?
  2. Bandwidth and latency are oft-cited in terms of traditional network performance (e.g., across TCP sockets, OpenFabrics-based networks, etc.).  But how well does MIX perform across shared memory (i.e., communications between processes on the same server)?
  3. How well does a MIX process perform when communicating with both a shared memory peer and a peer across a traditional network?
  4. How well do MIX’s collective operations perform, particularly when the operation spans peers both on the same server and on different servers?
  5. How well does MIX overlap the communication of large messages with computation?
  6. Does MIX make asynchronous progress on large message transmission and receipt?  If so, how?  Will it steal cycles and/or thrash caches of your MPI application?
  7. How well does MIX clean up after itself?  For example, what happens if an MPI application aborts, and/or if a user hits ctrl-C?
  8. How easy/difficult is it to compile MPI applications?  Do you just -lmpi, transparently use a wrapper compiler, or …?
  9. What run-time features does mpirun / mpiexec support?  Can you redirect standard input, output, error, etc.?
  10. When you start an MPI application across multiple servers:
    • How long does it take?
    • Does it scale to the number of servers / MPI processes that you need to run?
    • Does it use rsh/ssh, or does it use your resource manager’s startup mechanism?
    • Does it tie into your enterprise authentication system?
    • Does it record accounting statistics for the overall parallel run?
  11. How good is the support for MIX?
    • Is there documentation available?
    • Is there support available for your MPI application’s interaction with MIX?
    • Is there support available for the MPI implementation itself?
  12. Does MIX support a constant ABI such that you can upgrade MIX without recompiling / relinking your application?
  13. Does MIX support all the network types, platforms, and operating systems that you care about?  Or do you have to install different MPI implementations on each platform in your machine room?
  14. Does MIX support run-time tuning, or do you need to re-compile your MPI application (or MIX itself) to change the behavior of MIX?  If it is run-time tweakable, can the system administrator set and maintain defaults for all users?
  15. Is MIX still being developed and/or supported?  (Author’s note: please please please stop using LAM/MPI — we haven’t done any work on it in several years!)
  16. Does MIX support interactive parallel debugging and/or other development tools?
  17. How good / accurate / understandable are MIX’s error messages when something goes wrong?
  18. Every MPI implementation claims to be fully MPI-compliant.  But how close to actual compliance is it?
  19. How do complex MPI datatypes affect sending and receiving performance?
  20. Is the MPMD model supported (i.e., launching a combination of executables in a single MPI_COMM_WORLD)?
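As a concrete example of where the oft-quoted latency numbers come from (and how you can probe item 2 yourself), here’s a minimal sketch of the classic 0-byte ping-pong microbenchmark.  It’s just a sketch — real benchmarks do warmup iterations, multiple message sizes, statistics, etc.

```c
/* Minimal 0-byte ping-pong latency sketch.  Assumes exactly 2 MPI
 * processes.  Compile with your implementation's wrapper compiler,
 * e.g., "mpicc pingpong.c -o pingpong". */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, i;
    const int iters = 10000;
    char buf = 0;
    double start, stop;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    start = MPI_Wtime();
    for (i = 0; i < iters; ++i) {
        if (0 == rank) {
            MPI_Send(&buf, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (1 == rank) {
            MPI_Recv(&buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    stop = MPI_Wtime();

    if (0 == rank) {
        /* Half of the average round trip = one-way latency */
        printf("one-way latency: %f us\n",
               (stop - start) / iters / 2.0 * 1e6);
    }
    MPI_Finalize();
    return 0;
}
```

Run it once with both processes on the same server and once with the processes on two different servers — the difference between those two numbers is exactly the shared memory vs. traditional network distinction in item 2.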
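On item 8, most implementations ship a “wrapper” compiler that transparently adds the right preprocessor, library, and linker flags.  For example, with Open MPI (the file name here is just a placeholder):

```shell
# Compile an MPI application via the wrapper compiler:
mpicc my_app.c -o my_app

# Ask Open MPI's wrapper what it would actually invoke
# (MPICH-derived wrappers use "-show" instead):
mpicc --showme my_app.c -o my_app
```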
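And to illustrate item 20: Open MPI and MPICH-derived launchers accept a colon-separated MPMD syntax that starts different executables in a single MPI_COMM_WORLD (the executable names below are hypothetical):

```shell
# 1 "manager" process plus 15 "worker" processes, all in one
# MPI_COMM_WORLD:
mpirun -np 1 ./manager : -np 15 ./worker
```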

As I’ve mentioned before, the metrics that matter are the ones that are most important to you and your application(s).  Run your apps in several environments with different setups, MPI implementations, and hardware.  See what performs best for you and what metrics matter most to you.

The 2009 MPI Forum survey indicated that most respondents care about performance as their top priority… but not everyone.  Others would trade performance for resilience.  Still others prefer run-time features.

What do you prefer?



  1. Latency benchmarks contain the same amount of information as the tests they run: 0 bytes.  :)  I, too, favor an MPI decathlon, rather than a 100-meter dash plus marathon, as a means of qualifying performance.