Cisco Blogs


Cisco Blog > High Performance Computing Networking

MPI Programming Mistakes

February 25, 2011
at 7:30 am PST


I’ve seen many users make lots of different kinds of MPI programming mistakes.

Some are common newbie mistakes.  Others are common intermediate-level mistakes.  Still others are incredibly subtle errors in deep program logic that took sophisticated debugging tools to track down (race conditions, memory overflows, etc.).

In 2007, I wrote a pair of magazine columns listing 10 common MPI programming mistakes (see this PDF for part 1 and this PDF for part 2).  Indeed, we still see users asking about some of these mistakes on the Open MPI users’ mailing list.

What mistakes do you see your users making with MPI?  How can we — the MPI community — better educate users to avoid these kinds of common mistakes?  Post your thoughts in the comments.

Here are the 10 mistakes I listed in the linked PDFs:

  1. Inconsistent environment / “dot” files
  2. Orphaning MPI requests
  3. MPI_PROBE
  4. Mixing Fortran (and C++) compilers
  5. Blaming MPI for programmer errors
  6. Re-using a buffer prematurely
  7. Mixing MPI implementations
  8. MPI_ANY_SOURCE
  9. Serialization
  10. Assuming MPI_SEND will (not) block

I’d say that most of these are still relevant.  Some of them aren’t even directly related to MPI; some are about the parallel run-time environment, others are about logic or algorithm errors in applications that just happen to use MPI (i.e., the same error would have occurred using, for example, TCP sockets or shared memory as a communication medium).
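To make item 10 a little more concrete, here is a minimal sketch of why code that assumes a send will not block can work in testing and then hang. It is not MPI code: it is a standard-library Python analogy in which two threads stand in for two ranks and a bounded queue stands in for whatever buffering the transport happens to provide. The names `exchange`, `buffer_slots`, and `ordered` are made up for this illustration, and a timeout stands in for a real deadlock.

```python
import queue
import threading


def exchange(buffer_slots, ordered=False, timeout=0.5):
    """Two workers each send two messages to their peer and receive two.

    buffer_slots: how many messages the (simulated) transport buffers
                  before a send blocks. Must be >= 1, because
                  queue.Queue treats maxsize=0 as unbounded.
    ordered:      if True, worker 1 receives before it sends.
    Returns True if both workers finished, False if either one timed
    out (the timeout stands in for a deadlock).
    """
    # inbox[i] holds messages addressed to worker i.
    inbox = [queue.Queue(maxsize=buffer_slots) for _ in range(2)]
    finished = [False, False]

    def send_all(rank):
        peer = 1 - rank
        for i in range(2):
            # Blocks while the peer's inbox is full.
            inbox[peer].put((rank, i), timeout=timeout)

    def recv_all(rank):
        for _ in range(2):
            inbox[rank].get(timeout=timeout)

    def worker(rank):
        try:
            if ordered and rank == 1:
                recv_all(rank)
                send_all(rank)
            else:
                send_all(rank)
                recv_all(rank)
            finished[rank] = True
        except (queue.Full, queue.Empty):
            pass  # gave up waiting: the simulated deadlock

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(finished)


if __name__ == "__main__":
    print(exchange(buffer_slots=2))                # enough buffering
    print(exchange(buffer_slots=1))                # both stuck in send
    print(exchange(buffer_slots=1, ordered=True))  # one side receives first
```

With enough buffering, the "send first, then receive" pattern on both sides completes. With less buffering, both workers sit in their second send waiting for a receive that never gets posted. In MPI terms, the usual remedies are to order the sends and receives so that one side receives first (as the `ordered` flag does here), or to use MPI_Sendrecv or non-blocking sends, rather than relying on how much a particular implementation buffers.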

What’s your favorite newbie MPI mistake?  What’s the gnarliest MPI error that you’ve ever had to track down?



4 Comments.


  1. Significant subsets of (5) are often detected with padb (http://www.pittman.co.uk/padb/); indeed, it is my favourite use of the tool. It can also be helpful for detecting subsets of the other problems.

    I don’t see a good reason why some magical tool couldn’t detect many cases of mismatched compilers, or mismatched linking etc.


    • Jeff Squyres

      Unfortunately, in my experience the compilers themselves are the ones that detect mismatches, and they produce error messages that the common user can’t decipher (rather than something simple like “It looks like your middleware was compiled with compiler XYZ, but you’re compiling your application with compiler ABC. Unfortunately, XYZ and ABC are not compatible…”).

      I’m not saying that producing such user-friendly error messages would be easy, but I agree: it sure would be nice! :-)


      • Ah, we used to see several cases where it built fine but caused subtle (or less subtle) runtime issues. A tool to detect that should have been possible, but would require not inconsiderable test matrices and ongoing support. Without actually trying to write it, I’ve no idea what the false positive rate would be.


        • Jeff Squyres

          Interesting.

          What kinds of problems did you see? I.e., it might be possible just to test for that problematic behavior, rather than checking for specific known “good” (or “bad”) vendor/version tuples.
