I’ve seen users make many different kinds of MPI programming mistakes.
Some are common newbie mistakes. Others are common intermediate-level mistakes. Still others are incredibly subtle errors buried deep in program logic that took sophisticated debugging tools to track down (race conditions, memory overruns, etc.).
In 2007, I wrote a pair of magazine columns listing 10 common MPI programming mistakes (see this PDF for part 1 and this PDF for part 2). We still see users asking about some of these mistakes on the Open MPI users’ mailing list.
What mistakes do you see your users making with MPI? How can we — the MPI community — better educate users to avoid these kinds of common mistakes? Post your thoughts in the comments.
Here are the mistakes I listed in the linked PDFs:
- Inconsistent environment / “dot” files
- Orphaning MPI requests
- Mixing Fortran (and C++) compilers
- Blaming MPI for programmer errors
- Re-using a buffer prematurely
- Mixing MPI implementations
- Assuming MPI_SEND will (not) block
I’d say that most of these are still relevant. Some of them aren’t even directly related to MPI. Some are about the parallel run-time environment, and others are logic or algorithm errors in applications that just happen to use MPI (i.e., the same error would have occurred with, for example, TCP sockets or shared memory as the communication medium).
What’s your favorite newbie MPI mistake? What’s the gnarliest MPI error that you’ve ever had to track down?