MPI_Ibarrier: Crazy?

September 15, 2012 - 4 Comments

Most people’s reaction to hearing about the new MPI-3 non-blocking “barrier” collective is: huh?

Why on earth would you have a non-blocking barrier?  The whole point of a barrier is to synchronize — how does it make sense not to block while waiting?

The key is re-phrasing that previous question: why would you block while waiting?

Blocking during a barrier is just like blocking during any other operation: you can’t do any useful work while waiting.

Here’s an analogy: a barrier is very much like the start of a meeting.  You can’t start the meeting until all the participants of the meeting arrive.  And there’s always That Guy who’s 15 minutes late (yeah, we all know That Guy).

Ignoring all social issues for the sake of my example, you certainly don’t have to sit there, waiting, doing nothing until That Guy shows up.  Instead, you can do some other work while waiting — presumably work that is either unrelated to the meeting, or at least not dependent upon the start of the meeting.

When That Guy finally shows up and the meeting attendance is complete, then you can start the meeting.  But you might as well get other useful things done while waiting.

The analogy here is pretty obvious: “That Guy” who is habitually 15 minutes late can be a slow process in your MPI job.  Why wait for that slow process?  Other MPI processes can continue to do useful work while waiting, even if there are some types of work that you can’t do until you know that every other process has reached a certain point.

That’s what non-blocking barriers (and other non-blocking operations) are for.  Think of non-blocking barriers as a notification mechanism that everyone has reached (or passed) a common milestone.

Why wait?



  1. The new feature seems a good utility, but won’t a process that is slow (the 15-minutes-late guy) impact the overall parallelism of the application?

    The parallelism of the entire application will then be dependent on that slow process, making the entire application run slower.

    • Possibly. But that somewhat assumes a tightly-coupled MPI application.

      What if the load is non-uniform across the MPI processes? Having such a “fuzzy” barrier would still allow other processes to do useful things while the slow process is still chunking away, trying to reach the milestone that everyone else has already reached.

      Don’t get me wrong — I’m not saying that a fuzzy barrier is useful everywhere. Heck, barriers aren’t *needed* in many places (the class of MPI applications that *need* a blocking barrier for correctness is very small).

      But fuzzy barriers are a different animal. If you think of them as milestone markers — especially in applications with lots of milestones, each of which has varying degrees of dependence on others (ranging from wholly dependent to fully independent) — then they can be a useful tool to extract more communication / computation overlap.

  2. Yep, this routine comes in very handy in cases where the pattern of messages is unknown. I asked for it explicitly a while ago without knowing it existed, then had to implement one myself:

  3. See Algorithm 2 of this paper for a practical and communication-optimal use of a non-blocking barrier to set up sparse communication.