Traffic. I find myself still thinking about my last entry today as I’m riding the blue line CTA from O’Hare airport to downtown Chicago for the MPI Forum meeting this afternoon. Here I am, being spirited downtown at a steady clip on a commuter train while I see thousands of gridlocked cars on one side of me, and easily flowing motor vehicles on the other. I will definitely reach downtown before the majority of vehicles that are only a few feet away from me on the Kennedy expressway, despite the fact that I’m quite sure that I left O’Hare long after they did.
Traffic is such a great network metaphor that it gives insight into today’s ramble: it’s well understood that network packets may be delivered in a different order than they were sent. What’s less understood is why.
First, let’s clear up a common misconception: there is a critical difference between MPI’s matching an incoming message and delivering that message to the application. Matching a point-to-point MPI message means examining the communicator, tag, and source of the message. Delivering the message means fully receiving it into the application’s target buffer and then notifying the application that the receive is complete.
MPI only provides ordering guarantees about matching incoming messages — not delivering them. Why?
Let’s answer by means of an example — here’s a not-uncommon case: the delivery of a short message may overtake that of a long message. Consider the following:
- Sender A sends a long message to peer X on communicator Y with tag Z
- Sender A then immediately sends a short message to peer X on communicator Y with tag Z
Which message arrives first?
That’s actually a loaded question, because it’s ill-formed. The MPI specification doesn’t define which message arrives first. It defines which message is matched first at the receiver: the first one sent (which happens to be the long one). Specifically, between a pair of peers, MPI defines that messages sent on the same communicator with the same tag will be matched at the receiver in the same relative order in which they were sent. Other messages may arrive in between the two messages in the above example… but that’s a different story for a different time.
In the above example, the sender may have chosen to fragment the long message into multiple chunks (for a variety of reasons which I won’t cover today) — it may have sent the (X, Y, Z) info to the receiver along with the first part of the long message. Further, the sender may have chosen not to fragment the short message — it may have sent the (X, Y, Z) of the second (short) message and the entire short message in one network fragment.
In this way, the receiver may match both messages in order, but deliver the short message before the long.
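To make that concrete, here’s a toy Python model of the scenario above. It is only a sketch, not how any real MPI implementation works: the `Fragment` class, the `fragment()` and `receive()` helpers, the 1024-byte fragment size, and the exact wire order are all assumptions I made up for illustration. The one idea it borrows from the text is that the first fragment of each message carries the matching info, so matching and delivery can be tracked separately.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Envelope = Tuple[str, str, str]  # (source, communicator, tag)


@dataclass
class Fragment:
    msg_id: str                   # hypothetical label, used only by this toy model
    envelope: Optional[Envelope]  # carried by the first fragment only
    payload: bytes
    total_len: int                # full message length, so the receiver knows when it is done


def fragment(msg_id, envelope, data, max_frag):
    """Split data into fragments; only the first one carries the envelope."""
    return [
        Fragment(msg_id, envelope if off == 0 else None,
                 data[off:off + max_frag], len(data))
        for off in range(0, len(data), max_frag)
    ]


def receive(wire, posted):
    """Return (match order, delivery order) for fragments arriving in wire order."""
    posted = list(posted)         # pending receives, in the order they were posted
    matched, delivered = [], []
    received = {}                 # msg_id -> number of bytes received so far
    for frag in wire:
        if frag.envelope is not None:
            # Matching looks only at the envelope and consumes the
            # oldest posted receive that agrees with it.
            posted.remove(frag.envelope)
            matched.append(frag.msg_id)
            received[frag.msg_id] = 0
        received[frag.msg_id] += len(frag.payload)
        if received[frag.msg_id] == frag.total_len:
            # Delivery: the whole message has now been received.
            delivered.append(frag.msg_id)
    return matched, delivered


env = ("A", "Y", "Z")  # sender A, communicator Y, tag Z
long_frags = fragment("long", env, b"L" * 3000, max_frag=1024)
short_frags = fragment("short", env, b"S" * 16, max_frag=1024)

# Assumed wire order: the short message slips in right after the
# long message's first fragment.
wire = [long_frags[0]] + short_frags + long_frags[1:]

matched, delivered = receive(wire, posted=[env, env])
print("matched:  ", matched)
print("delivered:", delivered)
```

In this model the long message is matched first, but the short one finishes first: `matched` comes out as `['long', 'short']` and `delivered` as `['short', 'long']`.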
It may seem counter-intuitive, but overtaking issues like the one described above can help effect good overall network throughput. Letting some network fragments be delivered out of order can have some wide-reaching effects (both good and bad), but for the purposes of this blog entry, let’s just note that letting a few small messages be delivered during long message receipt is a Good Thing(tm).
Put differently: the needs of the many outweigh the needs of the few.