What is an MPI “eager limit”?
Technically speaking, the MPI standard does not define anything called an “eager limit.”
An “eager limit” is a term used to describe a method of sending short messages used by many MPI implementations. That is, it’s an implementation technique; it’s not part of the MPI standard at all. And since it’s not standardized, it also tends to be different in each MPI implementation. More specifically: if you write your MPI code to rely on a specific implementation’s “eager limit” behavior, your code may not perform well (or may even deadlock!) with other MPI implementations.
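To make the deadlock risk concrete, here is a toy Python model (not real MPI code) of two ranks that each run a short program of blocking sends and receives. It assumes a simplified rule: a blocking send returns on its own only when the message fits under the eager limit; otherwise it waits until the peer is sitting in a receive. The function name, the program encoding, and the 4096-byte limit are all hypothetical, chosen only for illustration.

```python
def exchange_completes(message_bytes, eager_limit_bytes, programs):
    """Toy model: return True if both ranks finish their programs.

    programs is a pair of lists such as ["send", "recv"], one per rank.
    Assumption: a send completes immediately if the message is eager,
    otherwise only while the peer is blocked in a receive.
    """
    pc = [0, 0]  # index of the operation each rank is currently blocked in
    eager = message_bytes <= eager_limit_bytes

    def current_op(rank):
        ops = programs[rank]
        return ops[pc[rank]] if pc[rank] < len(ops) else None

    progressed = True
    while progressed:
        progressed = False
        for rank in (0, 1):
            peer = 1 - rank
            op = current_op(rank)
            if op == "send":
                ready = eager or current_op(peer) == "recv"
            elif op == "recv":
                # The receive completes once the peer's send has completed.
                ready = programs[peer].index("send") < pc[peer]
            else:
                continue  # this rank has already finished
            if ready:
                pc[rank] += 1
                progressed = True
    return all(current_op(rank) is None for rank in (0, 1))


HYPOTHETICAL_EAGER_LIMIT = 4096  # bytes; real limits vary by implementation

both_send_first = (["send", "recv"], ["send", "recv"])
ordered = (["send", "recv"], ["recv", "send"])

small_ok = exchange_completes(1024, HYPOTHETICAL_EAGER_LIMIT, both_send_first)
large_ok = exchange_completes(1_000_000, HYPOTHETICAL_EAGER_LIMIT, both_send_first)
large_ordered_ok = exchange_completes(1_000_000, HYPOTHETICAL_EAGER_LIMIT, ordered)
```

In this model, the "both ranks send first" pattern finishes only while the message stays under the eager limit. Ordering the calls so that one rank receives first finishes regardless of the limit.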
So — what exactly is an “eager limit”?
MPI implementations typically use (at least) two algorithms for sending messages between peer processes:
- Eager: messages, including MPI match data, are sent as soon as possible. Typically used for “short” messages.
- Rendezvous: a class of protocols based on “request to send” / “clear to send” (RTS/CTS) techniques. Typically used for “long” messages.
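As a rough sketch, the choice between the two can be pictured as a size check against a threshold. The 4096-byte value and the function name below are hypothetical; real implementations pick their own limits (often per network type), and whether the boundary is inclusive is also an assumption here.

```python
HYPOTHETICAL_EAGER_LIMIT = 4096  # bytes; not taken from any real implementation


def choose_protocol(message_bytes, eager_limit_bytes=HYPOTHETICAL_EAGER_LIMIT):
    """Pick a send protocol by message size (simplified model)."""
    # Short messages go eagerly; anything larger uses rendezvous.
    if message_bytes <= eager_limit_bytes:
        return "eager"
    return "rendezvous"


short_choice = choose_protocol(4)
long_choice = choose_protocol(10 * 1024 * 1024)
```

The threshold in that comparison is what people usually mean by the “eager limit.”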
RTS/CTS protocols are based on the idea that the sender first sends only the MPI match information to the receiver (i.e., the RTS). Once the receiver decides that it wants to receive the message, it sends back “ok, you can send now” (i.e., the CTS). The sender then transfers the bulk of the message payload to the receiver.
“Eager” messages typically avoid this RTS/CTS round-trip overhead by just sending immediately, with little or no permission from the receiver. As such, eager protocols are useful when latency is important. For example, sending a 4-byte MPI message eagerly across InfiniBand can take about 1.5us. If we add an RTS/CTS round trip to that cost (two more one-way trips at roughly 1.5us each), the total would be about 4.5us. Ouch!
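The arithmetic behind that comparison can be written out as a small sketch. It assumes that the RTS and the CTS each cost about the same one-way latency as the small message itself, which is a simplification; the function names are made up for illustration.

```python
ONE_WAY_LATENCY_US = 1.5  # figure quoted above for a 4-byte message


def eager_latency_us(one_way_us):
    """Eager: match data and payload travel together in one trip."""
    return one_way_us


def rendezvous_latency_us(one_way_us):
    """Rendezvous: RTS out, CTS back, then the payload out."""
    # Assumption: all three trips cost the same one-way latency.
    return 3 * one_way_us


eager_total = eager_latency_us(ONE_WAY_LATENCY_US)
rendezvous_total = rendezvous_latency_us(ONE_WAY_LATENCY_US)
added_by_handshake = rendezvous_total - eager_total
```

For a message this small, the handshake triples the latency in this model, which is why short messages tend to be sent eagerly.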
Rendezvous protocols are typically used when resource consumption, not latency, is important. For example, if a sender sends a 10MB message to a receiver, the receiver might not want to actually get the message until the application has posted a corresponding MPI_RECV, meaning that there is now a buffer available to receive that 10MB message.
Conversely, if the sender sent the 10MB message eagerly, the receiver might have to a) allocate a temporary 10MB buffer to hold it, and b) copy the message to the target buffer when the corresponding MPI_RECV gets posted. Double ouch!
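A back-of-the-envelope model of that receiver-side cost might look like the following. It is deliberately simplified: it assumes an eager message that arrives before its receive is posted must be buffered whole and copied once, and that every other case lands directly in the application's buffer. Real implementations may behave differently, and the function name is hypothetical.

```python
TEN_MB = 10 * 1024 * 1024  # bytes


def receiver_overhead(message_bytes, protocol, receive_posted):
    """Return (temporary_buffer_bytes, extra_copy_bytes) at the receiver."""
    if protocol == "eager" and not receive_posted:
        # a) allocate a temporary buffer, b) copy it out once the receive is posted
        return message_bytes, message_bytes
    # Rendezvous waits for the receive, so the payload can go straight
    # into the application's buffer (simplifying assumption).
    return 0, 0


eager_unexpected = receiver_overhead(TEN_MB, "eager", receive_posted=False)
rendezvous_unexpected = receiver_overhead(TEN_MB, "rendezvous", receive_posted=False)
```

In this model, the eager case costs an extra 10MB allocation plus a 10MB copy, while the rendezvous case costs neither.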
My next blog entry will talk about how Open MPI uses eager and rendezvous protocols.