Can I MPI_SEND (and MPI_RECV) with a count larger than 2 billion?
This question is inspired by the fact that the “count” parameter to MPI_SEND and MPI_RECV (and friends) is an “int” in C, which is typically a signed 4-byte integer, meaning that its largest positive value is 2^31 − 1, or about 2 billion.
However, this is the wrong question.
The right question is: can MPI send and receive messages with more than 2 billion elements?
The answer is: yes.
You may have heard that MPI-3.0 defined a new “MPI_Count” type that is larger than an “int”, and is therefore not susceptible to the “2 billion limitation.”
But MPI-3.0 didn’t change the type of the “count” parameter in MPI_SEND (and friends) from “int” to “MPI_Count”. That would have been a total backwards compatibility nightmare.
First, let me describe the two common workarounds for the “2 billion limitation”:
1. If an application wants to send 8 billion elements to a peer, just send multiple messages. The overhead difference between one message of 8 billion elements and four messages of 2 billion elements each is so vanishingly small that it effectively doesn’t matter.
2. Use MPI_TYPE_CONTIGUOUS to create a datatype composed of multiple elements, and then send several of those datatypes, yielding a multiplicative count. For example:
MPI_Datatype two_billion_ints_type;
/* Note: the count here must itself fit in an int
   (2147483648 would not; it is one past INT_MAX) */
MPI_Type_contiguous(2000000000, MPI_INT, &two_billion_ints_type);
MPI_Type_commit(&two_billion_ints_type);
/* Send 4 billion integers */
MPI_Send(buf, 2, two_billion_ints_type, …);
These workarounds aren’t universally applicable, however.
The most often cited case where these workarounds don’t apply is MPI-IO: IO-based libraries need to know exactly how many bytes were written to disk, not how many typed elements (e.g., when checking the MPI_Status output from an MPI_FILE_IWRITE call).
For example: a disk write may complete only 17 bytes. That would be 4.25 MPI_INTs (assuming sizeof(int) == 4).
See these two other blog posts for more details, including how the MPI_Count type introduced in MPI-3.0 is used to solve these kinds of issues: