In my last post, I talked about the so-called eager RDMA optimization and the tradeoff it makes between resource consumption and latency.
Let’s talk about another optimization: shared receive queues.
Shared receive queues are not a new idea, and certainly not exclusive to MPI implementations. They’re a way for multiple senders to send to a single receiver while only consuming resources from a common pool.
A “receive queue” is pretty much what it sounds like: a queue of incoming messages. Queues consume resources; each incoming message that is placed in the receive queue takes up some RAM (and therefore also OS virtual memory metadata), and possibly even NIC resources. As the receiving processes dequeue messages, that memory is either released or, more likely, recycled.
With shared receive queues, instead of having N available buffers for each of M senders, you have only R buffers, where R < M × N (or even R << M × N). The idea behind having fewer than M × N buffers available is that not all senders will be sending all the time — hence, M × N buffers is overkill (and wasteful).
It’s worth clarifying that there are (at least) two relevant variables involved in the receive queue abstraction: sender uniqification and memory allocation.
Ok, I made up “uniqification”, but see if you can use it in conversation today. Let’s start something.
Sender uniqification is whether a single receive queue will receive from a single sender or multiple senders. TCP sockets, for example, pair a single sender and a single receiver. UDP socket endpoints accept incoming messages from multiple senders.
Memory allocation has to do with where the memory comes from for incoming messages. Does the queue have its own private memory for incoming messages? Is it shared with others?
The first image on the right shows three receive queues; each RQ has its own unique sender and its own set of orange buffers for incoming messages. The second image shows RQs with unique senders, but they all draw upon a common set of incoming message buffers. The third image shows a single RQ, fed by multiple senders, with its own set of buffers.
In the first two images, the receiver can tell who the sender is by which RQ a message comes from. In the third image, messages must be marked — either by the network API or by the contents of the message itself — as to who they are from.
With all that being said, MPI implementations typically use some form of “fewer than M × N receive buffers” when possible — either by using some kind of shared receive queue, or perhaps by closing down per-peer connection resources when they have not been used for a while.