Cisco Blogs

“RDMA” — what does it mean to MPI applications?

July 16, 2011

RDMA stands for Remote Direct Memory Access.  The acronym is typically associated with OpenFabrics networks such as iWARP, IBoE (a.k.a. RoCE), and InfiniBand.  But “RDMA” is just today’s flavor du jour of a more general concept: RMA (remote memory access), or directly reading from and writing to a peer’s memory space.

RMA implementations (including RDMA-based networks, such as OpenFabrics) typically include one or more of the following technologies:

  1. Operating system bypass: userspace applications directly communicate with network hardware.
  2. Hardware offload: network activity is driven by the NIC, not the main CPU.
  3. Notification: hardware or software signals the application when messages finish sending or are received.

How are these technologies typically used in MPI implementations?

In general, RMA is a pretty lousy match for MPI’s message-based semantics.  Mapping RMA’s one-sided abstractions onto two-sided message-passing semantics is a complicated process.  Indeed, many of the one-sided RMA benefits can be lost when applying them to two-sided communications.

In theory, RMA should be a good match for MPI-2 one-sided communications.  But it has proven to be quite challenging to create MPI implementations based on many of today’s RMA-based networks — unless that network was specifically designed for MPI traffic.

MPI implementations typically combine traditional two-sided network communication mechanisms with RMA techniques.  For example, the RMA technologies listed above are typically used in the following ways:

  • OS bypass reduces short MPI message latency.  When implemented well, half round-trip ping-pong latency (while traversing a network) can be on the order of 1 microsecond.
  • Hardware offload is useful for “long” network actions: let the MPI application have the CPU back while the actual message transfer is handled by the networking hardware in the background.
  • Notification: instead of having to check every MPI request for completion, the network can simply tell the MPI implementation when each one is done.

However, as with all engineering, these technologies come with associated trade-offs.

Stay tuned — in my next post, I’ll talk about an important RMA consequence and its impact on MPI implementations: registered memory.
