Cisco Blogs: High Performance Computing Networking

“RDMA” — what does it mean to MPI applications?

RDMA stands for Remote Direct Memory Access.  The acronym is typically associated with OpenFabrics networks such as iWARP, IBoE (a.k.a. RoCE), and InfiniBand.  But “RDMA” is just today’s popular flavor du jour of a more general concept: RMA (remote memory access), or directly reading and writing to a peer’s memory space.

RMA implementations (including RDMA-based networks, such as OpenFabrics) typically include one or more of the following technologies:

  1. Operating system bypass: userspace applications directly communicate with network hardware.
  2. Hardware offload: network activity is driven by the NIC, not the main CPU.
  3. Hardware or software notification: the application is told when messages finish sending or are received.

How are these technologies typically used in MPI implementations?

Unexpected messages = evil

Another term that comes up fairly often when discussing message-passing applications is “unexpected messages.”

What are they, and why are they (usually) bad?

The quick definition is that an unexpected message is one that arrives before a corresponding MPI receive has been posted.  In more concrete terms: an MPI process has sent a message to a process that hadn’t yet called some flavor of MPI_RECV to receive the message.
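To make the definition concrete, here is a minimal sketch of the matching step in plain Python. It is a toy model, not MPI code: the `Receiver` class, its two queues, and the `post_recv` / `arrive` method names are all made up for illustration, and matching is reduced to a (source, tag) pair.

```python
class Receiver:
    """Toy model of one process's message matching (hypothetical, not an MPI API)."""

    def __init__(self):
        self.posted = []      # receives posted but not yet matched: (source, tag)
        self.unexpected = []  # messages that arrived before any matching receive

    def post_recv(self, source, tag):
        """Model calling a receive: look for an already-arrived message first."""
        for msg in self.unexpected:
            if (msg["source"], msg["tag"]) == (source, tag):
                self.unexpected.remove(msg)
                return msg["payload"]       # the message had been held as unexpected
        self.posted.append((source, tag))
        return None                         # nothing has arrived yet; receive is posted

    def arrive(self, source, tag, payload):
        """Model a message arriving from the network."""
        if (source, tag) in self.posted:
            self.posted.remove((source, tag))
            return "expected"               # a receive was already waiting for it
        # No receive posted yet: the model has to hold on to the message.
        self.unexpected.append({"source": source, "tag": tag, "payload": payload})
        return "unexpected"


r = Receiver()
first = r.arrive(source=0, tag=7, payload=b"hello")   # arrives before the receive
data = r.post_recv(source=0, tag=7)                   # picks up the held message
r.post_recv(source=1, tag=9)                          # receive posted first this time
second = r.arrive(source=1, tag=9, payload=b"world")
print(first, data, second)
```

In this model the only difference between the two cases is ordering: the first message lands in the `unexpected` list because nothing was waiting for it, while the second finds its receive already posted.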

Why is this a Bad Thing?

“Eager Limits”, part 2

Open MPI actually has multiple different protocols for sending messages — not just eager / rendezvous.

Our protocols were originally founded on the ideas described in this paper.  Many things have changed since that 2004 paper, but some of the core ideas are still the same.

The picture to the right shows how Open MPI divides an MPI message up into segments and sends them in three phases.  Open MPI’s specific definition of the “eager limit” is the maximum payload size that is sent, along with the MPI match information, to the receiver as the first part of the transfer.  If the entire message fits within the eager limit, no further transfers are needed, and no clear-to-send (CTS) reply is required from the receiver.
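The first-fragment rule described above can be sketched in a few lines of Python. This is only an illustration of the definition: the function names and the 4-byte limit are hypothetical, and how Open MPI actually sends the remainder in the later phases is not modeled here.

```python
def split_eager(message: bytes, eager_limit: int):
    """Split a message into the first (eager) fragment and whatever is left over.

    eager_limit is the maximum payload carried with the match information;
    the value used below is made up for illustration.
    """
    if eager_limit < 0:
        raise ValueError("eager_limit must be non-negative")
    return message[:eager_limit], message[eager_limit:]


def needs_further_transfers(message: bytes, eager_limit: int) -> bool:
    """True when the message does not fit entirely in the first fragment."""
    return len(message) > eager_limit


small_first, small_rest = split_eager(b"abc", 4)      # fits: nothing left over
large_first, large_rest = split_eager(b"abcdefgh", 4)  # remainder goes in later phases
print(small_rest, large_first, large_rest)
```

When the remainder is empty, the whole message traveled with its match information and the transfer is finished after the first phase.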

What is an MPI “eager limit”?

Technically speaking, the MPI standard does not define anything called an “eager limit.”

An “eager limit” is a term used to describe a method of sending short messages that many MPI implementations use.  That is, it’s an implementation technique; it’s not part of the MPI standard at all.  And since it’s not standardized, it also tends to be different in each MPI implementation.  More specifically: if you write your MPI code to rely on a specific implementation’s “eager limit” behavior, your code may not perform well (or may even deadlock!) with other MPI implementations.
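The deadlock risk can be illustrated with a small simulation in plain Python. It is a toy model under stated assumptions, not real MPI: a blocking send returns right away only if the message is no larger than a hypothetical `eager_limit`, and otherwise waits until the peer has reached the matching receive. The `run` function and the step tuples are invented for this sketch.

```python
def run(programs, size, eager_limit):
    """Simulate blocking sends/receives; return "completes" or "deadlock".

    programs maps rank -> list of ("send", peer) or ("recv", peer) tuples.
    """
    pc = {rank: 0 for rank in programs}          # index of each rank's next step
    buffered = {rank: [] for rank in programs}   # sources of eager messages held per rank
    progress = True
    while progress:
        progress = False
        for rank, steps in programs.items():
            if pc[rank] == len(steps):
                continue                          # this rank has finished
            op, peer = steps[pc[rank]]
            if op == "send":
                if size <= eager_limit:
                    buffered[peer].append(rank)   # eager: hand off and return
                    pc[rank] += 1
                    progress = True
                elif (pc[peer] < len(programs[peer])
                      and programs[peer][pc[peer]] == ("recv", rank)):
                    pc[rank] += 1                 # large message: needs the peer's
                    pc[peer] += 1                 # receive, so both advance together
                    progress = True
            elif peer in buffered[rank]:          # recv of an eagerly sent message
                buffered[rank].remove(peer)
                pc[rank] += 1
                progress = True
    done = all(pc[r] == len(programs[r]) for r in programs)
    return "completes" if done else "deadlock"


# Both ranks send first, then receive: this relies on eager behavior.
send_first = {0: [("send", 1), ("recv", 1)], 1: [("send", 0), ("recv", 0)]}
# One rank receives first: this does not depend on the eager limit.
ordered = {0: [("send", 1), ("recv", 1)], 1: [("recv", 0), ("send", 0)]}

small = run(send_first, size=100, eager_limit=4096)
large = run(send_first, size=1_000_000, eager_limit=4096)
safe = run(ordered, size=1_000_000, eager_limit=4096)
print(small, large, safe)
```

In this model the send-first exchange only finishes because the messages happen to be under the limit; the same program stalls once the message grows past it (or the limit shrinks), while the reordered version finishes either way.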

So — what exactly is an “eager limit”?

A bucket full of new MPI Fortran features

Over this past weekend, I had the motivation and time to overhaul Open MPI’s Fortran support for the better.  Points worth noting:

  • The “use mpi” module now includes all MPI subroutines.  Strict type checking for everything!
  • Open MPI now uses only a single Fortran compiler; there’s no more artificial division between “f77” and “f90”.

There’s still work to be done, of course (this is still off in a Mercurial repo on Bitbucket, not yet in the Open MPI mainline SVN trunk), but the results of this weekend code sprint are significantly simpler Open MPI Fortran plumbing behind the scenes and a much, much better implementation of the MPI-2 “use mpi” Fortran bindings.
