Cisco Blogs


MPI tradeoffs: space vs. time

@brockpalen asked me a question on Twitter:

@jsquyres [can you discuss] common #MPI implementation assumptions made for performance and/or resource constraints?

Good question.  MPI implementations are full of trade-offs between performance and resource consumption.  Let’s discuss a few easy ones.

More MPI-3 newness: const

Way back in the MPI-2.2 timeframe, a proposal was introduced to add the C keyword “const” to all relevant MPI API parameters.  The proposal was discussed at great length.  The main idea was twofold:

  • Provide a stronger semantic statement about which parameter contents MPI could change, and which it should not.  This mainly applies to user choice buffers (e.g., the choice buffer argument in MPI_SEND).
  • Be friendlier to languages that use const (or const-like constructs) more heavily than C does.  The original proposal was actually from Microsoft, whose goal was to provide higher quality C# MPI bindings.

Additionally, the official MPI C++ bindings (not deprecated at the time) had used const since the mid-1990s, so why not use it in the C bindings too?

New things in MPI-3: MPI_Count

The count parameter exists in many MPI API functions: MPI_SEND, MPI_RECV, MPI_TYPE_CREATE_STRUCT, etc.  In conjunction with the datatype parameter, the count parameter is often used to effectively represent the size of a message.  As a concrete example, the language-neutral prototype for MPI_SEND is:

MPI_SEND(buf, count, datatype, dest, tag, comm)

The buf parameter specifies where the message is in the sender’s memory, and the count and datatype arguments indicate its layout (and therefore size).

Since MPI-1, the count parameter has been an integer (int in C, INTEGER in Fortran).  With the usual 32-bit integers, this meant that the largest count you could express in a single function call was 2^31 − 1, or about 2 billion.  Since MPI-1 was introduced in 1994, machines (particularly the commodity machines used in parallel computing environments) have grown considerably, and 2 billion began to seem like a fairly arbitrary, and sometimes distasteful, limitation.

The MPI Forum just recently passed ticket #265, formally introducing the MPI_Count datatype to alleviate the 2B limitation.

Shared memory as an MPI transport (part 2)

In my last post, I discussed the rationale for using shared memory as a transport between MPI processes on the same server as opposed to using traditional network API loopback mechanisms.

The two big reasons are:

  1. Short message latency is lower with shared memory.
  2. Using shared memory frees OS and network hardware resources (including the PCI bus) for off-node communication.

Let’s discuss common ways in which MPI implementations use shared memory as a transport.

Shared memory as an MPI transport

MPI is a great transport-agnostic inter-process communication (IPC) mechanism.  Regardless of where the peer process you’re communicating with is located, MPI shuttles messages back and forth with it.

Most people think of MPI as communicating across a traditional network: Ethernet, InfiniBand, etc.  But let’s not forget that MPI is also used between processes on the same server.

A loopback network interface could be used to communicate between them; this would present a nice abstraction to the MPI implementation: all peer processes are connected via the same networking interface (TCP sockets, OpenFabrics verbs, etc.).

But network loopback interfaces are typically not optimized for communicating between processes on the same server (a.k.a. “loopback” communication). For example, short message latency between MPI processes — a not-unreasonable metric to measure an MPI implementation’s efficiency — may be higher than it could be with a different transport layer.

Shared memory is a fast, efficient mechanism that can be used for IPC between processes on the same server. Let’s examine the rationale for using shared memory and how it is typically used as a message transport layer.
