High Performance Computing Networking

Open MPI: behind the scenes

Working on an MPI implementation isn’t always sexy.  There’s a lot of grubby, grubby work that needs to happen on a continual basis to produce a production-quality MPI implementation that can be used for real-world HPC applications.

Sure, we always need to work on optimizing short message latency.

Sure, we need to keep driving down MPI’s internal resource utilization so that applications get more of the hardware for themselves.

But there’s also lots of “uninteresting” — yet still critically important — stuff that happens behind the scenes.


Tree-based launch in Open MPI (part 2)

In my prior blog entry, I described the basics of Open MPI’s tree-based launching system over ssh (yes, there are still some valid / good reasons for using ssh over a native job scheduler / resource manager’s parallel launch mechanisms…).

That entry got a little long, so I split the rest of the discussion into a separate blog entry.

The prior entry ended after describing that Open MPI uses a binomial tree-based launcher.
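
To make that concrete, here’s a minimal sketch of how a binomial tree can be computed.  This is my illustration, not Open MPI’s actual code: the point is that each daemon can derive its parent and its children purely from its own rank and the total daemon count, so no extra coordination is needed at launch time.

/* Sketch: compute a daemon's parent and children in a binomial tree
 * rooted at rank 0.  Illustrative only; not Open MPI's code. */
#include <stdio.h>

/* Parent of 'rank': clear the highest set bit (rank 0 has no parent). */
static int binomial_parent(int rank) {
    int hibit = 0;
    for (int tmp = rank; tmp > 1; tmp >>= 1) {
        hibit++;
    }
    return (rank == 0) ? -1 : rank - (1 << hibit);
}

int main(void) {
    const int size = 16;   /* pretend we are launching 16 daemons */
    for (int rank = 0; rank < size; rank++) {
        printf("rank %2d: parent %2d, children:", rank, binomial_parent(rank));
        /* Children are rank + 2^k for every k above our highest set bit. */
        int hibit = -1;
        for (int tmp = rank; tmp > 0; tmp >>= 1) {
            hibit++;
        }
        for (int k = hibit + 1; rank + (1 << k) < size; k++) {
            printf(" %d", rank + (1 << k));
        }
        printf("\n");
    }
    return 0;
}

Rank 0 starts the launch, and each daemon that comes alive immediately launches its own children, so the whole job starts in roughly log2(N) rounds instead of N.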


Tree-based launch in Open MPI

I’ve mentioned it before: the run-time systems of MPI implementations are frequently unsung heroes.

A lot of blood, sweat, tears, and innovation goes into parallel run-time systems, particularly those that can scale to very large systems.  But they’re not discussed often, mainly because they’re not as sexy as ultra-low latency numbers or other popular MPI benchmarks.

Here’s one cool thing that we added to Open MPI’s runtime a few years ago and have continued to improve ever since (pretty pictures included!).


The “vader” shared memory transport in Open MPI: Now featuring 3 flavors of zero copy!

Today’s blog post is by Nathan Hjelm, a Research Scientist at Los Alamos National Laboratory, and a core developer on the Open MPI project.

The latest version of the “vader” shared-memory Byte Transfer Layer (BTL) in the upcoming Open MPI v1.8.4 release brings better small-message latency and improved support for “zero-copy” transfers.

NOTE: “zero copy” is the term typically used, even though it really means “single copy” (the message is copied exactly once, from the sender to the receiver).  Think of it as “zero extra copies”: one copy instead of two.
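
To see what “one copy instead of two” looks like in practice, here’s a minimal sketch using Linux Cross Memory Attach (CMA), i.e., the process_vm_readv() system call, one of the kernel-assisted mechanisms that can provide this single-copy path.  This is illustrative code, not Open MPI’s; it assumes a Linux kernel with CMA enabled and ptrace permission between the two processes.

/* Sketch: a "single copy" transfer between two processes via CMA.
 * The receiver pulls the message straight out of the sender's
 * address space: one copy, no intermediate shared-memory buffer. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* The message lives in the sender's (child's) address space. */
    static char send_buf[64] = "hello from the sender's memory";

    pid_t child = fork();
    if (child == 0) {
        sleep(2);          /* sender: just stay alive while being read */
        return 0;
    }

    char recv_buf[64] = { 0 };
    struct iovec local  = { recv_buf, sizeof(recv_buf) };
    struct iovec remote = { send_buf, sizeof(send_buf) };

    /* The single copy: sender's buffer straight into receiver's buffer. */
    ssize_t n = process_vm_readv(child, &local, 1, &remote, 1, 0);
    printf("received %zd bytes: %s\n", n, recv_buf);

    wait(NULL);
    return 0;
}

In a real MPI implementation the sender first tells the receiver where its buffer lives; fork() lets this toy example cheat, since send_buf sits at the same virtual address in both processes.  Either way, contrast this with the usual shared-memory path, where the sender copies into a shared FIFO and the receiver copies back out: two copies.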


MPI over 40Gb Ethernet

Half-round-trip ping-pong latency may be the first metric that everyone looks at for MPI in HPC, but bandwidth is usually one of the next metrics examined.

40Gbps Ethernet has been available for switch-to-switch links for quite a while, and 40Gbps NICs are starting to make their way down to the host.

How does MPI perform with a 40Gbps NIC?
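
For context, a bandwidth test in this vein is typically just a big-message ping-pong.  Here’s a minimal sketch (mine, in the spirit of tools like NetPIPE, not the exact benchmark used in the post); the 4 MiB message size and 100 iterations are arbitrary choices for illustration.

/* Sketch: ping-pong bandwidth between ranks 0 and 1. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    const int len   = 1 << 22;   /* 4 MiB: big enough to stress bandwidth */
    const int iters = 100;

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(len);
    memset(buf, 0, len);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0) {
        /* Each iteration moves the message once in each direction. */
        double bytes_moved = 2.0 * iters * len;
        printf("%d-byte messages: %.2f Gbps\n", len,
               bytes_moved * 8 / elapsed / 1e9);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

To exercise the NIC rather than shared memory, run one rank per host, e.g. “mpirun -np 2 --map-by node ./bw” with Open MPI.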
