Cisco Blogs

High Performance Computing Networking

Registered memory imbalances

In prior blog posts, I’ve talked about the implications of registered memory for both MPI applications and implementations.

Here’s another fun implication, discovered within the last few months by Nathan Hjelm and Samuel Gutierrez at Los Alamos National Laboratory: registered memory imbalances.

As an interesting side note, as far as we can tell, no other MPI implementation attempts either to balance registered memory across MPI processes or to handle the performance implications of grossly imbalanced registered memory consumption.

Let’s review a few key points before defining what registered memory imbalances are.



Do you read MPI error messages?

A friend approached me the other day asking what this Open MPI error message meant:

Memory <address> cannot be freed from the registration cache. Possible memory corruption.

Open MPI was displaying this message late in the application’s run, so it was a pretty safe bet that the actual error had occurred well before the message was printed. “Memory corruption” is a phrase that sends shivers down developers’ spines.

The message itself is, unfortunately, not very helpful. It turns out that it comes from Open MPI v1.4, which is now the prior production series of Open MPI; we recently deprecated v1.4 by releasing Open MPI v1.6. In v1.6, we replaced this error message with something much more helpful.



New SGE blog / Cisco Live! ticket contest

Long time Open MPI mailing list contributor and Open Source Grid Engine (OGE, previously known as SGE) maintainer Rayson Ho has just opened up a blog about Open Grid Engine kinds of things.

In one of his first posts, he’s giving away a free Cisco Live! pass for the June 10-14, 2012 event. All you have to do is answer a “simple” MPI question (well, it might not be as simple as it looks :-) ).

As of yesterday, no one had answered the question correctly, so it’s still up for grabs!


MPI-3 voting: results

This past March’s MPI Forum meeting was the last meeting at which proposals could get a “formal reading” for inclusion in MPI-3. Some were quite controversial. Some ended up being withdrawn before the next meeting.

This week’s Forum meeting in Japan saw the first vote (out of two) for each of the surviving proposals from the March meeting (see the full voting results here). Some continued to be quite controversial. Some didn’t survive their first votes (doh!). Others narrowly survived.

Here’s a summary of some of the users-will-care-about-these proposals, and how they fared:


Followup to “The Common Communications Interface (CCI)”

A few people have told me that they didn’t quite “get” the pair of CCI guest blog entries from Scott Atchley of Oak Ridge National Laboratory (entry 1, entry 2). So let me try to put Scott’s words in concrete terms…

CCI is an API that unifies low-level “native” network APIs. Specifically: many network vendors are doing essentially the same things in their low-level “native” network APIs. In the HPC world, MPI hides all these different low-level APIs. But there are real-world non-HPC applications that need extreme network performance, and therefore write their own unification layers over verbs, Portals, MX, etc. Ick!
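To make the “Ick!” concrete, here is a minimal C sketch of the sort of hand-rolled unification layer such an application might carry: a table of function pointers with one entry per native network API, selected by name at run time. Everything in it is hypothetical: `transport_ops`, the `*_send_stub` functions, `select_transport()`, and `app_send()` are illustrative names, not part of CCI, verbs, Portals, or MX, and the stubs only pretend to send by returning the byte count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-backend operations table (NOT the real CCI API). */
typedef struct transport_ops {
    const char *name;                           /* backend name, e.g. "verbs" */
    int (*send)(const void *buf, size_t len);   /* bytes "sent", or -1 on error */
} transport_ops;

/* Stub backends: a real layer would call into verbs, Portals, or MX here.
 * These stubs do no I/O; they just report the full length as sent. */
static int verbs_send_stub(const void *buf, size_t len)   { (void)buf; return (int)len; }
static int portals_send_stub(const void *buf, size_t len) { (void)buf; return (int)len; }
static int mx_send_stub(const void *buf, size_t len)      { (void)buf; return (int)len; }

/* One entry per native API the application chooses to support. */
static const transport_ops backends[] = {
    { "verbs",   verbs_send_stub   },
    { "portals", portals_send_stub },
    { "mx",      mx_send_stub      },
};

/* Look up a backend by name; returns NULL if it is not in the table. */
const transport_ops *select_transport(const char *name)
{
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    }
    return NULL;
}

/* Application code sends through this and never calls a native API directly. */
int app_send(const transport_ops *t, const void *buf, size_t len)
{
    if (t == NULL || t->send == NULL)
        return -1;
    return t->send(buf, len);
}
```

An application would call `select_transport("verbs")` once at startup and route every send through `app_send()`. The point of the sketch is that each such application has to write and maintain this table, plus all the real backend code behind it, on its own.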

So why don’t we unify all these low-level native network APIs?

NOTE: This is quite similar to how we unified the high-level network APIs into MPI.

Two other key facts are important to realize here:
