Cisco Blogs



Do you read MPI error messages?

June 16, 2012 at 11:53 am PST

A friend approached me the other day asking what this Open MPI error message meant:

Memory <address> cannot be freed from the registration cache. Possible memory corruption.

Open MPI was displaying this message late in the application’s run, so it was a pretty safe bet that the message was printed well after the actual error occurred. “Memory corruption” is a phrase that sends shivers down developers’ spines.

The message itself is, unfortunately, not very helpful.  It turns out that this message is from Open MPI 1.4, which is now the prior production series of Open MPI.  We’ve recently deprecated it by releasing Open MPI v1.6.  And in v1.6, we replaced this error message with something much more helpful.


New SGE blog / Cisco Live! ticket contest

June 5, 2012 at 5:29 am PST

Long-time Open MPI mailing list contributor and Open Source Grid Engine (OGE, previously known as SGE) maintainer Rayson Ho has just started a blog about all things Open Grid Engine.

In one of his first posts, he’s giving away a free Cisco Live! pass for the June 10-14, 2012 event.  All you have to do is answer a “simple” MPI question (well, it might not be as simple as it looks :-) ).

As of yesterday, no one had answered the question correctly, so it’s still up for grabs!


MPI-3 voting: results

May 31, 2012 at 5:00 am PST

Last March’s MPI Forum meeting was the last meeting to get a “formal reading” of proposals into MPI-3. Some were quite controversial. Some ended up being withdrawn before the next meeting.

This week’s Forum meeting in Japan saw the first vote (out of two) for each of the surviving proposals from the March meeting (see the full voting results here). Some continued to be quite controversial. Some didn’t survive their first votes (doh!). Others narrowly survived.

Here’s a summary of some of the users-will-care-about-these proposals, and how they fared:

Followup to “The Common Communications Interface (CCI)”

May 30, 2012 at 4:11 pm PST

A few people have made remarks to me about the pair of CCI guest blog entries from Scott Atchley of Oak Ridge (entry 1, entry 2) indicating that they didn’t quite “get it”.  So let me try to put Scott’s words in concrete terms…

CCI is an API that represents a unification of low-level “native” network APIs.  Specifically: many network vendors are doing essentially the same things in their low-level “native” network APIs.  In the HPC world, MPI hides all these different low-level APIs.  But there are real-world non-HPC apps out there that need extreme network performance, and therefore write their own unification layers for verbs, portals, MX, etc.  Ick!

So why don’t we unify all these low-level native network APIs?

NOTE: This is quite similar to how we unified the high-level network APIs into MPI.

Two other key facts are important to realize here:

Open MPI v1.6 released

May 14, 2012 at 9:29 am PST

Marking the end of over 2 years of active development, the Open MPI project has released a new “stable” series of releases starting with v1.6.

Specifically, Open MPI maintains two concurrent release series:

  • Odd-numbered releases are “feature development” releases (e.g., 1.5.x).  They’re considered to be stable and tested, but not yet necessarily “mature” (i.e., they haven’t yet had lots of real-world usage to shake out bugs).  New features are added over the life of feature development releases.
  • Even-numbered releases are “super stable” releases (e.g., 1.6.x).  After enough time, feature development releases transition into super stable releases: the new functionality has been vetted by enough real-world usage to be considered stable enough for production sites.
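
To make the odd/even rule concrete, here is a minimal sketch that classifies a version string by its second ("minor") number. The function name `series_kind` is made up for illustration and is not part of Open MPI; the sketch also assumes the rule is keyed on the minor number, which is what the 1.5.x and 1.6.x examples above suggest.

```python
def series_kind(version: str) -> str:
    """Classify an Open MPI-style version string such as "1.5.5" or "v1.6".

    Hypothetical helper: assumes the odd/even rule applies to the
    second (minor) component of the version number.
    """
    # Drop an optional leading "v" and split "1.5.5" into ["1", "5", "5"].
    parts = version.lstrip("v").split(".")
    minor = int(parts[1])
    # Even minor number: "super stable"; odd: "feature development".
    return "super stable" if minor % 2 == 0 else "feature development"


for v in ("v1.4", "1.5.5", "v1.6"):
    print(v, "->", series_kind(v))
```

Under that assumption, v1.4 and v1.6 both land in the “super stable” bucket and 1.5.5 lands in “feature development”, which matches how the series are described above.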

Conceptually, it looks like this:
