In prior blog posts, I’ve talked about the implications of registered memory for both MPI applications and implementations.
Here’s another fun implication that was discovered within the last few months by Nathan Hjelm and Samuel Gutierrez out at Los Alamos National Laboratory: registered memory imbalances.
As an interesting side note: as far as we can tell, no other MPI implementation attempts to either balance registered memory between MPI processes, or handle the performance implications that occur with grossly imbalanced registered memory consumption.
Let’s review a few key points before defining what registered memory imbalances are.
Read More »
Tags: HPC, mpi, NUMA, RDMA
A friend approached me the other day asking what this Open MPI error message meant:
Memory <address> cannot be freed from the registration cache. Possible memory corruption.
Open MPI was displaying this message late in the application’s run — it was a pretty safe bet that when the message was printed, it was well after the actual error. “Memory corruption” is a phrase that sends shivers down developers’ spines.
The message itself is, unfortunately, not very helpful. It turns out that this message is from Open MPI 1.4, which is now the prior production series of Open MPI. We’ve recently deprecated it by releasing Open MPI v1.6. And in v1.6, we replaced this error message with something much more helpful.
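To see how a message like this can arise, here is a toy sketch of a registration cache (in Python, purely for illustration — Open MPI’s actual implementation is in C and works quite differently): the cache tracks pinned regions by base address, and an attempt to release an address the cache has no record of — for instance, because its bookkeeping was trampled — produces the analogous error.

```python
class RegistrationCache:
    """Toy model of an MPI registration cache: maps the base address of
    each registered ("pinned") memory region to its length."""

    def __init__(self):
        self._regions = {}  # base address -> region length

    def register(self, addr, length):
        # Record a newly pinned region.
        self._regions[addr] = length

    def deregister(self, addr):
        # Releasing a region the cache doesn't know about suggests the
        # cache's bookkeeping (or the application's memory) was corrupted.
        if addr not in self._regions:
            raise RuntimeError(
                f"Memory {addr:#x} cannot be freed from the "
                "registration cache. Possible memory corruption.")
        del self._regions[addr]
```

Note that, as the post describes, the error only surfaces at deregistration time — potentially long after whatever actually corrupted the bookkeeping.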
Read More »
Tags: HPC, mpi
Long-time Open MPI mailing list contributor and Open Source Grid Engine (OGE, previously known as SGE) maintainer Rayson Ho has just started a blog about Open Grid Engine topics.
In one of his first posts, he’s giving away a free Cisco Live! pass for the June 10-14, 2012 event. All you have to do is answer a “simple” MPI question (well, it might not be as simple as it looks).
As of yesterday, no one had answered the question correctly, so it’s still up for grabs!
Tags: cisco live, HPC, mpi
Last March’s MPI Forum meeting was the last meeting at which proposals could get a “formal reading” for inclusion in MPI-3. Some were quite controversial. Some ended up being withdrawn before the next meeting.
This week’s Forum meeting in Japan saw the first vote (out of two) for each of the surviving proposals from the March meeting (see the full voting results here). Some continued to be quite controversial. Some didn’t survive their first votes (doh!). Others narrowly survived.
Here’s a summary of some of the users-will-care-about-these proposals, and how they fared: Read More »
Tags: HPC, mpi, MPI-3
Marking the end of over 2 years of active development, the Open MPI project has released a new “stable” series of releases starting with v1.6.
Specifically, Open MPI maintains two concurrent release series:
- Odd-numbered releases are “feature development” releases (e.g., 1.5.x). They’re considered to be stable and tested, but not yet necessarily “mature” (i.e., they haven’t had lots of real-world usage to shake out bugs). New features are added over the life of feature development releases.
- Even-numbered releases are “super stable” releases (e.g., 1.6.x). After enough time, feature development releases transition into super stable releases — the new functionality has been vetted by enough real-world usage to be considered stable enough for production sites.
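The odd/even convention above can be summed up in a tiny version classifier (a hypothetical helper for illustration, not anything shipped with Open MPI):

```python
def release_series(version):
    """Classify an Open MPI version string by the parity of its minor
    number: odd minors (e.g., 1.5.x) are feature-development releases,
    even minors (e.g., 1.6.x) are "super stable" releases."""
    parts = version.split(".")
    minor = int(parts[1])
    return "feature development" if minor % 2 else "super stable"
```

For example, `release_series("1.5.4")` yields `"feature development"`, while `release_series("1.6")` yields `"super stable"`.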
Conceptually, it looks like this:
Tags: HPC, mpi