Cisco Blogs


Cisco Blog > High Performance Computing Networking

Hardware vs. software: user questions

June 26, 2012 at 1:20 pm PST

Durga C., long-time listener, first-time caller, sent me a few interesting questions that I thought I’d share with everyone.  Here’s his first question:

  1. What is the role of the hardware in an RDMA transaction?  In other words, why does one need special hardware (e.g., InfiniBand, iWARP, RoCE, etc.) to do RDMA as opposed to a “normal” Ethernet NIC?

This one question is surprisingly complex.  Let’s dive in…


Registered memory imbalances

June 23, 2012 at 4:36 am PST

In prior blog posts, I’ve talked about the implications of registered memory for both MPI applications and implementations.

Here’s another fun implication that was discovered within the last few months by Nathan Hjelm and Samuel Gutierrez out at Los Alamos National Laboratory: registered memory imbalances.

As an interesting side note: as far as we can tell, no other MPI implementation attempts to either balance registered memory between MPI processes, or handle the performance implications that occur with grossly imbalanced registered memory consumption.

Let’s review a few key points before defining what registered memory imbalances are.


Do you read MPI error messages?

June 16, 2012 at 11:53 am PST

A friend approached me the other day asking what this Open MPI error message meant:

Memory <address> cannot be freed from the registration cache. Possible memory corruption.

Open MPI was displaying this message late in the application’s run, so it was a pretty safe bet that the message was printed well after the actual error occurred. “Memory corruption” is a phrase that sends shivers down developers’ spines.

The message itself is, unfortunately, not very helpful.  It turns out that this message is from Open MPI 1.4, which is now the prior production series of Open MPI.  We’ve recently deprecated it by releasing Open MPI v1.6.  And in v1.6, we replaced this error message with something much more helpful.


New SGE blog / Cisco Live! ticket contest

June 5, 2012 at 5:29 am PST

Longtime Open MPI mailing list contributor and Open Source Grid Engine (OGE, previously known as SGE) maintainer Rayson Ho has just started a blog about all things Open Grid Engine.

In one of his first posts, he’s giving away a free Cisco Live! pass for the June 10-14, 2012 event.  All you have to do is answer a “simple” MPI question (well, it might not be as simple as it looks :-) ).

As of yesterday, no one had answered the question correctly, so it’s still up for grabs!


MPI-3 voting: results

May 31, 2012 at 5:00 am PST

Last March’s MPI Forum meeting was the last meeting to get a “formal reading” of proposals into MPI-3. Some were quite controversial. Some ended up being withdrawn before the next meeting.

This week’s Forum meeting in Japan saw the first vote (out of two) for each of the surviving proposals from the March meeting (see the full voting results here). Some continued to be quite controversial. Some didn’t survive their first votes (doh!). Others narrowly survived.

Here’s a summary of some of the users-will-care-about-these proposals, and how they fared:
