EuroMPI 2012 -- the prime meeting for researchers, developers, and students in message-passing parallel computing with MPI (and related paradigms) -- calls for active participation in the conference which will take place from September 23rd to September 26th in Vienna, Austria, at the Austrian Academy of Sciences.
In my prior blog entry, I answered the first of Durga C.’s questions to me. Here are all three of his questions:
- What is the role of the hardware in an RDMA transaction? In other words, why does one need special hardware (e.g., InfiniBand, iWARP, or RoCE) to do RDMA, as opposed to a “normal” Ethernet NIC? (see prior blog entry)
- Further, can you explain why pure software solutions (e.g., Open-MX) are better than nothing when you don’t have hardware support?
- Also, what is the difference between “RDMA” and “RMA”?
Let’s explore the last two of those questions.
Durga C., long-time listener, first-time caller, sent me a few interesting questions that I thought I’d share with everyone. Here’s his first question:
- What is the role of the hardware in an RDMA transaction? In other words, why does one need special hardware (e.g., InfiniBand, iWARP, or RoCE) to do RDMA, as opposed to a “normal” Ethernet NIC?
This one question is surprisingly complex. Let’s dive in…
In prior blog posts, I’ve talked about the implications of registered memory for both MPI applications and implementations.
As an interesting side note: as far as we can tell, no other MPI implementation attempts to either balance registered memory between MPI processes, or handle the performance implications that occur with grossly imbalanced registered memory consumption.
Let’s review a few key points before defining what registered memory imbalances are.
A friend approached me the other day asking what this Open MPI error message meant:
Memory <address> cannot be freed from the registration cache. Possible memory corruption.
Open MPI was displaying this message late in the application’s run; it was a pretty safe bet that the message was printed well after the actual error occurred. “Memory corruption” is a phrase that sends shivers down developers’ spines.
The message itself is, unfortunately, not very helpful. It turns out that this message is from Open MPI 1.4, which is now the prior production series of Open MPI. We’ve recently deprecated it by releasing Open MPI v1.6. And in v1.6, we replaced this error message with something much more helpful.