Cisco Blog > High Performance Computing Networking
February 13, 2012 at 5:00 am PST
I made an offhand remark in my last entry about how MPI buffered sends are evil. In a comment on that entry, @brockpalen asked me why.
I gave a brief explanation in a comment reply, but the subject is big enough to warrant its own blog entry.
So here it is — my top 10 reasons why MPI_BSEND (and its two variants) are evil:
- Buffered sends generally force an extra copy of the outgoing message (i.e., a copy from the application’s buffer to internal MPI storage). Note that I said “generally”: an MPI implementation doesn’t have to copy. But the MPI standard says “Thus, if a send is executed and no matching receive is posted, then MPI must buffer the outgoing message…” Ouch. Most implementations just always copy the message and then start processing the send.
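To make the extra copy concrete, here is a toy model of the "always copy" strategy. It is a sketch, not real MPI: `ToyBsendBuffer` and `toy_bsend` are hypothetical names invented for illustration, standing in for the buffer an application hands to `MPI_Buffer_attach` and for the call to `MPI_Bsend`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy stand-in for MPI's attached buffer. NOT a real MPI type.
struct ToyBsendBuffer {
    std::vector<char> storage;  // plays the role of the buffer given to MPI_Buffer_attach
    std::size_t used = 0;       // bytes currently staged
    std::size_t copies = 0;     // how many extra copies have been made

    explicit ToyBsendBuffer(std::size_t capacity) : storage(capacity) {}
};

// Models "always copy, then process the send": stage the message and return
// immediately. Returns a pointer to the staged copy, or nullptr when the
// message does not fit (real MPI treats an undersized attached buffer as an error).
const char* toy_bsend(ToyBsendBuffer& buf, const void* msg, std::size_t len) {
    assert(msg != nullptr);
    if (len > buf.storage.size() - buf.used) {
        return nullptr;
    }
    char* staged = buf.storage.data() + buf.used;
    std::memcpy(staged, msg, len);  // this is the extra copy
    buf.used += len;
    ++buf.copies;
    return staged;
}
```

After `toy_bsend` returns, the caller can overwrite its own message buffer without disturbing the staged copy. That convenience is what the copy buys, and the `memcpy` is what it costs on every send.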
Tags: HPC, mpi
February 11, 2012 at 4:00 am PST
Pop quiz, hotshot: how many types of sends are there in MPI?
Most people will immediately think of MPI_SEND. A few of you will remember the non-blocking variant, MPI_ISEND (where I = “immediate”).
But what about the rest — can you name them?
Here’s a hint: if I run “ls -1 *send*c | wc -l” in Open MPI’s MPI API source code directory, the result is 14. MPI_SEND and MPI_ISEND are two of those 14. Can you name the other 12?
Tags: HPC, mpi
January 28, 2012 at 8:21 am PST
Back in the ’90s, there was a huge bubble of activity about Java in academic circles. It was the new language that was going to take over the world. An immense amount of research was produced mapping classic computer science issues into Java.
Among the projects produced were several that tried to bring MPI to Java. That is, they added a set of Java bindings over existing C-based MPI implementations. However, many in the HPC crowd eschewed Java for compute- or communication-heavy applications because of performance overheads inherent to the Java language and runtime implementations.
Hence, the Java+MPI=HPC efforts didn’t get too much traction.
But even though the computer science Java bubble eventually ended, Java has become quite an important language in the enterprise. Java run-time environments, compilers, and programming models have steadily improved over the years. Java is now commonly used for many different types of compute-heavy enterprise applications.
Tags: HPC, java, mpi
January 24, 2012 at 6:45 am PST
A while ago, Brock Palen tweeted me an MPI question: how does one send Standard Template Library (STL) C++ objects in MPI?
The problem that Brock is asking about is that STL objects tend to be variable size and type. The whole point of the STL is to create flexible, easy-to-use “containers” of arbitrary types. For example, STL lists allow you to create an arbitrary length list of a given type.
To cite a concrete example, let’s say that my application has an STL vector object named my_vector that contains a bunch of integers. What parameters do I pass to MPI_SEND to send this beast?
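One common approach (a sketch, and not necessarily the answer the full post gives) relies on the fact that a `std::vector<int>` stores its elements contiguously, so a pointer to the first element plus the element count describes the whole message. The function `toy_send_ints` below is a hypothetical stand-in so the example runs without an MPI library; the comment shows roughly what the real call would look like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a send call: copies `count` ints from `buf`
// into `wire`. With a real MPI library the equivalent would be roughly
//   MPI_Send(buf, count, MPI_INT, dest, tag, MPI_COMM_WORLD);
void toy_send_ints(const int* buf, int count, std::vector<int>& wire) {
    assert(count >= 0);
    wire.resize(static_cast<std::size_t>(count));
    if (count > 0) {
        std::memcpy(wire.data(), buf, static_cast<std::size_t>(count) * sizeof(int));
    }
}

// The three parameters that matter come straight from the vector:
// buffer = data(), count = size(), datatype = int.
void send_vector(const std::vector<int>& my_vector, std::vector<int>& wire) {
    toy_send_ints(my_vector.data(), static_cast<int>(my_vector.size()), wire);
}
```

The receiver still has to learn the length before it can size its own vector. With real MPI, one option is to call `MPI_Probe` and `MPI_Get_count` before receiving. Containers that are not contiguous, such as `std::list`, would first need to be copied into a flat buffer.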
Tags: HPC, mpi
January 23, 2012 at 7:03 am PST
In the January MPI Forum meeting, several proposals passed their 2nd votes, meaning that they are “in” MPI-3. That being said, MPI-3 is not yet finalized (and won’t be for many more months), so changes can still happen.
- Creating MPI_COMM_SPLIT_TYPE
- Making the C++ bindings optional
- Updating RMA (a.k.a., “one-sided”)
- Creating a new “MPIT” tools interface
I’ll describe each of these briefly below.
Tags: HPC, MPI-3.0
January 18, 2012 at 8:08 am PST
Welcome to 2012! I’m finally about caught up from the Christmas holidays, last week’s travel to the MPI Forum, etc. It’s time to finally get my blogging back on.
Let’s start with a short one…
Rich Brueckner from InsideHPC interviewed me right before the Christmas break about the low Ethernet MPI latency demo that I gave at SC’11. I blogged about this stuff before, but in the slidecast that Rich posted, I provide a bit more detail about how this technology works.
Remember that this is Cisco’s 1st generation virtualized NIC; our 2nd generation is coming “soon,” and will have significantly lower MPI latency. (I hate being fuzzy and not quoting the exact numbers, but the product is not yet released, so I can’t comment on it yet. I’ll post the numbers when the product is actually available.)
Tags: HPC, Linux, mpi, VFIO
December 30, 2011 at 10:00 am PST
MPI_Bcast("Hi, this is Jeff Squyres. I'm not in the office this week. "
"I'll see your message when I return in 2012. Happy New Year!",
1, MPI_MESSAGE, MPI_COMM_WORLD);
MPI_Bcast("Beep.", 1, MPI_MESSAGE, MPI_COMM_WORLD);
MPI_Recv(your_message, 1, MPI_MESSAGE, MPI_ANY_SOURCE, MPI_ANY_TAG,
MPI_COMM_WORLD, &status);
Tags: HPC, mpi
December 23, 2011 at 9:49 am PST
The upcoming January 2012 MPI Forum meeting is the last meeting to get new material into the MPI-3.0 specification.
Specifically, there are three steps to getting something into the MPI specification: a formal reading and two separate votes. Each of these three steps must happen at a separate meeting. This makes adding new material a long process… but that’s a good thing in terms of a standard. You want to be sure. You need a good amount of time for reflection and investigation before you standardize something for the next 10-20 years.
Of course, due to the deadline, we have a giant list of proposals up for a first reading in January (not including the 1st and 2nd votes also on the agenda). Here’s what’s on the docket so far; some are big, new things, while others are small clarifications to existing language.
Tags: HPC, mpi, MPI-3.0
December 16, 2011 at 8:18 am PST
After some further thought, I do believe that I was too quick to say that MPI is not a good fit for the embedded / RT space.
Yes, MPI is “large” (hundreds of functions with lots of bells and whistles). Yes, mainstream MPI is not primarily targeted towards RT environments.
But this does not mean that there have not been successful forays of MPI into this space. Two obvious ones jump to mind.
Tags: Embedded, HPC, mpi, RT
December 15, 2011 at 7:17 pm PST
My last blog post on MCAPI and MPI is worth some further explanation…
There were a number of good questions raised (both publicly in comments, and privately to me via email).
I ended up chatting with some MCAPI people from PolyCore Software: Sven Brehmer and Ted Gribb. We had a very interesting discussion which I won’t try to replicate here. Instead, we ended up recording an RCE-Cast today about MCAPI and MPI. It’ll be released in a few weeks (Brock already had one teed up to be released this weekend).
The main idea is that Sven and Ted were not trying to say that MCAPI is faster/better than MPI.
MCAPI is squarely aimed at a different market than MPI: the embedded market. Think: accelerators, DSPs, FPGAs, etc. And although MCAPI can be used for larger things (e.g., multiple x86-type servers on a network), there are already well-established, high-quality tools for that (e.g., MPI).
So perhaps it might be interesting to explore the realm of MPI + MCAPI in some fashion.
There’s a bunch of different forms that (MPI + MCAPI) could take — which one(s) would be useful? I cited a few forms in my prior blog post; we talked about a few more on the podcast.
But it’s hard to say without someone committing to doing some research, or a customer saying “I want this.” Talk is cheap — execution requires resources.
Would this be something that you, gentle reader, would be interested in? If so, let me know in the comments or drop me an email.
Tags: HPC, MCAPI, mpi, Multicore Association
December 9, 2011 at 11:15 am PST
From @softtalkblog, I was recently directed to an article about the Multicore Communication API (MCAPI) and MPI. Interesting stuff.
The main sentiments expressed in the article seem quite reasonable:
- MCAPI plays better in the embedded space than MPI (that’s what MCAPI was designed for, after all). Simply put: MPI is too feature-rich (read: big) for embedded environments, reflecting the different design goals of MCAPI vs. MPI.
- MCAPI + MPI might be a useful combination. The article cites a few examples of using MCAPI to wrap MPI messages. Indeed, I agree that MCAPI seems like it may be a useful transport in some environments.
One thing that puzzled me about the article, however, is that it states that MPI is terrible at moving messages around within a single server.
Huh. That’s news to me…
Tags: HPC, MCAPI, mpi, Multicore Association
November 18, 2011 at 3:53 pm PST
As usual, I’m exhausted — in a good way — at the end of an SC week. Whew!
Thanks to all who came to see my demo (showing 5.17us NetPIPE MPI latency over Ethernet via Linux VFIO and Cisco’s “Palo” NIC; no, that’s not iWARP and it’s not IBoE, a.k.a. RoCE; see my prior post for a little more info), and thanks to all who came to the Open MPI BOF. I counted about 100 people at the BOF. The BOF slides are available, if you missed the actual event.
Brock and I did a [probably incredibly embarrassing] short video spot with Rich Brueckner at the end of the show (another in the RCE-Cast <--> InsideHPC crossover series). The convention announcer guy was literally saying “The show is over; please leave” over the PA while we were recording. Whenever Rich gets to posting the video, I think you’ll see why I usually stick to writing. :-)
Tags: HPC, sc11
November 14, 2011 at 5:06 pm PST
Linux VFIO (Virtual Function IO) is an emerging technology that allows direct access to PCI devices from userspace. Although primarily designed as a hypervisor-bypass technology for virtualization uses, it can also be used in an HPC context.
Think of it this way: hypervisor bypass is somewhat similar to operating system (OS) bypass. And OS bypass is a characteristic sought in many HPC low-latency networks these days.
Drop by the Cisco SC’11 booth (#1317) where we’ll be showing a technology preview demo of Open MPI utilizing Linux VFIO over the Cisco “Palo” family of first-generation hardware virtualized NICs (specifically, the P81E PCI form factor). VFIO + hardware virtualized NICs allow benefits such as:
- Low half-round-trip (HRT) ping-pong latencies over Ethernet via direct access to L2 from userspace (4.88us)
- Hardware steerage of inbound and outbound traffic to individual MPI processes
Let’s dive into these technologies a bit and explain how they benefit MPI.
Tags: HPC, Linux, sc11, VFIO
November 6, 2011 at 6:07 am PST
I’m sure most everyone has heard already, but the K supercomputer has been upgraded and now reaches over 10 petaflops. Wow!
10.51 petaflops, actually, so if you round up, you can say that they “turned it up to 11.” Ahem.
We’ll actually have Shinji Sumimoto from the K team speaking during the Open MPI BOF at SC’11. Rolf vandeVaart from NVIDIA will also be discussing their role in Open MPI during the BOF.
We have the 12:15-1:15pm timeslot on Wednesday (room TCC 303); come join us to hear about the present status and future plans for Open MPI.
Tags: HPC, Open MPI, Supercomputing
October 31, 2011 at 6:06 am PST
What a strange position I find myself in: the C++ bindings have become somewhat of a divisive issue in the MPI Forum. There are basically 3 groups in the Forum:
- Those who want to keep the C++ bindings deprecated. Meaning: do not delete them, but do not add any C++ bindings for new MPI-3 functions.
- Those who want to un-deprecate the C++ bindings. Meaning: add C++ bindings for all new MPI-3 functions.
- Those who want to delete the C++ bindings. Meaning: kill. Axe. Demolish. Remove. Never speak of them again.
Let me explain.
Tags: HPC, mpi