Cisco Blogs: High Performance Computing Networking

MPI Programming Mistakes

February 25, 2011 at 7:30 am PST

I’ve seen many users make lots of different kinds of MPI programming mistakes.

Some are common beginner mistakes.  Others are common intermediate-level mistakes.  Still others are incredibly subtle errors in deep program logic that took sophisticated debugging tools to track down (race conditions, memory overruns, etc.).

In 2007, I wrote a pair of magazine columns listing 10 common MPI programming mistakes (see this PDF for part 1 and this PDF for part 2).  We still see users asking about some of these mistakes on the Open MPI users’ mailing list.

What mistakes do you see your users making with MPI?  How can we — the MPI community — better educate users to avoid these kinds of common mistakes?  Post your thoughts in the comments.

MPI Forum Roundup

February 11, 2011 at 9:00 am PST

We just finished up another MPI Forum meeting earlier this week, hosted at the Cisco node 0 facility in San Jose, CA.  A lot of the working groups are making tangible progress and bringing their work back to the full Forum for review and discussion.  Sometimes the working group reports are accepted and moved forward towards standardization; other times, the full Forum provides feedback and guidance, and then sends the working group back to committee to keep hashing out details.  This is pretty typical stuff for a standards body.

This week, we had a first vote (out of two total) on the MPI_MPROBE proposal.  It passed the vote, and will likely pass its next vote in March, meaning that it will become part of the MPI 3.0 draft standard.

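MPI_MPROBE closes an important race condition: in a multithreaded application, another thread can receive a message in the window between one thread’s MPI_PROBE and its subsequent MPI_RECV.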
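
The race in question shows up when several threads probe for a message and then receive it as two separate steps. Here is a minimal sketch of the idea in plain Python. To be clear, this is a toy model and not MPI: `ToyMailbox`, `probe_then_recv`, and `matched_probe` are hypothetical names invented for illustration, and the two "threads" are interleaved by hand so that the outcome is deterministic.

```python
from collections import deque


class ToyMailbox:
    """Toy stand-in for one process's incoming message queue (not a real MPI API)."""

    def __init__(self, messages):
        self._queue = deque(messages)

    def probe(self):
        """Like a plain probe: report the next message's size but leave it queued."""
        return len(self._queue[0])

    def recv(self):
        """Like a plain receive: dequeue whichever message is next right now."""
        return self._queue.popleft()

    def mprobe(self):
        """Like a matched probe: dequeue the message and hand it back with its size."""
        message = self._queue.popleft()
        return message, len(message)


def probe_then_recv():
    """Threads A and B, interleaved by hand, both probe before either receives."""
    box = ToyMailbox([b"a long message", b"short"])
    size_a = box.probe()   # thread A sizes its buffer for the first message
    size_b = box.probe()   # thread B sees the very same message
    got_b = box.recv()     # thread B takes the message that A probed
    got_a = box.recv()     # thread A receives a message it never sized for
    return (size_a, got_a), (size_b, got_b)


def matched_probe():
    """Same interleaving, but each probe removes the message it reports."""
    box = ToyMailbox([b"a long message", b"short"])
    handle_a, size_a = box.mprobe()   # thread A now owns the first message
    handle_b, size_b = box.mprobe()   # thread B owns the second one
    return (size_a, handle_a), (size_b, handle_b)
```

In the first function, thread A ends up with a message whose length does not match what it probed; in the second, each thread's size and payload always agree, because the probe and the claim on the message happen in a single step.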

Is your MPI IPv6-ready?

February 4, 2011 at 9:30 am PST

Here’s a poll for readers: is your MPI IPv6-ready?

Many of you may not be using IP-based MPI network transports, but as HPC is becoming more and more commoditized, IP-based MPI implementations may actually start gaining in importance.  Not to ultra-high-performing systems, of course.  But you’d be surprised how many 4-, 8-, and 16-node Ethernet-based clusters are sold these days… particularly as core counts are increasing — a 16-node Westmere cluster is quite powerful!

Owners of such systems are typically running ISV-based MPI applications, or other “canned” parallel software.  Most of them don’t use InfiniBand or other high-speed interconnects; they just use good old Ethernet with TCP as the underlying transport for their MPI.

Exascale: it’s not just the (networking) hardware

January 24, 2011 at 7:45 am PST

Many in the HPC research community are starting to work on “exascale” these days: the ability to perform 10^18 floating point operations per second.  Exascale is such a difficult problem that it will require new technologies in many different areas before it can become a reality.  A case in point is today’s entry at Inside HPC entitled “InfiniBand Charts Course to Exascale”.

It cites The Exascale Report and a blog entry by Lloyd Dickman at the IBTA about their course going forward.  It’s a good read — Lloyd’s a smart, thoughtful guy.

That being said, there’s a key piece missing from the discussion: the (networking) software.  More specifically: the current OpenFabrics Verbs API abstractions are (probably) unsuitable for exascale, a fact that Fab Tillier (Microsoft) and I presented at the OpenFabrics workshop in Sonoma last year (1up, 2up).

Community-contributed Perl and Python bindings for hwloc

January 22, 2011 at 7:30 am PST

I love open source communities.

Two hwloc community members have taken it upon themselves to provide high-quality native language bindings for Perl and Python.  Work is active, and the hwloc core developers are discussing with the binding authors how to provide good abstractions, functionality, and user experience.

  • The Perl CPAN module is being developed by Bernd Kallies: you can download it here (I linked to the directory rather than a specific tarball because he keeps putting up new versions).
  • The Python bindings are being developed by Guy Streeter (at Red Hat); his git repository is available here.
