Cisco Blog > High Performance Computing Networking

SC’09 Happenings

October 14, 2009 at 12:00 pm PST

Who’s going to SC’09?  I’ll be there!

I’m hosting the Open MPI Community Meeting BOF with George Bosilca from the University of Tennessee, Knoxville.  Be sure to come by to hear about where we are and where we’re going in the Open MPI project.  There’s also an MPI[-3] Forum BOF for anyone who wants to get a glimpse of where we’re going on the standards committee.  I highly recommend attending for anyone who works with MPI.

Additionally, I’ll be hanging out in the Cisco Booth (#1847); stop by and say hello!

(Editor’s note: fixed the link to the Cisco booth — thanks to Edric and others who pointed out that it was wrong!)

GPU: HPC Friend or Foe?

October 8, 2009 at 12:00 pm PST

General purpose computing with GPUs looks like a great concept on paper.  Indeed, SC’08 was dominated by GPUs; it was impossible not to be (technically) impressed with some of the results that were being cited and shown on the exhibit floor.  But despite that, GPGPUs have failed to become a “must have” HPC technology over the past year.  Last week’s announcements from NVIDIA look really great for the HPC crowd (aside from some embarrassing PR blunders).  They seem to address many of the shortcomings of prior-generation GPU usage in an HPC environment: more memory, more cores, ECC memory, better / cheaper memory management, etc.  Will GPUs become the new hotness in HPC?

The obvious question here is “Why is Jeff discussing GPUs on an MPI blog?”

Attaining High Performance Communications: A Vertical Approach

September 30, 2009 at 12:00 pm PST

It’s finally been published! 

I wrote a chapter on MPI in the book Attaining High Performance Communications: A Vertical Approach, edited by Dr. Ada Gavrilovska from the Georgia Institute of Technology.

Book picture: Attaining High Performance Communications: A Vertical Approach

The chapter author list reads like a who’s-who in high performance computing: several of my colleagues from the MPI Forum wrote pieces of this book, as well as many bright graduate students and other noted dignitaries in HPC.

What is MPI?

September 25, 2009 at 12:00 pm PST

As I think most readers of this blog already know, when I say “MPI”, I mean “Message Passing Interface.”

I saw a confusing-and-amusing blog entry today over at insideHPC (and HPCwire): GigaSpaces and MPI Europe partner on financial messaging overseas

“MPI Europe?”, I thought.  “What’s that?  Is that some MPI-based ISV that I’ve never heard of?”

Lies, damn lies, and statistics

September 15, 2009 at 12:00 pm PST

I’m a fan of insideHPC; I read it every day.  I like John’s commentary; he does a great job of rounding up various newsworthy HPC-related articles.  But that doesn’t always mean that I agree with every posted item.  Case in point: I saw this article the other day, purportedly a primer on InfiniBand (referring to this HPCprojects article).  I actually know a bit about IB; I used to work in the IB group at Cisco.  Indeed, I’ve written a lot of OpenFabrics verbs-based code for MPI implementations.

There’s good information in that article, but also some fantastically unfounded and misleading marketing quotes:

  • “With large data transfers, Ethernet consumes as much as 50 per cent of the CPU cycles; the average for InfiniBand is a loss of less than 10 to 20 per cent.”  He’s referring to software TCP overhead, not Ethernet overhead.  There’s an enormous difference: there are plenty of Ethernet-based technologies that are in the 10-20% overhead range.
  • “There are also power savings to be had, and this is critical when HPC facilities are confronting major issues with power supplies, cooling and costs. The same study indicates that InfiniBand cuts power costs considerably to finish the same number of Fluent jobs compared to Gigabit Ethernet; as cluster size increases, more power can be saved.”  Wow.  Other than generating warm fuzzies for customers (“My network products are green!”), what exactly does that paragraph mean?  And how exactly was it quantified?
  • …I’ll stop with just those 2.  :-)

These quotes are classic marketing spin to make IB products look better than the competition.

Announcing hwloc: portable hardware locality open source software

September 13, 2009 at 12:00 pm PST

(this blog entry co-written by Brice Goglin and Samuel Thibault from the INRIA Runtime Team)

We’re pleased to announce a new open source software project: Hardware Locality (or “hwloc”, for short).  The hwloc software discovers and maps the NUMA nodes, shared caches, and processor sockets, cores, and threads of Linux/Unix and Windows servers.  The resulting topological information can be displayed graphically or conveyed programmatically through a C language API.  Applications (and middleware) that use this information can optimize their performance in a variety of ways, including tuning computational cores to fit cache sizes and utilizing data locality-aware algorithms.

hwloc actually represents the merger of two prior open source software projects:

  • libtopology, a package for discovering and reporting the internal processor and cache topology in Unix and Windows servers.
  • Portable Linux Processor Affinity (PLPA), a package for solving Linux topological processor binding compatibility issues.

MPI 2.2: done!

September 4, 2009 at 12:00 pm PST

From the home office in Helsinki, Finland: MPI-2.2 is done!  It’s done it’s done it’s done!

Finally!  The MPI-2.2 document has been voted in by the MPI Forum.  The official PDF document will be published on www.mpi-forum.org soon.  HLRS is selling (at cost) MPI-2.2 books; contact Rolf Rabenseifner if you’re interested (I’ll be getting one!).

Non Uniform Network Access (NUNA)

August 27, 2009 at 12:00 pm PST

Everything old is new again: NUMA is back! With NUMA going mainstream, high performance software (MPI applications and otherwise) might need to be re-tuned to maintain their current performance levels.

A less-acknowledged aspect of HPC systems is the multiple levels of networks that are traversed to get data from MPI process A to MPI process B. The heterogeneous, multi-level network is going to become more important (again) in your applications’ overall performance, especially as per-compute-server core counts increase. That is, it’s not going to only be about the bandwidth and latency of your “Ethermyriband” network. It’s also going to be about the network (or networks!) inside each compute server.

A Cisco colleague of mine (hi Ted!) previously coined a term that is quite apropos for what HPC applications now need to target: it’s no longer just about NUMA, since NUMA effects are only one of the networks involved. Think bigger: the issue is really about Non-Uniform Network Access (NUNA).

Platform Acquires HP-MPI

August 24, 2009 at 12:00 pm PST

In a move that will surely cause some head-scratching, Platform has acquired the intellectual property of the-MPI-previously-known-as-HP-MPI.

The head-scratching part is that Platform already owns Scali MPI. It’s no secret that they recently moved all Scali development to an engineering team based in China.

Better Linux memory tracking

August 21, 2009 at 12:00 pm PST

Yesterday morning, we (Open MPI) entered what is hopefully a final phase of testing for a “better” implementation of the “leave registered” optimization for OpenFabrics networks. I briefly mentioned this work in a prior blog entry; it’s now finally coming to fruition. Woo hoo!

Roland Dreier has pushed a new Linux kernel module upstream for helping user-level applications track when memory leaves their process (it’s not guaranteed that this kernel module will be accepted, but it looks good so far). This kernel module allows MPI implementations, for example, to be alerted when registered memory is freed, which is a critical operation for certain optimizations and proper under-the-covers resource management.

What does this mean to the average MPI application user? It means that future versions of Open MPI (and other MPI implementations) will finally have a solid, bulletproof way to implement the “leave registered” optimization for large message passing. Prior versions of this optimization required nasty, ugly, dirty Linux hacks that sometimes broke real-world applications. Boooo! The new way will not break any applications because it gets help from the underlying operating system (rather than trying to go around or hijack certain operating system functions). Yay!
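To make the idea concrete, here is a toy Python model of a registration cache. It is only a sketch under assumptions, not Open MPI's code and not the kernel module's interface: the class `RegistrationCache`, its method names, and the addresses are all made up for illustration, and "registering" memory is reduced to bumping a counter. It shows why being told about freed memory matters: once a buffer is released, a cached registration for that address can no longer be trusted.

```python
class RegistrationCache:
    """Toy model: registering a buffer is expensive, so registrations are kept around."""

    def __init__(self):
        self._registered = {}   # buffer address -> number of bytes registered
        self.registrations = 0  # how many times the expensive path was taken

    def get(self, addr, length):
        """Return a (addr, registered_length) handle, reusing a cached one if it is big enough."""
        cached = self._registered.get(addr)
        if cached is None or cached < length:
            self.registrations += 1           # stands in for the costly registration call
            self._registered[addr] = length
        return (addr, self._registered[addr])

    def on_memory_released(self, addr):
        """Hypothetical callback fired when the OS reports that this memory left the process."""
        self._registered.pop(addr, None)


cache = RegistrationCache()
cache.get(0x1000, 4096)            # first send: pays for registration
cache.get(0x1000, 4096)            # second send: registration is reused
cache.on_memory_released(0x1000)   # the application freed the buffer
cache.get(0x1000, 4096)            # address handed out again: must register afresh
print("registrations performed:", cache.registrations)
```

In this toy model, skipping the `on_memory_released` step would let the last lookup reuse a registration for memory the process had already given back, which is the kind of stale state the notification is meant to rule out.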

SEND, ISEND, or SENDRECV…?

August 16, 2009 at 12:00 pm PST

I find that there are generally two types of MPI application programmers:

  1. Those that only use standard (“blocking”) mode sends and receives
  2. Those that use non-blocking sends and receives

The topic of whether an MPI application should use only simple standard mode sends and receives or dive into the somewhat-more-complex non-blocking modes of communication comes up not-infrequently (it just came up again on the Open MPI user’s mailing list the other day). It’s always a challenge for programmers who are new to MPI to figure out which model they should use. Recently, we came across a user who chose a third solution: use MPI_SENDRECV.
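For readers who have not seen the options side by side, here is a rough sketch of a ring exchange written with the non-blocking calls and with the combined send-receive call. It assumes the mpi4py Python bindings (chosen here for brevity; the post does not name a language binding), and the helper names `ring_neighbors`, `exchange_nonblocking`, and `exchange_sendrecv` are hypothetical. The blocking variant appears only as a comment, because a ring in which every rank sends first relies on the MPI library buffering the message.

```python
# Sketch only: mpi4py is an assumption, not something the post prescribes.
try:
    from mpi4py import MPI
except ImportError:  # keep the pure-Python helper usable without MPI installed
    MPI = None


def ring_neighbors(rank, size):
    """Return (left, right) neighbor ranks on a periodic ring of `size` processes."""
    return (rank - 1) % size, (rank + 1) % size


# Option 1 (blocking), shown as a comment only:
#     comm.send(payload, dest=right); received = comm.recv(source=left)
# If every rank does this, each send may wait for a receive that nobody has posted yet.


def exchange_nonblocking(comm, payload):
    """Option 2: post the receive and the send, then wait on both requests."""
    left, right = ring_neighbors(comm.Get_rank(), comm.Get_size())
    recv_req = comm.irecv(source=left, tag=0)
    send_req = comm.isend(payload, dest=right, tag=0)
    received = recv_req.wait()
    send_req.wait()
    return received


def exchange_sendrecv(comm, payload):
    """Option 3: one combined call that handles both directions."""
    left, right = ring_neighbors(comm.Get_rank(), comm.Get_size())
    return comm.sendrecv(payload, dest=right, sendtag=0, source=left, recvtag=0)


if __name__ == "__main__":
    if MPI is None:
        print("mpi4py not installed; skipping the MPI exchange")
    else:
        comm = MPI.COMM_WORLD
        msg = {"from": comm.Get_rank()}
        print(comm.Get_rank(), exchange_nonblocking(comm, msg), exchange_sendrecv(comm, msg))
```

With an MPI installation and mpi4py available, this can be launched with something like `mpirun -np 4 python ring.py` (the file name is an assumption); each rank then receives the message sent by its left neighbor.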

Benchmarking: the good, the bad, and the ugly

August 10, 2009 at 12:00 pm PST

Here’s a great quote that I ran across the other day from an article entitled A short history of btrfs on lwn.net by Valerie Aurora. Valerie was specifically talking about benchmarking filesystems, but you could replace the words “file systems” with just about any technology:

When it comes to file systems, it’s hard to tell truth from rumor from vile slander: the code is so complex, the personalities are so exaggerated, and the users are so angry when they lose their data. You can’t even settle things with a battle of the benchmarks: file system workloads vary so wildly that you can make a plausible argument for why any benchmark is either totally irrelevant or crucially important.

This remark is definitely true in high performance computing realms, too. Let me use it to give a little insight into MPI implementer behavior, with a specific case study from the Open MPI project.

MPI-2.2 is darn near done

August 3, 2009 at 12:00 pm PST

Torsten beat me to the punch last week (and insideHPC commented on it), but I’m still going to write my $0.02 about the MPI-2.2 spec anyway.

At last week’s MPI Forum meeting in Chicago (hosted at the beautiful Microsoft facility; gotta love those fruit+granola yogurt parfaits they serve!), we had the last round of 2nd votes on the MPI-2.2 specification. All changes and updates to MPI-2.1 are therefore closed. Woo hoo! All that remains is for us to actually integrate all the text that was voted on into a single, cohesive document, and then have a round of final votes at the next Forum meeting in Helsinki, Finland. These last votes in Helsinki are at least somewhat of a formality, but they do ensure that we don’t make editing mistakes in the process of transcribing all the proposals that passed into what will become the official MPI-2.2 standard document. A few MPI-2.2 proposals didn’t get resolved in time to make it into the final MPI-2.2 document (and we found at least one or two errors in the proposals that did pass into MPI-2.2), so we’ll be issuing a short MPI-2.2 errata document shortly after MPI-2.2 is published.

Welcome to the Bcast Blog!

July 30, 2009 at 12:00 pm PST

Greetings and welcome to the MPI “Bcast” blog!

My name is Jeff Squyres, and I’ll be your host. I’m Cisco’s representative to the Message Passing Interface (MPI) Forum, and I’m one of the core developers of the open source Open MPI project, an implementation of that MPI standard. Essentially, my job at Cisco is to be “The MPI Guy”: I deal with all things MPI.