In the world of high-performance computing, we jump through hoops to extract the last bit of performance from our machines. The vast majority of processes use the Message Passing Interface (MPI) to handle communication. Each MPI implementation abstracts the underlying network away, depending on the available interconnect(s). Ideally, the interconnect offers some form of operating system (OS) bypass and remote memory access in order to provide the lowest possible latency and highest possible throughput. If not, MPI typically falls back to TCP sockets. The MPI’s network abstraction layer (NAL) then optimizes the MPI communication pattern to match that of the interconnect’s API. For similar reasons, most distributed, parallel filesystems such as Lustre, PVFS2, and GPFS, also rely on a NAL to maximize performance. Read More »
Just in case you didn’t see my tweet: my group is hiring!
We need some Linux kernel hackers for some high-performance networking stuff. This includes MPI and other verticals.
I believe that the official job description is still working its way through channels before it appears on the official external Cisco job-posting site, but the gist of it is Linux kernel work for Cisco x86 servers (blades and rack-mount) and NICs in high performance networking scenarios.
Are you interested? If so, send me an email with your resume — I’m jsquyres at cisco dot com.
Here’s a not-uncommon question that we get on the Open MPI mailing list:
Why do MPI processes consume 100% of the CPU when they’re just waiting for incoming messages?
The answer is rather straightforward: because each MPI process polls aggressively for incoming messages (as opposed to blocking and letting the OS wake it up when a new message arrives). Most MPI implementations do this by default, actually.
The reasons why they do this is a little more complicated, but loosely speaking, one reason is that polling helps get the lowest latency possible for short messages.
Jeff is lazy this week, so he asked that I provide some notes on the process binding options available in the Open MPI (OMPI) v1.5 release series.
First, though, a caveat. The binding options in the v1.5 series are pretty much the same as in the prior v1.4 series. However, future releases (beginning with the v1.7 series) will have significantly different options providing a broader array of controls. I won’t address those here, but will do so in a later post.
Today we feature a deep-dive guest post from Torsten Hoefler, the Performance Modeling and Simulation lead of the Blue Waters project at NCSA, and Pavan Balaji, computer scientist in the Mathematics and Computer Science (MCS) Division at the Argonne National Laboratory (ANL), and as a fellow of the Computation Institute at the University of Chicago.
Despite MPI’s vast success in bringing portable message passing to scientists on a wide variety of platforms, MPI has been labeled as a communication model that only supports “two-sided” and “global” communication. The MPI-1 standard, which was released in 1994, provided functionality for performing two-sided and group or collective communication. The MPI-2 standard, released in 1997, added support for one-sided communication or remote memory access (RMA) capabilities, among other things. However, users have been slow to adopt such capabilities because of a number of reasons, the primary ones being: (1) the model was too strict for several application behavior patterns, and (2) there were several missing features in the MPI-2 RMA standard. Bonachea and Duell put together a more-or-less comprehensive list of areas where MPI-2 RMA falls behind. A number of alternate programming models, including Global Arrays, UPC and CAF have gained popularity filling this gap.
That’s where MPI-3 comes in.