Today’s post is a quickie / roundup of things right before the maelstrom of Supercomputing starts in force tomorrow night…
Doug Eadline recently talked about how community is tremendously important to HPC. Two words: he’s right. The HPC ecosystem is all about working together to advance the state of the art. No single group, university, or company could do it alone.
As Cisco’s representative to the MPI Forum and the Open MPI software projects, I often work with teams of researchers and developers. Sometimes all the people are in one physical place and the process of sharing ideas and dividing work is easy. But it’s much more common for me to participate in geographically scattered groups of people. And there’s no doubt about it: collaboration across distances is just hard. You just can’t beat having a bunch of engineers in the same room with a whiteboard when trying to figure out a complex topic. But we don’t always get that opportunity.
So how do you take a disparate group of people and make them productive?
Lotsa news coming out in the ramp-up to SC. Probably the biggest is that China is now the proud owner of the 2.5-petaflop computing monster named “Tianhe-1A”.
Congratulations to all involved! 2.5 petaflops is an enormous achievement.
Just to put this in perspective, there are only three other (publicly disclosed) machines in the world right now that have reached a petaflop: the Oak Ridge US Department of Energy (DoE) “Jaguar” machine hit 1.7 petaflops, China’s “Nebulae” hit 1.3 petaflops, and the Los Alamos US DoE “Roadrunner” machine hit 1.0 petaflops.
While petaflop-and-beyond may stay firmly in the bleeding-edge research domain for quite some time, I’m sure we’ll see more machines of this class over the next few years.
Open question to MPI developers: are you using the features added in MPI-2.2?
I ask because I took a little heat at the last MPI Forum meeting for not driving Open MPI to be MPI-2.2 compliant (Open MPI is MPI-2.1 compliant; there are 4 open tickets that need to be completed for full MPI-2.2 compliance).
But I’m having a hard time finding users who want or need these specific features (admittedly, they’re somewhat obscure). We’ll definitely get to these items someday; the question is whether that someday needs to be soon or whether it can be a while from now.
Core counts are going up. Cisco’s C460 rack-mount server series, for example, can have up to 32 Nehalem EX cores. As a direct result, we may well be returning to the era of running more than one MPI process per server. This has long been true in “big iron” parallel resources, but commodity Linux HPC clusters have tended towards the one-MPI-job-per-server model in recent history.
Because of this trend, I have an open-ended question for MPI users and cluster administrators: how do you want to bind MPI processes to processors? For example: what kinds of binding patterns do you want? How many hyperthreads / cores / sockets do you want each process to bind to? How do you want to specify what process binds where? What level of granularity of control do you want / need? (…and so on)
We are finding that every user we ask seems to have slightly different answers. What do you think? Let me know in the comments below.
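To make the binding question a bit more concrete, here is a minimal sketch of two patterns people commonly ask for: packing processes onto consecutive cores, and spreading them round-robin across sockets. Everything in it is an assumption made for illustration: the function names (`core_ids`, `bind_by_core`, `bind_by_socket`) are hypothetical, cores are assumed to be numbered socket by socket, and the 4-socket, 8-cores-per-socket layout is just one way to arrive at 32 cores. It only computes which core IDs each local process would be allowed to run on; it does not call MPI or actually set any affinity.

```python
from typing import Dict, List


def core_ids(sockets: int, cores_per_socket: int) -> List[List[int]]:
    """Return core IDs grouped by socket.

    Assumption: cores are numbered socket-major, so socket s owns
    cores s*cores_per_socket .. (s+1)*cores_per_socket - 1.
    Real machines may number their cores differently.
    """
    return [
        [s * cores_per_socket + c for c in range(cores_per_socket)]
        for s in range(sockets)
    ]


def bind_by_core(nprocs: int, sockets: int, cores_per_socket: int) -> Dict[int, List[int]]:
    """Pack processes: local rank r is bound to exactly one core, core r."""
    flat = [c for sock in core_ids(sockets, cores_per_socket) for c in sock]
    if nprocs > len(flat):
        raise ValueError("more processes than cores; oversubscription not handled")
    return {rank: [flat[rank]] for rank in range(nprocs)}


def bind_by_socket(nprocs: int, sockets: int, cores_per_socket: int) -> Dict[int, List[int]]:
    """Spread processes: local rank r goes to socket r % sockets and may
    run on any core of that socket."""
    topo = core_ids(sockets, cores_per_socket)
    return {rank: list(topo[rank % sockets]) for rank in range(nprocs)}


if __name__ == "__main__":
    # Hypothetical 32-core server: 4 sockets x 8 cores.
    for name, pattern in (("by core", bind_by_core), ("by socket", bind_by_socket)):
        print(name)
        for rank, cores in pattern(4, 4, 8).items():
            print(f"  local rank {rank} -> cores {cores}")
```

Even this toy version leaves most of the questions above open: it ignores hyperthreads, does not let a process bind to several cores short of a whole socket, and offers no way to specify an irregular layout. On Linux, a launcher could apply masks like these through the operating system's CPU affinity interface, but how a user should express the pattern in the first place is exactly what I'm asking about.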