MPI run-time at large scale
With the news that Open MPI is being used on the K supercomputer (i.e., the #1 machine on the June 2011 Top500 list), another colleague of mine, Ralph Castain — who focuses on the run-time system in Open MPI — pointed out that K has over 80,000 processors (over 640K cores!). That’s ginormous.
He was musing to me that it would be fascinating to see some of K’s run-time data for what most people don’t consider too interesting/sexy: MPI job launch performance.
For example, another public use of Open MPI is on Los Alamos National Lab’s RoadRunner, which has 3,000+ nodes at 4 processes per node (remember RoadRunner? It was #1 for a while, too).
It’s worth noting that Open MPI starts up full-scale jobs on RoadRunner — meaning that all processes complete MPI_INIT — in less than 1 minute.
It took a lot of work to get Open MPI not only to scale to jobs that large, but also to launch them that quickly. I remember running Open MPI on Thunderbird (which reached #6 on the Top500 list), where it took 10-20 minutes to launch at full scale across ~4K compute nodes. Painful.
It’s worth noting that various national labs are looking at making even bigger machines.
Work continues on making Open MPI’s run-time system and MPI layer scale even further. Engineers and scientists from organizations such as Cisco, Oak Ridge National Laboratory, and the University of Tennessee, Knoxville (and others) are aggressively pursuing the issues surrounding scaling to arbitrarily large numbers of MPI processes, such as resilience and fault tolerance.
…ok, I’ll get off my bragging-on-Open-MPI soap box now. Back to MPI tidbits and news!