Lotsa news coming out in the ramp-up to SC. Probably the biggest is that China is now the proud owner of the 2.5-petaflop computing monster named “Tianhe-1A”.
Congratulations to all involved! 2.5 petaflops is an enormous achievement.
Just to put this in perspective, there are only three other (publicly disclosed) machines in the world right now that have reached a petaflop: the Oak Ridge US Department of Energy (DoE) “Jaguar” machine hit 1.7 petaflops, China’s “Nebulae” hit 1.3 petaflops, and the Los Alamos US DoE “Roadrunner” machine hit 1.0 petaflops.
While petaflop-and-beyond may stay firmly in the bleeding-edge research domain for quite some time, I’m sure we’ll see more machines of this class over the next few years. Read More »
Tags: HPC, mpi, petaflop, Supercomputing
Core counts are going up. Cisco’s C460 rack-mount server series, for example, can have up to 32 Nehalem EX cores. As a direct result, we may well be returning to the era of running more than one MPI process per server. This has long been true in “big iron” parallel resources, but commodity Linux HPC clusters have tended towards the one-MPI-job-per-server model in recent history.
Because of this trend, I have an open-ended question for MPI users and cluster administrators: how do you want to bind MPI processes to processors? For example: what kinds of binding patterns do you want? How many hyperthreads / cores / sockets do you want each process to bind to? How do you want to specify what process binds where? What level of granularity of control do you want / need? (…and so on)
We are finding that every user we ask seems to have slightly different answers. What do you think? Let me know in the comments, below.
Read More »
Tags: HPC, mpi, NUMA, process affinity
Have you ever wondered how an MPI implementation picks network paths and allocates resources? It’s a pretty complicated (set of) issue(s), actually.
An MPI implementation must tread the fine line between performance and resource consumption. If the implementation chooses poorly, it risks poor performance and/or the wrath of the user. If the implementation chooses well, users won’t notice at all — they silently enjoy good performance.
It’s a thankless job, but someone’s got to do it.
Read More »
Tags: HPC, mpi, RDMA
At long last, we have released a stable, production-quality version of Hardware Locality (hwloc). Yay!
If you’ve missed all my prior discussions about hwloc: it provides command line tools and a C API to obtain the hierarchical map of key computing elements, such as NUMA memory nodes, shared caches, processor sockets, processor cores, and processing units (logical processors, or “threads”). hwloc also gathers various attributes such as cache and memory information, and is portable across a variety of operating systems and platforms.
In an increasingly NUMA (and NUNA!) world, hwloc is a valuable tool for high performance.
Read More »
Tags: HPC, hwloc, NUMA, NUNA, process affinity
If ever I doubted that MPI was good for the world, I think all I would need to do is remind myself of this commit that I made to the Open MPI source code repository today. It was a single-character change: a 0 became a 1. But the commit log message was Tolstoyan in length:
- 87 lines of text
- 736 words
- 4225 characters
Go ahead — read the commit message. I double-dog dare you.
That tome of a commit message represents several months of on-and-off work on a single bug, and details the hard-won knowledge that was required to understand why changing a 0 to a 1 fixed it.
Read More »
Tags: HPC, mpi, why-mpi-is-good-for-you