(This blog entry was co-written by Brice Goglin and Samuel Thibault of the INRIA Runtime Team.)
We’re pleased to announce a new open source software project: Hardware Locality (or “hwloc”, for short). The hwloc software discovers and maps the NUMA nodes, shared caches, and processor sockets, cores, and threads of Linux/Unix and Windows servers. The resulting topological information can be displayed graphically or conveyed programmatically through a C language API. Applications (and middleware) that use this information can optimize their performance in a variety of ways, including tuning computation to fit cache sizes and utilizing data-locality-aware algorithms.
hwloc actually represents the merger of two prior open source software projects:
- libtopology, a package for discovering and reporting the internal processor and cache topology in Unix and Windows servers.
- Portable Linux Processor Affinity (PLPA), a package for solving Linux topological processor binding compatibility issues.
Read More »
From the home office in Helsinki, Finland: MPI-2.2 is done! It’s done, it’s done, it’s done!
Finally! The MPI-2.2 document has been voted in by the MPI Forum. The official PDF document will be published on www.mpi-forum.org soon. HLRS is selling (at cost) MPI-2.2 books; contact Rolf Rabenseifner if you’re interested (I’ll be getting one!).
Read More »
Everything old is new again — NUMA is back! With NUMA going mainstream, high performance software — MPI applications and otherwise — might need to be re-tuned to maintain current performance levels.

A less-acknowledged aspect of HPC systems is the multiple levels of networks that data traverses to get from MPI process A to MPI process B. This heterogeneous, multi-level network is going to become more important (again) to your applications’ overall performance, especially as per-server core counts increase. That is, it’s not going to be only about the bandwidth and latency of your “Ethermyriband” network. It’s also going to be about the network (or networks!) inside each compute server.

A Cisco colleague of mine (hi Ted!) previously coined a term that is quite apropos for what HPC applications now need to target: it’s no longer just about NUMA — NUMA effects are only one of the networks involved. Think bigger: the issue is really about Non-Uniform Network Access (NUNA).

Read More »
In a move that will surely cause some head-scratching, Platform has acquired the intellectual property of the-MPI-previously-known-as-HP-MPI. The head-scratching part is that Platform already owns Scali MPI. It’s no secret that they recently moved all Scali development to an engineering team based in China.

Read More »
Yesterday morning, we (Open MPI) entered what is hopefully a final phase of testing for a “better” implementation of the “leave registered” optimization for OpenFabrics networks. I briefly mentioned this work in a prior blog entry; it’s now finally coming to fruition. Woo hoo!

Roland Dreier has pushed a new Linux kernel module upstream to help user-level applications track when memory leaves their process (it’s not guaranteed that this kernel module will be accepted, but it looks good so far). This kernel module allows MPI implementations, for example, to be alerted when registered memory is freed — a critical operation for certain optimizations and for proper under-the-covers resource management.

What does this mean to the average MPI application user? It means that future versions of Open MPI (and other MPI implementations) will finally have a solid, bulletproof way to implement the “leave registered” optimization for large message passing. Prior versions of this optimization required nasty, ugly, dirty Linux hacks that sometimes broke real-world applications. Boooo! The new way will not break any applications because it gets help from the underlying operating system (rather than trying to go around or hijack certain operating system functions). Yay!

Read More »