Everything old is new again — NUMA is back!
With NUMA going mainstream, high-performance software (MPI applications and otherwise) might need to be re-tuned to maintain its current performance levels.
A less-acknowledged aspect of HPC systems is the multiple levels of networks that data must traverse to get from MPI process A to MPI process B. This heterogeneous, multi-level network is going to matter more (again) to your applications’ overall performance, especially as per-server core counts increase.
That is, it’s not going to only be about the bandwidth and latency of your “Ethermyriband” network. It’s also going to be about the network (or networks!) inside each compute server.
A Cisco colleague of mine (hi Ted!) previously coined a term that is quite apropos for what HPC applications now need to target: it’s no longer just about NUMA, because the NUMA interconnect is only one of the networks involved.
Think bigger: the issue is really about Non-Uniform Network Access (NUNA). Read More »
Tags: HPC, mpi, NUMA, NUNA, process affinity
In a move that will surely cause some head-scratching, Platform has acquired the intellectual property of the-MPI-previously-known-as-HP-MPI.

The head-scratching part is that Platform already owns Scali MPI. It’s no secret that they recently moved all Scali development to an engineering team based in China. Read More »
Yesterday morning, we (Open MPI) entered what is hopefully a final phase of testing for a “better” implementation of the “leave registered” optimization for OpenFabrics networks. I briefly mentioned this work in a prior blog entry; it’s now finally coming to fruition. Woo hoo!

Roland Dreier has pushed a new Linux kernel module upstream for helping user-level applications track when memory leaves their process (it’s not guaranteed that this kernel module will be accepted, but it looks good so far). This kernel module allows MPI implementations, for example, to be alerted when registered memory is freed — a critical operation for certain optimizations and proper under-the-covers resource management.

What does this mean to the average MPI application user? It means that future versions of Open MPI (and other MPI implementations) will finally have a solid, bulletproof way to implement the “leave registered” optimization for large message passing. Prior versions of this optimization required nasty, ugly, dirty Linux hacks that sometimes broke real-world applications. Boooo! The new way will not break any applications because it gets help from the underlying operating system (rather than trying to go around or hijack certain operating system functions). Yay! Read More »
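To give a flavor of why the freed-memory notification matters, here is a minimal registration-cache sketch in plain C. All names here are illustrative, not Open MPI’s actual internals: an MPI library caches registered regions so that repeated large sends skip the expensive register/deregister cycle, and an OS notification (the role the proposed kernel module plays) lets it evict entries when the application frees memory, instead of hijacking free() to find out.

```c
#include <assert.h>
#include <stdlib.h>

/* One cached "registration": a base address and length.  A real
 * implementation would also hold the network-level registration
 * handle and a reference count; those are elided here. */
typedef struct reg_entry {
    void *base;
    size_t len;
    struct reg_entry *next;
} reg_entry_t;

static reg_entry_t *cache = NULL;

/* Look up a cached registration fully covering [addr, addr+len). */
reg_entry_t *cache_find(void *addr, size_t len) {
    for (reg_entry_t *e = cache; e != NULL; e = e->next) {
        char *lo = (char *) e->base, *hi = lo + e->len;
        if ((char *) addr >= lo && (char *) addr + len <= hi)
            return e;
    }
    return NULL;
}

/* "Register" (stubbed out) and cache a region, unless a cached
 * registration already covers it -- that hit is the whole point of
 * the "leave registered" optimization. */
reg_entry_t *cache_register(void *addr, size_t len) {
    reg_entry_t *e = cache_find(addr, len);
    if (e != NULL)
        return e;                 /* cache hit: no re-registration */
    e = malloc(sizeof(*e));       /* cache miss: register + insert */
    e->base = addr;
    e->len = len;
    e->next = cache;
    cache = e;
    return e;
}

/* Called when the OS notifies us that memory left the process
 * (e.g., on free/munmap): evict any overlapping cached entries so
 * we never hand the network a stale registration. */
void cache_invalidate(void *addr, size_t len) {
    reg_entry_t **p = &cache;
    while (*p != NULL) {
        char *lo = (char *) (*p)->base, *hi = lo + (*p)->len;
        if ((char *) addr < hi && (char *) addr + len > lo) {
            reg_entry_t *dead = *p;
            *p = dead->next;      /* "deregister" + remove */
            free(dead);
        } else {
            p = &(*p)->next;
        }
    }
}
```

The fragile part that the kernel module fixes is precisely the trigger for cache_invalidate(): without OS help, libraries had to intercept free()/munmap() themselves, which is what sometimes broke real applications.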
I find that there are generally two types of MPI application programmers:
- Those who use only standard (“blocking”) mode sends and receives
- Those who use non-blocking sends and receives
The topic of whether an MPI application should use only simple standard-mode sends and receives or dive into the somewhat-more-complex non-blocking modes of communication comes up not infrequently; it came up again on the Open MPI users’ mailing list just the other day. It’s always a challenge for programmers who are new to MPI to figure out which model they should use. Recently, we came across a user who chose a third solution: use MPI_SENDRECV. Read More »
Here’s a great quote that I ran across the other day, from an article entitled “A short history of btrfs” by Valerie Aurora on LWN.net. Valerie was specifically talking about benchmarking file systems, but you could replace the words “file systems” with just about any technology:
When it comes to file systems, it’s hard to tell truth from rumor from vile slander: the code is so complex, the personalities are so exaggerated, and the users are so angry when they lose their data. You can’t even settle things with a battle of the benchmarks: file system workloads vary so wildly that you can make a plausible argument for why any benchmark is either totally irrelevant or crucially important.
This remark is definitely true in high performance computing realms, too. Let me use it to give a little insight into MPI implementer behavior, with a specific case study from the Open MPI project. Read More »