Open MPI: behind the scenes
Working on an MPI implementation isn’t always sexy. There’s a lot of grubby, grubby work that needs to happen on a continual basis to produce a production-quality MPI implementation that can be used for real-world HPC applications.
Sure, we always need to work on optimizing short message latency.
Sure, we need to keep driving MPI’s internal resource utilization down so that apps get more use of hardware.
But there’s also lots of “uninteresting” — yet still critically important — stuff that happens behind the scenes. A little over a week ago, we had a face-to-face Open MPI developer meeting at the Cisco Richardson, TX, USA campus.
We like to get together and have face-to-face developer meetings a few times a year, because the developer community spreads across many organizations and time zones.
Collaboration tools like email, GitHub, and Webex are great for day-to-day work.
But you just can’t beat getting a bunch of computer scientists and engineers together in a room with a whiteboard, especially when talking through complex scenarios, developing long-term project goals, deciding who is going to implement what (and how!) in major new code initiatives, and working out how this all maps back to the MPI Forum and upcoming versions of the MPI standard (e.g., MPI-3.1 is likely to be published soon).
Equally important, we talk through issues that initially seem “uninteresting,” but are still pretty critical to delivering a good end-user experience.
Here are some of the issues of this kind that we spent time on at our meeting:
- Should Open MPI continue to embed plugin functionality? Open MPI has always been based on the idea of plugins that are loaded at run time. Due to changes in the GNU Libtool and Autoconf packages, should we change this default behavior to slurp plugins into the main MPI library at compile time (vs. run time)? This is a surprisingly complex issue: there are second-order effects whichever way we choose to go.
- How can we get better Continuous Integration through GitHub? Mellanox has been providing Jenkins-based “smoke testing” for GitHub pull requests (which has been great). But the GitHub integration is really only designed for a single Jenkins server. How can we get more Open MPI organizations involved in this CI initiative?
- How can we reduce the memory footprint and resource consumption of Open MPI’s core library? Most of Open MPI is fairly efficient, but there are definitely places that will not scale up to exascale-sized applications. We need to change some of Open MPI’s basic data structures and network interactions to scale to Huge sizes.
- Have we received signed intellectual property agreements from all ongoing contributors to the code base? Open MPI is released under the BSD license, and we want to ensure that it can continue to be. One mechanism we use for that is getting all developer organizations to sign a contribution agreement. “Open source” software (particularly when there are big corporations involved) requires some work and diligence to ensure that it can stay open source.
- What outdated / deprecated functionality can we remove? What impact will it have on real users? Is it worth it to remove such old functionality, or is it harmless to leave it in?
- When developing new MPI extensions (e.g., to support various Open MPI organizations’ efforts in the MPI Forum), should their public symbols be prefixed with “OMPI_”, “MPIX_”, or something else? Again, this is a surprisingly complex issue.
When you’re working with a world-wide audience, even small issues like these are actually important.
…in addition to all the obvious / important issues. 🙂