Why do different MPIs perform differently?
Sometimes my wife wonders why I have a job. She asks me: “Aren’t you just moving bytes from point A to point B? How hard is that?”
In some ways, she’s right — it’s not hard. Every computer user moves bytes from point A to point B oodles of times a day. Email, for example, is message passing — the heart of email is moving bytes from point A to point B.
But like most real-world engineering issues, it’s not quite that simple. Indeed, if you talk to most email server administrators, they will readily launch into highly complex discussions of how delivering an email from point A to point B is an incredibly intricate, complicated process.
MPI is somewhat like baseball: a skilled batter can hit a home run every time if they know exactly what pitch is coming. Pitchers and catchers work together to obfuscate exactly what pitch is coming so that the batter has to make a split-second determination of the best swing. If the batter knew the speed, location, and spin of the pitch beforehand, it would simply be a matter of selecting the right swing to hit the ball wherever the batter desires.
Similar to a baseball batter that has to be able to hit a wide variety of pitches, MPI implementations are generalized message passing engines — they have to work absolutely, 100% correctly in a huge variety of situations. As such, they have to cover a lot of corner cases, and sometimes the requirements of these corner cases conflict with each other. Decisions have to be made about where MPI wants to “hit the ball,” as it were: maximize bandwidth? Minimize latency? Maximize message rate? Minimize resource usage? …or, more frequently, what is the correct balance between all of those metrics (and many more) to effect good performance across a wide range of scenarios?
Run-time tunable parameters in MPI implementations are quite common these days. They allow you to give the MPI implementation a clue as to what pitch is coming, so to speak. Setting such tunables properly allows MPI to know whether your application tends to throw fastballs, curveballs, sliders, etc., and can therefore greatly increase an MPI implementation’s chances of hitting a home run.
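As a concrete illustration, Open MPI exposes its tunables as MCA parameters that can be listed and set at run time. The commands below are a sketch — the parameter name and value shown are examples only, and the right names and values depend on your Open MPI version and your application (`./my_app` is a placeholder):

```shell
# List the tunable parameters of the TCP transport and their current values
ompi_info --param btl tcp

# Set a tunable for one run on the mpirun command line
# (btl_tcp_eager_limit controls the eager-send message size cutoff;
#  65536 is an illustrative value, not a recommendation)
mpirun --mca btl_tcp_eager_limit 65536 -np 4 ./my_app

# Equivalently, set it via an environment variable
export OMPI_MCA_btl_tcp_eager_limit=65536
mpirun -np 4 ./my_app
```

Other MPI implementations have analogous mechanisms (e.g., environment variables), so check your implementation’s documentation for the equivalent knobs.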
So don’t be afraid of diving into your MPI implementation’s documentation and searching out a few tunable parameters that will help your applications’ performance. And be aware that different applications may require different parameters. Sure, you’ll always get 100% correct results without any tuning — but you may be able to eke out a bit more performance (as measured by a bunch of different metrics) with a little effort.