Multiple readers have told me that it is difficult for them to understand and/or visualize the effects of latency on their HPC applications, particularly in modern NUMA (non-uniform memory access) and NUNA (non-uniform network access) environments.
Let’s break down the different levels of latency in a typical modern server and network computing environment.
Here’s a familiar analogy: your home. You spend a lot of time at your home; it’s where you live and play.
You have friends and neighbors who live right near you in the same subdivision. You interact with these neighbors frequently, if for no other reason than that they’re close by and quick to reach.
But you also have friends in other subdivisions. They’re a several-minute drive across surface streets from your home. You interact with these people, too, but you have to think and plan a little more vs. just walking next door.
And then you have friends who live far away — it takes a long time to get there, and involves travel over long-distance highways.
The distance-from-home analogy is pretty easy to understand:
- Your home is a computational core.
- Your immediate friends and neighbors are other cores on the same processor socket. Communication with them is easy, fast, and cheap.
- Your remote friends are other processor cores on remote processor sockets in the same server. You communicate with them over the internal server network (e.g., QPI or HyperTransport).
- Your far-away friends are processors in other servers. You communicate with them over the external network (e.g., Ethernet).
So when you send a message to a peer (e.g., MPI_SEND to another MPI process), consider with whom you’re communicating: are they next door, in the next subdivision, or in the next city? That gives you an idea of the magnitude of the cost of communicating with them.
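To make the "who am I talking to?" question concrete, here is a minimal Python sketch of the analogy. It is only an illustration: the `Location` tuple and the `distance_tier` function are hypothetical names invented for this post, not part of MPI or any library, and the sketch assumes you already know which server, socket, and core each process is running on. It deliberately returns the analogy's labels rather than latency numbers, since real costs vary from system to system.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Where a process runs (hypothetical helper for this illustration)."""
    server: str   # hostname of the server
    socket: int   # processor socket within that server
    core: int     # core within that socket


def distance_tier(a: Location, b: Location) -> str:
    """Return the analogy's name for how far apart two processes are."""
    if a.server != b.server:
        # Different servers: traffic crosses the external network.
        return "next city"
    if a.socket != b.socket:
        # Same server, different sockets: traffic crosses the
        # internal server network.
        return "next subdivision"
    if a.core != b.core:
        # Same socket, different cores: easy, fast, and cheap.
        return "next door"
    # Same core: you never left home.
    return "home"


if __name__ == "__main__":
    me = Location("node0", 0, 0)
    peers = [
        Location("node0", 0, 1),
        Location("node0", 1, 0),
        Location("node1", 0, 0),
    ]
    for peer in peers:
        print(peer, "->", distance_tier(me, peer))
```

The checks run from the farthest tier to the nearest because a difference at an outer level dominates: two processes on different servers are far apart no matter which socket or core numbers they happen to share.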
But let’s add another dimension here: caches and RAM. Data locality is a major factor in performance, and is frequently under-appreciated.
Stay tuned; we’ll talk about data locality by extending this analogy in a future entry…
And let’s not forget Rear Admiral Grace Murray Hopper on visualizing nanoseconds: http://www.youtube.com/watch?v=JEpsKnWZrJ8
It’s worth pointing out that the same consideration of the distance that data must move applies to cross-node interconnects as well, within a single HPC system. The effects this can have are pretty wide-ranging, and sometimes quite substantial. For instance, here’s a pile of recent papers that address the issue: