Cisco Blogs


Cisco Blog > High Performance Computing Networking

Latency Analogies (part 2)

April 8, 2013 at 6:25 am PST

In a prior blog post, I talked about latency analogies.  I compared levels of latencies to your home, your neighborhood, a far-away neighborhood, and another city.  I talked about these localities in terms of communication.

Let’s extend that analogy to talk about data locality.

Read More »

Tags: , , ,

Latency Analogies

March 31, 2013 at 6:00 am PST

Multiple readers have told me that it is difficult for them to understand and/or visualize the effects of latency on their HPC applications, particularly in modern NUMA (non-uniform memory access) and NUNA (non-uniform network access) environments.

Let’s breaks down the different levels of latency in a typical modern server and network computing environments.

Read More »

Tags: , , ,

Taking MPI Process Affinity to the Next Level

August 31, 2012 at 1:33 pm PST

Process affinity is a hot topic.  With commodity servers getting more and more complex internally (think: NUMA and NUNA), placing and binding individual MPI processes to specific processor, cache, and memory resources is becoming quite important in terms of delivered application performance.

MPI implementations have long offered options for laying out MPI processes across the resources allocated for the job.  Such options typically included round-robin schemes by core or by server node.  Additionally,  MPI processes can be bound to individual processor cores (and even sockets).

Today caps a long-standing effort between Josh Hursey, Terry Dontje, Ralph Castain, and myself (all developers in the Open MPI community) to revamp the processor affinity system in Open MPI.

The first implementation of the Location Aware Mapping Algorithm (LAMA) for process mapping, binding, and ordering has been committed to the Open MPI SVN trunk.  LAMA provides a whole new level of processor affinity control to the end user.

Read More »

Tags: , , , , , ,

hwloc 1.0 released!

May 18, 2010 at 12:00 pm PST

At long last, we have released a stable, production-quality version of Hardware Locality (hwloc).  Yay!

If you’ve missed all my prior discussions about hwloc, hwloc provides command line tools and a C API to obtain the hierarchical map of key computing elements, such as: NUMA memory nodes, shared caches, processor sockets, processor cores, and processing units (logical processors or “threads”). hwloc also gathers various attributes such as cache and memory information, and is portable across a variety of different operating systems and platforms.

In an increasing NUMA (and NUNA!) world, hwloc is a valuable tool for high performance.

Read More »

Tags: , , , ,

Traffic

April 23, 2010 at 12:00 pm PST

Traffic.  It’s a funny thing.  On my daily drive to work, I see (what appear to be) oddities and contradictions frequently.  For example, although the lanes on my side of the highway are running fast and clear, the other side is all jammed up.  But a half mile later, the other side is running fast and clear, and my lanes have been reduced to half-speed.  A short distance further, I’m zipping along again at 55mph (ahem).

Sometimes the reasons behind traffic congestion are obvious.  For example, when you drive through a busy interchange, it’s easy to understand how lots of vehicles entering and exiting the roadway can force you to slow down.  But sometimes the traffic flow issues are quite subtle; congestion may be caused by a non-obvious confluence of second- and third-order effects.

The parallels from highway traffic to networking are quite obvious, but the analogy can go much deeper when you consider that modern computational clusters span multiple different networks — we’re entering an era of Non-Uniform Network Architectures (NUNAs).

Read More »

Tags: , , , , ,