Cisco Blog: High Performance Computing Networking

Sockets, cores, and hyperthreads… oh my!

October 15, 2010 at 5:00 am PST

Core counts are going up.  Cisco’s C460 rack-mount server series, for example, can have up to 32 Nehalem-EX cores.  As a direct result, we may well be returning to the era of running more than one MPI job per server.  This has long been true on “big iron” parallel resources, but commodity Linux HPC clusters have tended toward the one-MPI-job-per-server model in recent history.

Because of this trend, I have an open-ended question for MPI users and cluster administrators: how do you want to bind MPI processes to processors?  For example: what kinds of binding patterns do you want?  How many hyperthreads / cores / sockets do you want each process to bind to?  How do you want to specify what process binds where?  What level of granularity of control do you want / need?  (…and so on)
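
To make the question concrete, here is a rough sketch of what per-core binding can look like through the hwloc C API.  (This is only illustrative: the choice of core index 2 is arbitrary, and a real MPI launcher would pick a different core for each rank according to the requested binding pattern.)

    /* Sketch: bind the calling process to one core via hwloc. */
    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        /* Arbitrary choice for illustration: the third core in the topology. */
        hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, 2);
        if (core != NULL) {
            /* Bind this process to all hardware threads of that core. */
            if (hwloc_set_cpubind(topology, core->cpuset, HWLOC_CPUBIND_PROCESS) < 0)
                perror("hwloc_set_cpubind");
        }

        hwloc_topology_destroy(topology);
        return 0;
    }

Whether the right granularity is a hyperthread, a core, a socket, or something in between is exactly the question.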

We are finding that every user we ask seems to have a slightly different answer.  What do you think?  Let me know in the comments below.



hwloc 1.0 released!

May 18, 2010 at 12:00 pm PST

At long last, we have released a stable, production-quality version of Hardware Locality (hwloc).  Yay!

If you’ve missed all my prior discussions about it, hwloc provides command-line tools and a C API to obtain the hierarchical map of key computing elements, such as NUMA memory nodes, shared caches, processor sockets, processor cores, and processing units (logical processors, or “threads”).  hwloc also gathers various attributes such as cache and memory information, and is portable across a variety of operating systems and platforms.
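
As a quick taste of the C API, here is a minimal sketch (assuming the hwloc headers and library are installed) that loads the local topology and counts its sockets, cores, and processing units; the lstopo command-line tool renders the same map graphically or as text.

    /* Sketch: load the machine topology with hwloc and count its elements.
     * HWLOC_OBJ_SOCKET is the hwloc 1.x name for a processor package. */
    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        printf("sockets: %d\n", hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET));
        printf("cores:   %d\n", hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE));
        printf("PUs:     %d\n", hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU));

        hwloc_topology_destroy(topology);
        return 0;
    }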

In an increasingly NUMA (and NUNA!) world, hwloc is a valuable tool for achieving high performance.



Traffic

April 23, 2010 at 12:00 pm PST

Traffic.  It’s a funny thing.  On my daily drive to work, I frequently see (what appear to be) oddities and contradictions.  For example, although the lanes on my side of the highway are running fast and clear, the other side is all jammed up.  But a half mile later, the other side is running fast and clear, and my lanes have been reduced to half speed.  A short distance further, I’m zipping along again at 55 mph (ahem).

Sometimes the reasons behind traffic congestion are obvious.  For example, when you drive through a busy interchange, it’s easy to understand how lots of vehicles entering and exiting the roadway can force you to slow down.  But sometimes the traffic flow issues are quite subtle; congestion may be caused by a non-obvious confluence of second- and third-order effects.

The parallels between highway traffic and networking are quite obvious, but the analogy goes much deeper when you consider that modern computational clusters span multiple different networks — we’re entering an era of Non-Uniform Network Architectures (NUNAs).



Announcing hwloc: portable hardware locality open source software

September 13, 2009 at 12:00 pm PST

(This blog entry was co-written by Brice Goglin and Samuel Thibault from the INRIA Runtime Team.)

We’re pleased to announce a new open source software project: Hardware Locality (or “hwloc”, for short).  The hwloc software discovers and maps the NUMA nodes, shared caches, and processor sockets, cores, and threads of Linux/Unix and Windows servers.  The resulting topological information can be displayed graphically or conveyed programmatically through a C language API.  Applications (and middleware) that use this information can optimize their performance in a variety of ways, including tuning computations to fit cache sizes and utilizing data locality-aware algorithms.
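
As a rough sketch of the kind of query the C API enables (using the hwloc 1.x API of this era; cache objects were reorganized in later releases), a program could ask which cache covers a given processing unit and size its working set accordingly:

    /* Sketch: report the outermost cache covering the first processing unit. */
    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topology;
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        hwloc_obj_t pu = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
        if (pu != NULL) {
            /* Outermost cache whose cpuset covers this PU (typically the last-level cache). */
            hwloc_obj_t cache = hwloc_get_cache_covering_cpuset(topology, pu->cpuset);
            if (cache != NULL)
                printf("L%u cache covering PU 0: %llu bytes\n",
                       cache->attr->cache.depth,
                       (unsigned long long) cache->attr->cache.size);
        }

        hwloc_topology_destroy(topology);
        return 0;
    }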

hwloc actually represents the merger of two prior open source software projects:

  • libtopology, a package for discovering and reporting the internal processor and cache topology of Unix and Windows servers.
  • Portable Linux Processor Affinity (PLPA), a package for solving processor-affinity binding compatibility issues on Linux.



Non-Uniform Network Access (NUNA)

August 27, 2009 at 12:00 pm PST

Everything old is new again — NUMA is back!

With NUMA going mainstream, high performance software — MPI applications and otherwise — might need to be re-tuned to maintain its current performance levels.

A less-acknowledged aspect of HPC systems is the multiple levels of networks that must be traversed to get data from MPI process A to MPI process B.  This heterogeneous, multi-level network is going to become more important (again) to your applications’ overall performance, especially as per-server core counts increase.

That is, it’s not only going to be about the bandwidth and latency of your “Ethermyriband” network.  It’s also going to be about the network (or networks!) inside each compute server.

A Cisco colleague of mine (hi, Ted!) previously coined a term that is quite apropos for what HPC applications now need to target: it’s no longer just about NUMA — the NUMA interconnect is only one of the networks involved.

Think bigger: the issue is really about Non-Uniform Network Access (NUNA).
