Cisco Blog: High Performance Computing Networking

More Network Locality (Netloc) progress

January 31, 2014 at 9:11 am PST

We announced the Network Locality (Netloc) project at SC’13, and it generated a LOT of interest (far more than I even anticipated!).  As a refresher, here’s a link to a blog entry we wrote about Netloc back in November.

There is still much work to be done; we’re actively continuing work in multiple areas.


Process and memory affinity: why do you care?

January 31, 2013 at 5:00 am PST

I’ve written about NUMA effects and process affinity on this blog lots of times in the past.  It’s a complex topic that has a lot of real-world effects on your MPI and HPC applications.  If you’re not using processor and memory affinity, you’re likely experiencing performance degradation without even realizing it.

In short:

  1. If you’re not booting your Linux kernel in NUMA mode, you should be.
  2. If you’re not using processor affinity with your MPI/HPC applications, you should be (see the sketch below).
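
As a quick illustration of both items (my sketch, not part of the original post; ./my_mpi_app is a placeholder for your application):

    # Does the kernel see the machine's NUMA topology?
    # More than one "node" in the output means NUMA mode is active.
    numactl --hardware

    # Bind each MPI process to a single core, and print the bindings
    # so you can verify the placement.  (Open MPI v1.5-series flags;
    # the v1.7 series spells this "--bind-to core" instead.)
    mpirun -np 4 --bind-to-core --report-bindings ./my_mpi_app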



Process Affinity in OMPI v1.7 (part 2)

September 11, 2012 at 5:00 am PST

In my last post, I described the Simple mode of Open MPI v1.7’s process affinity system.

The Simple mode is actually quite flexible, and we anticipate that it will meet most users’ needs. However, some users will need more flexibility. That’s what the Expert mode is for.

Before jumping into the Expert mode, though, let me describe two more features of the revamped v1.7 affinity system.
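
For context, here’s a hedged sketch of what the Simple mode looks like on the mpirun command line in the v1.7 series (./my_mpi_app is a placeholder):

    # v1.7 "Simple" style: map processes round-robin by socket,
    # bind each one to a core, and print the resulting bindings.
    mpirun -np 8 --map-by socket --bind-to core --report-bindings ./my_mpi_app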



Taking MPI Process Affinity to the Next Level

August 31, 2012 at 1:33 pm PST

Process affinity is a hot topic.  With commodity servers getting more and more complex internally (think: NUMA and NUNA), placing and binding individual MPI processes to specific processor, cache, and memory resources is becoming quite important in terms of delivered application performance.

MPI implementations have long offered options for laying out MPI processes across the resources allocated for the job.  Such options typically included round-robin schemes by core or by server node.  Additionally, MPI processes can be bound to individual processor cores (and even sockets).

Today caps a long-standing effort among Josh Hursey, Terry Dontje, Ralph Castain, and me (all developers in the Open MPI community) to revamp the processor affinity system in Open MPI.

The first implementation of the Location Aware Mapping Algorithm (LAMA) for process mapping, binding, and ordering has been committed to the Open MPI SVN trunk.  LAMA provides a whole new level of processor affinity control to the end user.
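
To give a taste of the new controls, here’s a sketch based on the LAMA work as committed; the exact MCA parameter names and mapping-string syntax may evolve before a release, and ./my_mpi_app is a placeholder:

    # Select the LAMA mapper and describe the layout as a string of
    # hardware levels (e.g., c = core, s = socket, ...), mapping
    # round-robin across them; bind each process to a single core.
    mpirun -np 16 --mca rmaps lama \
           --mca rmaps_lama_map csbn \
           --mca rmaps_lama_bind 1c \
           --report-bindings ./my_mpi_app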



Open MPI v1.5 processor affinity options

March 9, 2012 at 5:00 am PST

Today we feature a deep-dive guest post from Ralph Castain, Senior Architect in the Advanced R&D group at Greenplum, an EMC company.

Jeff is lazy this week, so he asked that I provide some notes on the process binding options available in the Open MPI (OMPI) v1.5 release series.

First, though, a caveat. The binding options in the v1.5 series are pretty much the same as in the prior v1.4 series. However, future releases (beginning with the v1.7 series) will have significantly different options providing a broader array of controls. I won’t address those here, but will do so in a later post.
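
As a flavor of the v1.5-series options covered in the post, here’s a quick sketch (./my_mpi_app is a placeholder):

    # Lay processes out round-robin by socket, binding each to a core...
    mpirun -np 8 --bysocket --bind-to-core --report-bindings ./my_mpi_app

    # ...or bind each process to all the cores of a single socket.
    mpirun -np 8 --bind-to-socket --report-bindings ./my_mpi_app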

