

Cisco Blog > High Performance Computing Networking

“RDMA” — what does it mean to MPI applications?

July 16, 2011 at 8:13 am PST

RDMA stands for Remote Direct Memory Access.  The acronym is typically associated with OpenFabrics networks such as iWARP, IBoE (a.k.a. RoCE), and InfiniBand.  But “RDMA” is really just today’s flavor du jour of a more general concept: RMA (remote memory access), or directly reading from and writing to a peer’s memory space.

RMA implementations (including RDMA-based networks, such as OpenFabrics) typically include one or more of the following technologies:

  1. Operating system bypass: userspace applications communicate directly with the network hardware.
  2. Hardware offload: network activity is driven by the NIC, not the main CPU.
  3. Hardware or software notification: the application is told when messages have finished sending or have been received.
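
To make these ideas concrete, here is a minimal sketch (mine, not from the post) of how they surface at the MPI level: a nonblocking send hands the buffer to the network stack, an RMA-capable NIC can progress the transfer without the main CPU (offload, often with OS bypass underneath), and MPI_Wait is the completion notification that the buffer is safe to reuse.

```c
/* Minimal sketch: nonblocking point-to-point transfer where the NIC can
 * drive the data movement and MPI_Wait delivers the completion notice. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[1024] = {0};
    MPI_Request req;

    if (rank == 0) {
        MPI_Isend(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        /* ...overlap computation here while the network moves the data... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* completion notification */
    } else if (rank == 1) {
        MPI_Irecv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* message has arrived */
    }

    MPI_Finalize();
    return 0;
}
```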

How are these technologies typically used in MPI implementations?

Read More »


MPI 2.2’s Scalable Process Topologies and Topology Mapping in practice

July 2, 2011 at 6:45 am PST

Today we feature a guest post from Torsten Hoefler, the Performance Modeling and Simulation lead of the Blue Waters project at NCSA and Adjunct Assistant Professor in the Computer Science Department at the University of Illinois at Urbana-Champaign (UIUC).

I’m sure everybody has heard of network topologies, such as 2D or 3D tori, fat-trees, Kautz networks, and Clos networks. It can be argued that even multi-core nodes (if run in “MPI everywhere” mode) form a separate “hierarchical network”. And you have probably also wondered how to map your communication onto such network topologies in a portable way.

MPI has offered support for such optimized mappings since the old days of MPI-1. The process topology functionality is probably one of the most overlooked useful features of MPI. We have to admit that it had some issues and was clumsy to use, but it was finally fixed in MPI-2.2. :-) Read More »
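
For readers who have not tried the MPI-2.2 interface, here is a minimal sketch (my own illustration, not Torsten’s code) of the scalable graph topology call MPI_Dist_graph_create_adjacent: each rank declares only its own neighbors (a simple ring here) and sets reorder to 1 so the MPI library may remap ranks onto the physical network.

```c
/* Sketch: build a ring topology with MPI-2.2's scalable graph interface
 * and allow the library to reorder ranks to match the network. */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Left and right neighbors in a simple ring. */
    int neighbors[2] = { (rank - 1 + size) % size, (rank + 1) % size };

    MPI_Comm ring_comm;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, neighbors, MPI_UNWEIGHTED,  /* who sends to me */
                                   2, neighbors, MPI_UNWEIGHTED,  /* whom I send to  */
                                   MPI_INFO_NULL,
                                   1,                             /* reorder allowed */
                                   &ring_comm);

    /* My rank in ring_comm may differ from my rank in MPI_COMM_WORLD;
     * neighbor communication should use ring_comm from here on. */
    MPI_Comm_free(&ring_comm);
    MPI_Finalize();
    return 0;
}
```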


MPI run-time at large scale

June 28, 2011 at 5:00 am PST

With the news that Open MPI is being used on the K supercomputer (i.e., the #1 machine on the June 2011 Top500 list), another colleague of mine, Ralph Castain — who focuses on the run-time system in Open MPI — pointed out that K has over 80,000 processors (over 640K cores!).  That’s ginormous.

He was musing to me that it would be fascinating to see some of K’s run-time data for what most people don’t consider too interesting / sexy: MPI job launch performance.

For example, another public use of Open MPI is on Los Alamos National Lab’s RoadRunner, which has 3,000+ nodes at 4 processes per node (remember RoadRunner?  It was #1 for a while, too).

It’s worth noting that Open MPI starts up full-scale jobs on RoadRunner — meaning that all processes complete MPI_INIT — in less than 1 minute.
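
As a rough illustration (my own sketch, not how those measurements were actually taken), one way to time “all processes complete MPI_INIT” from inside the job is to stamp the wall clock before and after MPI_Init on every rank and reduce to the slowest one; note that this misses whatever the launcher does before the processes even start.

```c
/* Hypothetical timing sketch: measures per-process MPI_Init duration and
 * reports the slowest rank.  It does NOT capture launcher/daemon fan-out
 * time that happens before the processes are running. */
#include <mpi.h>
#include <stdio.h>
#include <sys/time.h>

int main(int argc, char **argv) {
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);              /* before MPI starts up */
    MPI_Init(&argc, &argv);
    gettimeofday(&t1, NULL);              /* this rank finished MPI_Init */

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;

    int rank;
    double slowest;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Slowest MPI_Init: %.3f seconds\n", slowest);

    MPI_Finalize();
    return 0;
}
```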

Read More »


Open MPI powers 8 petaflops

June 25, 2011 at 6:20 pm PST

A huge congratulations goes out to the RIKEN Advanced Institute for Computational Science and Fujitsu teams, who saw the K supercomputer achieve over 8 petaflops in the June 2011 Top500 list, published this past week.

8 petaflops absolutely demolishes the prior record of about 2.5 petaflops.  Well done!

A sharp-eyed user pointed out that Open MPI was referenced in the “Programming on K Computer” Fujitsu slides (part of Fujitsu’s overall SC10 Presentation Download site).  I pinged my Fujitsu colleague on the MPI Forum, Shinji Sumimoto, to ask for a few more details: does K actually use Open MPI with some customizations for their specialized network?  And did Open MPI power the 8 petaflop runs at an amazing 93% efficiency?

Read More »


The commoditization of high performance computing

June 14, 2011 at 5:00 am PST

High Performance Computing (HPC) used to be the exclusive domain of supercomputing national labs and advanced researchers.

This is no longer the case.

Costs have come down, complexity has been reduced, and off-the-shelf solutions are being built to exploit multiple processors these days.  This means that users with large compute needs — which, in an information-rich world, are becoming quite common — can now use techniques pioneered by the HPC community to solve their everyday problems.

Sure, there’s still the bleeding edge of HPC — my Grandma isn’t using a petascale computer (yet).  All the national labs and advanced researchers are still hanging out at the high-end of HPC, pushing the state of the art to get faster and bigger results that simply weren’t possible before.

Read More »
