
Overlap of communication and computation (part 2)

September 19, 2014 at 5:00 am PST

In part 1 of this series, I discussed various pair-wise technologies and techniques that MPI implementations typically use for communication / computation overlap.

MPI-3.0, published in 2012, forced a change in the overlap game.

Specifically: most prior overlap work had focused on individual messages between a pair of peers. Those techniques were very helpful for point-to-point messages, especially those of the non-blocking variety. But MPI-3.0 introduced the concept of non-blocking collective (NBC) operations, which fundamentally changed the requirements for network hardware offload.

Let me explain.
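To make that concrete, here is a minimal sketch (illustrative only, not code from the post) of the pattern that MPI-3.0's NBCs enable: start a non-blocking collective such as MPI_Iallreduce, keep computing, and block only when the result is actually needed.

/* Illustrative sketch of an MPI-3.0 non-blocking collective: start the
   reduction, compute while it (hopefully) progresses, then wait. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    double local = 1.0, global = 0.0;
    MPI_Request req;

    MPI_Init(&argc, &argv);

    /* Start the reduction; a capable implementation / NIC can progress it
       in the background while the application keeps computing. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... computation that does not depend on "global" goes here ... */

    /* Block only when the reduced value is actually needed. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    printf("sum = %f\n", global);

    MPI_Finalize();
    return 0;
}

The interesting question is what the network hardware has to be able to do so that the collective actually makes progress while the application is off computing.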


Overlap of communication and computation (part 1)

September 16, 2014 at 5:00 am PST

I’ve mentioned computation / communication overlap before (e.g., here, here, and here).

Various types of networks and NICs have long since had some form of overlap. Some provide better-quality overlap than others, from an HPC perspective.

But with MPI-3, we’re really entering a new realm of overlap.  In this first of two blog entries, I’ll explain some of the various flavors of overlap and how they are beneficial to MPI/HPC-style applications.
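As a preview, here is a minimal sketch (illustrative only) of the most familiar flavor: post non-blocking point-to-point operations, do useful computation while the network hopefully makes progress, and complete the requests only when the data is needed.

/* Illustrative sketch of point-to-point overlap in a ring: post the
   communication, compute, and complete the requests later. */
#include <mpi.h>

#define N 1024

int main(int argc, char **argv)
{
    double sendbuf[N] = { 0 }, recvbuf[N];
    int rank, size, right, left;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    right = (rank + 1) % size;
    left  = (rank + size - 1) % size;

    /* Post the communication up front... */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ...do computation that does not need recvbuf here... */

    /* ...and wait only when the data is actually required. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}

Whether any real overlap happens in that window depends heavily on the network and NIC, which is exactly the "quality of overlap" question.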


HPC over UDP

September 12, 2014 at 9:52 am PST

A few months ago, I posted an entry entitled “HPC in L3”.  My only point in that entry was to dispel the “HPC in L3? That’s a terrible idea!” knee-jerk reaction that we old-timer HPC types have.

I mention this because, a few days ago, we released a free software update for the Cisco usNIC product that enables usNIC traffic to flow over UDP (vs. raw L2 frames). Woo hoo!

That’s right, sports fans: another free software update to make usNIC even better than ever.  Especially across 40Gb interfaces!


Unsung heroes: MPI runtime environments

September 7, 2014 at 4:39 am PST

When thinking about MPI, most people immediately think of short-message latency, or perhaps large-message bandwidth.

But have you ever thought about what your MPI implementation has to do before your application even calls MPI_INIT?

Hint: it’s pretty crazy complex, from an engineering perspective.

Think of it this way: operating systems natively provide a runtime system for individual processes. You can launch, monitor, and terminate a process with the OS’s native tools. But now think about extending all of those operating system services to gang-support N processes in exactly the same way that a single process is managed. And don’t forget that those N processes will be spread across M servers / operating system instances.
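From the application's point of view, all of that work is invisible: by the time MPI_INIT returns, the runtime has already launched, wired up, and begun monitoring every one of those N processes across the M servers. A minimal sketch (illustrative only):

/* By the time MPI_Init() returns, the runtime has already launched and
   connected all N processes across the M hosts. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}

The user types a single command, something like "mpirun -np 16 --hostfile hosts ./a.out" (exact options vary by implementation); the process launching, I/O forwarding, signal handling, and cleanup behind that one line are the unsung runtime's job.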


Traffic in parallel

August 8, 2014 at 5:00 am PST

In my last entry, I gave a vehicles-driving-in-a-city analogy for network traffic.

Let’s tie that analogy back to HPC and MPI.
