
Supercomputing is upon us!

October 31, 2014 at 11:21 am PST

It’s that time of year again: we’re about T-2.5 weeks from the Supercomputing conference and trade show. SC’14 is in New Orleans, November 16-21.

Are you going to get some tasty gumbo and supercharged computing power?  

If so, come say hi!  The Cisco booth is 2715.


The “vader” shared memory transport in Open MPI: Now featuring 3 flavors of zero copy!

October 29, 2014 at 11:42 am PST

Today’s blog post is by Nathan Hjelm, a Research Scientist at Los Alamos National Laboratory, and a core developer on the Open MPI project.

The latest version of the “vader” shared memory Byte Transfer Layer (BTL) in the upcoming Open MPI v1.8.4 release brings lower small-message latency and improved support for “zero-copy” transfers.

NOTE: “zero copy” is the term typically used, even though it really means “single copy” (the message is copied once, directly from the sender to the receiver).  Think of it as “zero extra copies,” i.e., one copy instead of two.
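To make that concrete, here is a minimal sketch (mine, not from the post) of the kind of large same-node transfer where a single-copy mechanism pays off.  The mpirun flags assume a standard Open MPI build with the vader BTL available:

    /*
     * Illustrative example: a large message between two ranks on the same
     * node.  With "--mca btl vader,self", vader can move it with a single
     * copy via a mechanism such as CMA, KNEM, or XPMEM (presumably the
     * three "flavors" in the title), instead of two copies through a
     * shared-memory bounce buffer.
     *
     * Build and run (assumed commands):
     *   mpicc vader_demo.c -o vader_demo
     *   mpirun -np 2 --mca btl vader,self ./vader_demo
     */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        int rank;
        size_t len = 4 * 1024 * 1024;  /* large enough to benefit from single copy */
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(len);
        memset(buf, 0x55, len);
        if (0 == rank) {
            MPI_Send(buf, (int) len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (1 == rank) {
            MPI_Recv(buf, (int) len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }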


HPC schedulers: What is a “slot”?

October 22, 2014 at 9:48 am PST

Today’s guest post comes from Ralph Castain, a principal engineer at Intel.  The bulk of this post is an email he sent explaining the concept of a “slot” in typical HPC schedulers. This is a bit of a departure from the normal fare on this blog, but it is still a critical concept to understand for running HPC applications efficiently.  With his permission, I re-publish Ralph’s email here because it offers a great analogy for the “slot” concept, one that is broadly applicable to HPC users.

The question of what a [scheduler] “slot” is came up yesterday and was an obvious source of confusion, so let me try to explain the concept using a simple model.

Suppose I own a fleet of cars at several locations around the country. I am in the business of providing rides for people. Each car has 5 seats in it.
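(Editor’s note: in Open MPI terms, Ralph’s “seats” are what a hostfile calls slots.  Below is a hypothetical sketch, with made-up hostnames, of how that looks in practice.)

    # Hypothetical hostfile ("myhosts"): each line advertises how many
    # "seats" (slots) a node offers.
    node01 slots=5
    node02 slots=5

    # Filling all 10 seats across the two "cars":
    mpirun -np 10 --hostfile myhosts ./my_app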


usNIC provider contributed to libfabric

October 20, 2014 at 5:54 am PST

Today’s guest post is by Reese Faucette, one of my fellow usNIC team members here at Cisco.

I’m pleased to announce that this past Friday, Cisco contributed a usNIC-based provider to libfabric, the new API under development by the OpenFabrics Interfaces Working Group.

(Editor’s note: I’ve blogged about libfabric before)

Yes, the road is littered with the bodies of APIs that were great ideas at the time (or not), but that doesn’t change the fact that neither Berkeley sockets nor Linux Verbs is really adequate as a cross-vendor, high-performance programming API.
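To give a flavor of the API, here is a minimal sketch (the API was still settling at the time, so details may differ by version) that lists the providers fi_getinfo reports, e.g., “usnic” where the Cisco provider is installed:

    /* Minimal sketch: enumerate libfabric providers.  Assumes the libfabric
     * headers and library are installed; treat this as illustrative rather
     * than definitive, since the interface was still evolving.
     *
     *   cc list_providers.c -lfabric -o list_providers
     */
    #include <stdio.h>
    #include <rdma/fabric.h>

    int main(void)
    {
        struct fi_info *info, *cur;

        /* NULL hints: ask for every available provider/endpoint combination. */
        if (fi_getinfo(FI_VERSION(1, 0), NULL, NULL, 0, NULL, &info)) {
            fprintf(stderr, "fi_getinfo failed\n");
            return 1;
        }
        for (cur = info; cur != NULL; cur = cur->next) {
            printf("provider: %s\n", cur->fabric_attr->prov_name);
        }
        fi_freeinfo(info);
        return 0;
    }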


MPI-3.1

October 7, 2014 at 6:18 am PST

As you probably already know, the MPI-3.0 document was published in September of 2012.

We even got a new logo for MPI-3.  Woo hoo!

The MPI Forum has been busy working on both errata to MPI-3.0 (which will be collated and published as “MPI-3.1”) and all-new functionality for MPI-4.0.

The current plan is to finalize all errata and outstanding issues for MPI-3.1 in our December 2014 meeting (i.e., in the post-Supercomputing lull).  This means that we can vote on the final MPI-3.1 document at the next MPI Forum meeting in March 2015.

MPI is sometimes criticized for its “slow” pace of development.  Why on earth would it take two years to formalize errata from the MPI-3.0 document into an MPI-3.1 document?

The answer is (at least) twofold:

  1. This stuff is really, really complicated.  What appears to be a trivial issue almost always turns out to have deeper implications that really need to be understood before proceeding.  This kind of deliberate thought and process simply takes time.
  2. MPI is a standard.  Publishing a new version of that standard has a very large impact; it sets the course of many vendors, researchers, and users.  Care must be taken to get that publication as correct as possible.  Perfection is unlikely (as scientists and engineers, we absolutely have to admit that), but we want to be as close to fully correct as possible.

MPI-4 is still “in the works.”  Big New Things, such as endpoints and fault-tolerant behavior, are under active development.  MPI-4 is a ways off, so it’s a bit early to start making predictions about what will or will not be included.
