Cisco Blogs

Exascale: it’s not just the (networking) hardware

January 24, 2011

Many in the HPC research community are starting to work on “exascale” these days: the ability to do 10^18 floating point operations per second.  Exascale is such a difficult problem that it will require new technologies in many different areas before it can become a reality.  Case in point is this entry at Inside HPC today entitled “InfiniBand Charts Course to Exascale”.

It cites The Exascale Report and a blog entry by Lloyd Dickman at the IBTA about their course going forward.  It’s a good read — Lloyd’s a smart, thoughtful guy.

That being said, there’s a key piece missing from the discussion: the (networking) software.  More specifically: the current OpenFabrics Verbs API abstractions are (probably) unsuitable for exascale, a fact that Fab Tillier (Microsoft) and I presented at the OpenFabrics workshop in Sonoma last year (1up, 2up).

There are two (indirect) notable takeaways from our slides:

  1. If you look at the last slide, our list of networking API requirements doesn’t look much like the current verbs API.  Advancements will need to be made in OpenFabrics networking hardware and the corresponding software API stack.
  2. “OpenFabrics hardware” is actually a fairly wide class of networking gear these days — it also includes multiple forms of Ethernet.  Hence, improvements towards Exascale in the OpenFabrics networking APIs will also benefit Ethernet (!).

Taking an intuitive leap from those points: getting to Exascale might be about things like increasing network bandwidth and decreasing network latency.

…but it might not.  There are still a lot of guesses about how exascale will actually turn out, and there’s still a lot that we don’t know.  Lloyd’s analysis of the trends is quite good, but my wacky thought is: what if exascale doesn’t follow the trends?

For example, it’s possible that exascale will be realized via a truckload of low-power processors (e.g., the Intel Atom, or something that evolves from it) connected via local networking only (e.g., groups of 8 son-of-Atoms share a networking adapter on an n-dimensional networking grid).  This could keep power requirements for the processors and networking nice and low.  In this case, 100+Gbps networking might not be necessary.

And — holy schnikies — that might even work with 1 or 10Gbps Ethernet…!
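To get a feel for the scale of that scenario, here is a rough back-of-the-envelope sketch. The only numbers taken from the post are the 10^18 target and the groups of 8 processors per adapter. The per-processor rate (10 GFLOPS) and the 3-dimensional grid are purely hypothetical placeholders, not measurements or predictions, and the function names are made up for illustration.

```python
EXAFLOPS = 10**18  # target from the post: 10^18 floating point operations per second


def processors_and_adapters(flops_per_processor, group_size=8):
    """Count processors needed to reach EXAFLOPS, and the adapters they share.

    flops_per_processor is a hypothetical sustained rate per processor.
    group_size=8 mirrors the "groups of 8 share an adapter" idea in the post.
    """
    # Integer ceiling division, to avoid floating point rounding at this scale.
    processors = -(-EXAFLOPS // flops_per_processor)
    adapters = -(-processors // group_size)
    return processors, adapters


def grid_side(adapters, dimensions):
    """Smallest integer side length so that side**dimensions >= adapters."""
    side = max(1, round(adapters ** (1.0 / dimensions)))
    # Correct any floating point error in the root estimate.
    while side ** dimensions < adapters:
        side += 1
    while side > 1 and (side - 1) ** dimensions >= adapters:
        side -= 1
    return side


if __name__ == "__main__":
    # ASSUMPTION: 10 GFLOPS per low-power processor (illustrative only).
    procs, nics = processors_and_adapters(10**10)
    # ASSUMPTION: a 3-dimensional grid; the post only says "n-dimensional".
    side = grid_side(nics, 3)
    print(f"{procs:,} processors, {nics:,} adapters, grid side {side}")
```

Change the assumed per-processor rate or the number of dimensions and the counts move a lot, which is rather the point: nobody knows yet which of these knobs exascale will actually turn.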

Is that scenario going to happen?  Who knows?  (I don’t!)  It’s pretty far-fetched, but it’s not outside the realm of possibility.  But just for fun, consider the devil’s advocate position: if we have to invent 20 new technologies for exascale, not having to invent new networking hardware as well would reduce the complexity a bit, and generally be a Good Thing.

Regardless of what networking hardware is used in exascale, I think the networking software will need some revamping, per the points that Fab and I discussed in our slides.  Writing software for multiple cores is hard; writing petascale-quality software is even harder.  Writing exascale software (with today’s software technology) is likely darn near impossible.  Software — on many different levels — will need to evolve.

I don’t know where the road to exascale will take us.  But it sure will be fun to follow it and see!

(NOTE: The slides that Fab and I presented don’t seem to appear on the Sonoma 2010 workshop web site for some reason, so I cached them here on this blog entry.  I’ll ping the Sonoma organizers about it…)

(UPDATE: Added a credit and link to The Exascale Report)



  1. Good point. We also need to do deep dives into the nature of computation and algorithms to work out how to make computation more loosely coupled and more specialized. Understanding task dependencies, reordering them, and managing the data flows associated with them under HW/SW constraints is the new direction for large-scale computation. I would push efforts towards exploiting our existing computational infrastructures with higher-quality software, rather than using brute force to keep adding flops and bandwidth (and power consumption too) and lowering latencies when we cannot keep efficiencies high enough. SW productivity in these kinds of environments is already a concern as well. There needs to be some brainstorming around how to couple HW and SW more efficiently, and which one does what. Among other things, that means SW R&D does not play catch-up with HW R&D.

    • Absolutely agree. FWIW, my $0.02 is that some of our industry's current practices are good, but some are not. Regardless of whether we think of it as "evolving to exascale" or maybe just "advancing the state of the art", many things -- including current practice in software -- need to be advanced to push the limits of our current capabilities. Thanks for the comment!