Cisco Blog > High Performance Computing Networking
January 23, 2010 at 12:00 pm PST
I just ran across a great blog entry about SGE debuting topology-aware scheduling. Dan Templeton does a great job of describing the need for processor topology-aware job scheduling within a server. Many MPI jobs fit exactly within his description of applications that have “serious resource needs” — they typically require lots of CPU and/or network (or other I/O). Hence, scheduling an MPI job intelligently across not only the network, but also across the network and resources inside the server, is pretty darn important. It’s all about location, location, location!
Particularly as core counts in individual servers are going up.
Particularly as networks get more complicated inside individual servers.
Particularly if heterogeneous computing inside a single server becomes popular.
Particularly as resources are now pretty much guaranteed to be non-uniform within an individual server.
These are exactly the reasons that, even though I’m a network middleware developer, I spend time with server-specific projects like hwloc — you really have to take a holistic approach in order to maximize performance.
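To make the "location, location, location" point concrete, here is a minimal sketch of two placement policies on a made-up server. It is not how SGE or any MPI implementation actually schedules; the topology (2 sockets, 4 cores each) and every function name below are assumptions for illustration only.

```python
def build_topology(sockets, cores_per_socket):
    """Return (socket_id, core_id) pairs for a hypothetical server."""
    return [(s, c) for s in range(sockets) for c in range(cores_per_socket)]


def place_packed(num_ranks, topology):
    """Fill one socket before moving on, so neighboring ranks share a socket."""
    if num_ranks > len(topology):
        raise ValueError("more ranks than cores")
    return {rank: topology[rank] for rank in range(num_ranks)}


def place_scattered(num_ranks, topology):
    """Round-robin ranks across sockets, spreading load over all sockets."""
    if num_ranks > len(topology):
        raise ValueError("more ranks than cores")
    by_socket = {}
    for socket, core in topology:
        by_socket.setdefault(socket, []).append((socket, core))
    sockets = sorted(by_socket)
    placement = {}
    for rank in range(num_ranks):
        socket = sockets[rank % len(sockets)]
        placement[rank] = by_socket[socket][rank // len(sockets)]
    return placement


def cross_socket_neighbors(placement):
    """Count adjacent rank pairs (r, r+1) that land on different sockets."""
    ranks = sorted(placement)
    return sum(
        1 for a, b in zip(ranks, ranks[1:]) if placement[a][0] != placement[b][0]
    )


if __name__ == "__main__":
    topo = build_topology(sockets=2, cores_per_socket=4)
    for name, policy in (("packed", place_packed), ("scattered", place_scattered)):
        placement = policy(4, topo)
        print(name, placement, "cross-socket pairs:", cross_socket_neighbors(placement))
```

Which policy wins depends on the job: packing keeps nearest-neighbor traffic on one socket, while scattering gives each rank more memory bandwidth. A scheduler that cannot see the topology cannot make that choice at all.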
January 13, 2010 at 12:00 pm PST
We were recording an RCE-Cast with the PETSc guys when we realized that we had just about hit our 1 year anniversary; the first recording was posted on January 17, 2009. Wow! I had no idea that we had been doing this so long — Brock and I are both very pleasantly surprised that we’ve managed to keep it going this long.
If you’re unaware of RCE-Cast, it’s a podcast about “Research Computing and Engineering” that Brock Palen and I record every two weeks. We talk to a variety of software and hardware projects, and/or any other topic that seems to be related to HPC- or RCE-like things.
Here’s an experiment for our next interview with the Condor folks: “tweet @brockpalen questions for #condor http://tinyurl.com/hqzhm next guest on #RCE”.
January 8, 2010 at 12:00 pm PST
We had an astonishing 837 responses to the MPI User Survey. Many thanks to all of you who filled out the survey!
The MPI Forum minions are busy analyzing the data — there’s a lot! We’ll have more definitive results later, but for now, see below the jump for a few quickie facts from the results.
December 23, 2009 at 12:00 pm PST
Sorry for the lack of activity here this month, folks. As usual, December is the month to recover from SC and catch up on everything else you were supposed to be doing. So I’ll try to make up for it with a small-but-tasty Christmas morsel. Then I’ll disappear for a long winter’s nap; you likely won’t see me until January (shh! don’t tell my wife that I’m working today!).
The topic of my musing today is one that has come up multiple times in conversation over the past two weeks. Although I’m certainly not the only guy to talk about this on the interwebs, today’s topic is server-side hardware offload of network communications.
December 8, 2009 at 12:00 pm PST
Here are some random quick notes:
- Brock posted the MPI-3 podcast on rce-cast.com yesterday. Have a listen if you’d like to hear some of the new/upcoming efforts in MPI-3.
- I saw a post on the MVAPICH list the other day that some random user picked up hwloc and submitted a patch to integrate it into MVAPICH. Huzzah!
- I hear quite a bit about MPI being run on the Intel prototype 48-core chip. This is an interesting subject, but quite a bit remains to be seen about the programming models of BigCore chips. The Intel press releases state that there is hardware support for message passing on the silicon, but what exactly does that mean? Do we have direct access to that from user space? …that and many other questions will be discussed over time.
Who’s going to SC10 in New Orleans next year?
December 7, 2009 at 12:00 pm PST
Cisco announced this past weekend a new open source effort that is being launched under the Open MPI project umbrella named the Open Resilient Cluster Manager (or “OpenRCM”, or — my personal favorite — “ORCM”. Say it 10 times fast!).
The Open MPI community is pleased to announce the establishment of a new subproject built upon the Open MPI code base. Using work initially contributed by Cisco Systems, the Open Resilient Cluster Manager is an open source project released under the Open MPI [BSD] license focused on development of an “always on” resource manager for systems spanning the range from embedded to very large clusters.
The ORCM web site neatly lays out the project goals:
- Maintain operation of running applications in the face of single or multiple failures of any given process within that application.
- Proactively detect incipient failures (hardware and/or software) and respond appropriately to maintain overall system operation.
- Support both MPI and non-MPI applications.
- Provide a research platform for exploring new concepts and methods in resilient systems.
“That’s great,” you say. “But why on earth do we need yet another cluster resource manager?”
November 30, 2009 at 12:00 pm PST
Just a quick note today: Brock Palen and I just recorded an interview with Bill Gropp, MPI-2.2 Chair, and Rich Graham, MPI-3.0 Chair. Brock should be posting the podcast up on www.rce-cast.com within a week or so.
November 24, 2009 at 12:00 pm PST
Pardon the intrusion folks, I need to re-claim this blog on Technorati, so I need to publish this claim code where Technorati can find it. I think I can delete this entry after Technorati verifies me; we’ll see…
Here it is, Technorati: GUYH9B8ZVYKR
November 23, 2009 at 12:00 pm PST
EDITOR’S NOTE: As with entries about hwloc, this announcement entry is a little off the beaten track for high performance networks, but it is definitely related and relevant.
The good folks at Argonne National Laboratory have released OpenPA (Portable Atomics) v1.0.2. It’s a small library that implements processor atomic operations in a portable fashion (i.e., across platforms, compilers, etc., including inline assembly support). Here’s a link to the release announcement and the general OpenPA web site.
While OpenPA is not directly related to high performance networking, it is highly useful to have an extremely efficient/optimized set of atomic operations when multiple threads are sharing a single resource — such as a network resource. Hence, this companion library is quite useful in driving full utilization of common network resources. I keep beating the same drum: as core counts are going up, little utilities like OpenPA and hwloc are going to be very, very important to extract all the performance from your server that you expect to get.
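For readers who haven't run into the problem atomics solve, here is a minimal sketch of several threads updating one shared counter. It is not OpenPA's API (OpenPA is a C library), and Python's standard library has no hardware fetch-and-add, so a lock stands in for the atomic instruction. The class and function names are hypothetical.

```python
import threading


class SharedCounter:
    """A counter shared by several threads; the lock makes each update indivisible."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, amount=1):
        """Add amount and return the previous value as one indivisible step."""
        with self._lock:
            previous = self._value
            self._value = previous + amount
            return previous

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(num_threads, increments_per_thread):
    """Have every thread bump the same counter and return the final total."""
    counter = SharedCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.fetch_and_add(1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(num_threads=4, increments_per_thread=10000))
```

A real atomic operation does the read-modify-write in a single processor instruction instead of taking a lock, which is exactly why a well-optimized library matters when many cores hammer on the same network resource.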
November 20, 2009 at 12:00 pm PST
First the Fortran WG asked for some specific guidance (thank you very much for all who replied!), now the main Forum itself is conducting a community-wide survey to solicit feedback to help shape the MPI-3 standards process. To protect from spam, the survey requires a password: mpi3.
In this survey, the MPI Forum is asking as many people as possible for feedback on the MPI-3 process — what features to include, what features to not include, etc.
We encourage you to forward this survey on to as many interested and relevant parties as possible.
It will take approximately 10 minutes to complete the questionnaire.
November 16, 2009 at 12:00 pm PST
I have nothing deep to say for this week’s blog entry since I’m sitting here in the Portland convention center feverishly working to finish my SC09 slides. My partner in Fortran crime, Craig Rasmussen, is sitting next to me, feverishly working on our prototype Fortran 2003 MPI bindings implementation so that we can hand out proof-of-concept tarballs at the MPI Forum BOF on Wednesday evening.
All in all — it’s a normal beginning to Supercomputing. 
The #SC09 twitter feed is going crazy with about 6 billion tweets. Just make sure you use the patented SC09 Fist Bump when in Portland.
Also be sure to drop by and see me in the Cisco Booth (#1847 — get a Cisco t-shirt!). I’ll be walking around the floor for the Gala opening, but I have booth duty most mornings this week. I’ll also be at the Open MPI BOF on Wednesday at 12:15pm and the MPI Forum BOF, also on Wednesday, but at 5:30pm.
November 5, 2009 at 12:00 pm PST
It took a bunch of testing, but we finally got the first formal public release of hwloc (“Hardware Locality”) out the door. From the announcement:
“hwloc provides command line tools and a C API to obtain the hierarchical map of key computing elements, such as: NUMA memory nodes, shared caches, processor sockets, processor cores, and processor “threads”. hwloc also gathers various attributes such as cache and memory information, and is portable across a variety of different operating systems and platforms.”
hwloc was primarily developed with High Performance Computing (HPC) applications in mind, but it is generally applicable to any software that wants or needs to know the physical layout of the machine on which it is running. This is becoming increasingly important in today’s ever-growing-core-count compute servers.
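To give a feel for what a "hierarchical map" of a machine looks like, here is a rough sketch that groups logical CPUs by package (socket) and core. It does not use hwloc at all; it assumes a Linux system and reads the kernel's sysfs topology files directly, so it covers only a sliver of what hwloc reports (no caches, no NUMA nodes, no other operating systems). The function names are my own.

```python
import glob
import os
import re

# Linux-only location of per-CPU directories (an assumption of this sketch).
SYSFS_CPU_GLOB = "/sys/devices/system/cpu/cpu[0-9]*"


def read_int(path):
    """Read a single integer from a sysfs file."""
    with open(path) as handle:
        return int(handle.read().strip())


def read_topology():
    """Return {package_id: {core_id: [logical_cpu, ...]}}; empty if sysfs is absent."""
    tree = {}
    for cpu_dir in glob.glob(SYSFS_CPU_GLOB):
        match = re.search(r"cpu(\d+)$", cpu_dir)
        topo_dir = os.path.join(cpu_dir, "topology")
        if not match or not os.path.isdir(topo_dir):
            continue  # not a CPU directory, or the CPU is offline
        try:
            package = read_int(os.path.join(topo_dir, "physical_package_id"))
            core = read_int(os.path.join(topo_dir, "core_id"))
        except (OSError, ValueError):
            continue  # skip CPUs whose topology files cannot be read
        tree.setdefault(package, {}).setdefault(core, []).append(int(match.group(1)))
    return tree


if __name__ == "__main__":
    for package, cores in sorted(read_topology().items()):
        print(f"package {package}")
        for core, cpus in sorted(cores.items()):
            print(f"  core {core}: logical CPUs {sorted(cpus)}")
```

On a machine with hardware threads, each core lists more than one logical CPU. hwloc gives you this same kind of tree portably, plus the cache and memory levels in between.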
October 29, 2009 at 12:00 pm PST
Since there were a goodly number of comments on the MPI-3 Fortran question from the other day (please keep spreading that post around; the more feedback we get, the better!), I thought I’d give a quick synopsis of the other MPI-3 Forum Working Groups. That is just to let you know that there’s more going on in MPI-3 than just new yummy Fortran goodness!
The links below go to the wiki pages of the various working groups (WG). Some wiki pages are more active than others; some wiki pages are fairly dormant, but that doesn’t necessarily mean that the WG itself is dormant. Some WGs simply choose to communicate more via email and/or regular teleconferences. For example, the Tools WG has only sporadic emails on its mailing list, but it has a regularly-updated wiki and regular teleconferences + meeting times during the bi-monthly MPI Forum meetings. Hence, each WG may work and communicate differently than its peers.
October 23, 2009 at 12:00 pm PST
As many of you know, I’m an active member of the MPI Forum. We have recently completed MPI-2.2 and have shifted our sights to focus on MPI-3.
For some inexplicable reason, I’ve become heavily involved in the MPI-3 Fortran working group. There are some well-known problems with the MPI-2 Fortran 90 interfaces; the short version of the MPI-3 Fortran WG’s mission is to “fix those problems.”
A great summary of what the Fortran WG is planning for MPI-3 is available on the Forum wiki page; we’d really appreciate feedback from the Fortran MPI developer community on these ideas.
There is definitely one significant issue on which we need feedback from the community before making a decision. Craig Rasmussen from Los Alamos National Laboratory asked me to post the following “request for information” to the greater Fortran MPI developer community. Please send feedback either via comments on this blog entry, email to me directly, or to the MPI-3 Fortran working group mailing list.
October 22, 2009 at 12:00 pm PST
Debugging parallel applications is hard. There’s no way around it: bugs can get infinitely more complex when you have not just one thread of control running, but rather N processes, each with M threads, all running simultaneously. Printf-style debugging is simply not sufficient; when a process is running on a remote compute node, even the output from a print statement can take time to be sent across the network and then displayed on your screen. That delay can mask the actual problem because the output shows up significantly later than when the problem occurred.
Tools are vital for parallel application development, and there are oodles of good ones out there. I just wanted to highlight one really cool open source (free!) tool today called “Padb”. Written by Ashley Pittman, it’s a small but surprisingly useful tool. One scenario where I find Padb helpful is when an MPI job “hangs”: it just seems to stop progress, but does not die or abort. Padb can go find all the individual MPI processes, attach to them, and generate stack traces and display variable and parameter dumps for each process in the MPI job. This allows a developer to see where the application is hung, an important first step in the troubleshooting process.
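The "show me where everything is stuck" idea can be sketched at a much smaller scale inside a single process. The code below is not Padb and does not attach to MPI processes on remote nodes; it only dumps the stack of every thread in the current Python process, using the CPython-specific `sys._current_frames()`. The worker name "rank-1" and the function names are made up for the demo.

```python
import sys
import threading
import traceback


def dump_all_stacks():
    """Return {thread_name: formatted stack} for every live thread in this process."""
    names = {thread.ident: thread.name for thread in threading.enumerate()}
    report = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, f"thread-{ident}")
        report[name] = "".join(traceback.format_stack(frame))
    return report


def stuck_worker(started, release):
    """Simulate a hang: announce that we started, then block until released."""
    started.set()
    release.wait()


def demo():
    """Start a worker that hangs, capture all stacks, then let it finish."""
    started = threading.Event()
    release = threading.Event()
    worker = threading.Thread(
        target=stuck_worker, args=(started, release), name="rank-1"
    )
    worker.start()
    started.wait()  # make sure the worker is inside stuck_worker
    report = dump_all_stacks()
    release.set()
    worker.join()
    return report


if __name__ == "__main__":
    for name, stack in demo().items():
        print(f"--- {name} ---")
        print(stack)
```

The stack for "rank-1" ends inside `stuck_worker`, which is the kind of answer you want from a hung job: not what went wrong yet, but where to start looking.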