
Come see us at SC09!

I have nothing deep to say for this week’s blog entry since I’m sitting here in the Portland convention center feverishly working to finish my SC09 slides.  My partner in Fortran crime, Craig Rasmussen, is sitting next to me, feverishly working on our prototype Fortran 2003 MPI bindings implementation so that we can hand out proof-of-concept tarballs at the MPI Forum BOF on Wednesday evening.

All in all, it's a normal beginning to Supercomputing. ;-)

The #SC09 twitter feed is going crazy with about 6 billion tweets.  Just make sure you use the patented SC09 Fist Bump when in Portland.

Also be sure to drop by and see me in the Cisco Booth (#1847 — get a Cisco t-shirt!).  I’ll be walking around the floor for the Gala opening, but I have booth duty most mornings this week.  I’ll also be at the Open MPI BOF on Wednesday at 12:15pm and the MPI Forum BOF, also on Wednesday, but at 5:30pm.

hwloc v0.9.2 released

It took a bunch of testing, but we finally got the first formal public release of hwloc (“Hardware Locality”) out the door.  From the announcement:

“hwloc provides command line tools and a C API to obtain the hierarchical map of key computing elements, such as: NUMA memory nodes, shared caches, processor sockets, processor cores, and processor “threads”. hwloc also gathers various attributes such as cache and memory information, and is portable across a variety of different operating systems and platforms.”

hwloc was primarily developed with High Performance Computing (HPC) applications in mind, but it is generally applicable to any software that wants or needs to know the physical layout of the machine on which it is running.  This is becoming increasingly important in today’s ever-growing-core-count compute servers.
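To give a flavor of what that looks like in practice, here is a minimal sketch of querying the local machine's topology through the hwloc C API. It is written from memory against the API as I know it (hwloc_topology_init/load and hwloc_get_nbobjs_by_type); the exact function names and object-type constants may differ slightly between releases, so treat it as illustrative rather than definitive:

    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topology;

        /* Discover the topology of the machine we are running on */
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        /* Count a few of the key computing elements mentioned above */
        int sockets = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET);
        int cores   = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
        int pus     = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);

        printf("Sockets: %d, cores: %d, hardware threads: %d\n",
               sockets, cores, pus);

        hwloc_topology_destroy(topology);
        return 0;
    }

Compile it against libhwloc (e.g., gcc -o topo topo.c -lhwloc, assuming the library is installed somewhere your compiler can find it) and you get a one-line summary of the machine's layout; the bundled command line tools give you the same kind of information without writing any code at all.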

Other MPI-3 Forum activities

Since there were a goodly number of comments on the MPI-3 Fortran question from the other day (please keep spreading that post around; the more feedback we get, the better!), I thought I'd give a quick synopsis of the other MPI-3 Forum Working Groups.  This is just to let you know that there's more going on in MPI-3 than just new yummy Fortran goodness!

The links below go to the wiki pages of the various working groups (WGs).  Some wiki pages are more active than others; some are fairly dormant, but that doesn't necessarily mean that the WG itself is dormant.  Some WGs simply choose to communicate more via email and/or regular teleconferences.  For example, the Tools WG has only sporadic emails on its mailing list, but it has a regularly-updated wiki, regular teleconferences, and meeting times during the bi-monthly MPI Forum meetings.  Hence, each WG may work and communicate differently than its peers.

MPI-3 Fortran Community Feedback Needed!

As many of you know, I'm an active member of the MPI Forum.  We have recently completed MPI-2.2 and have now set our sights on MPI-3.

For some inexplicable reason, I’ve become heavily involved in the MPI-3 Fortran working group.  There are some well-known problems with the MPI-2 Fortran 90 interfaces; the short version of the MPI-3 Fortran WG’s mission is to “fix those problems.” 

A great summary of what the Fortran WG is planning for MPI-3 is available on the Forum wiki page; we’d really appreciate feedback from the Fortran MPI developer community on these ideas. 

There is definitely one significant issue on which we need feedback from the community before making a decision.  Craig Rasmussen from Los Alamos National Laboratory asked me to post the following “request for information” to the greater Fortran MPI developer community.  Please send feedback via comments on this blog entry, via email to me directly, or via the MPI-3 Fortran working group mailing list.

Parallel debugging

Debugging parallel applications is hard.  There's no way around it: bugs can get infinitely more complex when you have not just one thread of control, but N processes, each with M threads, all running simultaneously.  Printf-style debugging is simply not sufficient; when a process is running on a remote compute node, even the output from a print statement can take time to be sent across the network and displayed on your screen.  That delay can mask the actual problem, because the output shows up significantly later than the problem itself occurred.

Tools are vital for parallel application development, and there are oodles of good ones out there.  I just wanted to highlight one really cool open source (free!) tool today called “Padb”.  Written by Ashley Pittman, it's a small but surprisingly useful tool.  One scenario where I find Padb helpful is when an MPI job “hangs”: it just seems to stop making progress, but does not die or abort.  Padb can go find all the individual MPI processes, attach to them, and generate stack traces and display variable and parameter dumps for each process in the MPI job.  This allows a developer to see where the application is hung, which is an important first step in the troubleshooting process.
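To make that scenario concrete, here is a small, contrived C example of the kind of hang I mean (my own illustration, not anything taken from Padb itself): every rank posts a blocking receive before anyone sends, so the job just sits there forever without aborting.

    #include <stdio.h>
    #include <mpi.h>

    /* Contrived deadlock: every rank waits to receive from its neighbor
       before sending anything, so no receive can ever be matched. */
    int main(int argc, char *argv[])
    {
        int rank, size, token = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int peer = (rank + 1) % size;

        /* Every process blocks here forever; the job "hangs" but never aborts */
        MPI_Recv(&token, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&rank, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);

        printf("Rank %d never gets here\n", rank);
        MPI_Finalize();
        return 0;
    }

Attach Padb to a job like this and the per-process stack traces will show every rank parked inside MPI_Recv, which is exactly the "where is it hung?" answer you need before digging any further.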
