Cisco Blog > High Performance Computing Networking

Frequent flyer

June 22, 2010
at 12:00 pm PST

A colleague recently told me that he took two round trip flights from California to Florida for the sole purpose of nudging himself into the next frequent flyer status level (over 100,000 miles in a year, in his case).  “At the time, I questioned the wisdom of waking up early enough to get on a plane at 6am on a Saturday morning,” he said.  “But I’ve been automatically upgraded to first class ever since.  So, good decision.”  Even more surprising is that, by random chance, he found another guy who took both of the same flights for exactly the same reason.

Parallel programming is kinda like that.  If you’re just venturing into a concurrent programming world, it seems weird, awkward, and possibly even counter-intuitive.  But once you learn how to do it right, the benefits are continual.  And you’ll find that others are doing the same thing.

Let’s take a counter-intuitive example: sending N bytes between a pair of peers can be accomplished by sending M messages, each of size N/M bytes (assuming N/M is an integer).  Is it better to have a large or small value for M?

…got an answer?

Well, you’re wrong.  (probably)  :-)

I think one of the recurring themes of my blog is that the answer to many questions is: “it depends.”

Here are some follow-up questions that may clarify my answer:

  • Are the M messages sent all at once, or spaced out over a lengthy period of time?
    • If the M messages are sent over a lengthy period of time, the value of M may not matter much.
  • Is N a large number or a small number?
    • If N is small and the receiving application is not latency-sensitive, the value of M may not matter much, either.
  • How much other traffic is sharing the same network resources as these M messages?
    • Some network switches may handle congestion of lots of single-fragment messages better than congestion of lots of long multi-fragment messages.  Some switches do the opposite.  The issue may be compounded if the messages have to flow through multiple network switches.
  • How are the N bytes laid out in memory?  Are they contiguous, split into M discrete contiguous chunks, or scattered into more-than-M pieces (potentially randomly throughout memory)?
    • Sending from contiguous memory is (almost) always more efficient than sending from non-contiguous memory.  Sending M contiguous messages may be more efficient than sending 1 message that is composed of bytes randomly strewn about your process’s memory space.
  • Are the N bytes the same N/M bytes, sent M times?  Or are they unique / distinct memory locations?
    • More specifically: are you sending the same memory location M times?  This would imply an iterative type of application; the memory is presumably changing each time you send the message.  In this case, the question of efficiency is moot: you have to send M messages regardless.
  • Are the N bytes all available at the same time, or do parts of the N bytes become available over (a potentially lengthy period of) time?
    • For large values of N, it may be efficient to pipeline smaller sends as chunks of data become available.  For example, if the message is both long and computationally expensive to generate, you may want to send parts of it as they become available rather than sending the entire message once all of it is ready.
  • What is the receiving application doing, and how quickly will it process the incoming messages?
    • Put differently: what’s the synchronization between the sender and the receiver?  Tightly coupled applications tend to consume messages quickly.  Loosely coupled applications may run into resource exhaustion issues if many short messages are sent eagerly and not consumed.  (The MPI standard says that this is not supposed to happen, but the restriction can be quite difficult to implement, particularly at large scale, so some MPI implementations handle it better than others.)
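
To make the pipelining question above a bit more concrete, here is a minimal back-of-the-envelope sketch in Python.  It is not taken from any MPI implementation.  It assumes a deliberately simple cost model in which every message pays a fixed overhead (`alpha`) plus a per-byte transfer cost (`beta`), and the sender needs `gen_cost` time units to produce each byte.  The function name, the parameter names, and all of the numbers are made up for illustration.  The model ignores congestion, memory layout, and receiver behavior, so treat it as a toy rather than a predictor.

```python
def pipelined_time(n_bytes, m, alpha, beta, gen_cost):
    """Estimate when the last of M equal-sized messages finishes sending.

    Hypothetical toy model (all parameters are assumptions):
      alpha    -- fixed overhead paid by every message
      beta     -- transfer time per byte
      gen_cost -- time the sender needs to produce one byte
    Generating chunk k+1 may overlap with sending chunk k, but only one
    message is on the wire at a time.
    """
    if m < 1 or n_bytes % m != 0:
        raise ValueError("m must be >= 1 and divide n_bytes evenly")

    chunk = n_bytes // m
    produce = chunk * gen_cost      # time to generate one chunk
    send = alpha + chunk * beta     # time to send one chunk

    ready = 0.0      # time at which the current chunk has been generated
    link_free = 0.0  # time at which the previous send has completed
    for _ in range(m):
        ready += produce
        # A chunk is sent once it exists and the link is idle.
        link_free = max(ready, link_free) + send
    return link_free


if __name__ == "__main__":
    for m in (1, 10, 1000):
        slow = pipelined_time(1000, m, alpha=10, beta=1, gen_cost=1)
        ready_now = pipelined_time(1000, m, alpha=10, beta=1, gen_cost=0)
        print(f"M={m}: slow-to-generate={slow}, already-available={ready_now}")
```

With these made-up numbers, M=10 finishes first when the data is slow to generate (1200 time units, versus 2010 for M=1 and 11001 for M=1000), because generation overlaps with sending without paying the per-message overhead too many times.  Set `gen_cost` to 0, meaning all N bytes are available up front, and the single message wins (1010 versus 1100 for M=10).  Same N, different answer.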

As you can see, you can take my seemingly simple question to arbitrarily complex depths.  This scenario reminds me of Richard Feynman’s PhD qualifier in physics; he was asked a simple question: “Why is the sky blue?”  The answer went on for hours.

It depends.

1 Comment.


  1. 100K miles per year is 11.4 MPH. I heard that Dongarra’s yearly-averaged velocity is 35 MPH. I wonder what perks he gets from the airlines.
