Lies, damn lies, and statistics
I’m a fan of InsideHPC; I read it every day. I like John’s commentary; he does a great job of rounding up various newsworthy HPC-related articles. But that doesn’t always mean that I agree with every posted item. Case in point: I saw this article the other day, purportedly a primer on InfiniBand (referring to this HPCprojects article). I actually know a bit about IB; I used to work in the IB group at Cisco. Indeed, I’ve written a lot of OpenFabrics verbs-based code for MPI implementations.
There’s good information in that article, but also some fantastically unfounded and misleading marketing quotes:
- “With large data transfers, Ethernet consumes as much as 50 per cent of the CPU cycles; the average for InfiniBand is a loss of less than 10 to 20 per cent.” He’s referring to software TCP overhead, not Ethernet overhead. There’s an enormous difference: there are plenty of Ethernet-based technologies in the 10-20% overhead range.
- “There are also power savings to be had, and this is critical when HPC facilities are confronting major issues with power supplies, cooling and costs. The same study indicates that InfiniBand cuts power costs considerably to finish the same number of Fluent jobs compared to Gigabit Ethernet; as cluster size increases, more power can be saved.” Wow. Other than generating warm fuzzies for customers (“My network products are green!”), what exactly does that paragraph mean? And how exactly was it quantified?
- …I’ll stop with just those 2. 🙂
These quotes are classic marketing spin to make IB products look better than the competition.
More specifically, it’s easy to take pot shots at the competition and say why your product is the best. But this ignores that all networks are highly engineered solutions that have both benefits and tradeoffs. Sure, IB is great at some things. It’s also terrible at other things. The same can be said about Ethernet. And any other kind of network.
What customers should really do is look at their target applications on these different kinds of networks and analyze the network engineering tradeoffs in that context. For example: does your application really need ~2us latency? Sure, ~2us latency is cool, but do you need it? Can you afford it? Would you lay off someone to pay for it? I’ve personally seen lots of applications that received no benefit when running on a faster/better network. On the other hand, I’ve personally seen lots of applications that received enormous benefits when running on an ultra-low-latency network. All I’m saying is: “My network is better than yours” is a moot argument if your applications don’t care.
For example: if you have highly coordinated parallel HPC applications, then you might need the ~2us latency that Myrinet (which is Ethernet, BTW), IB and others can deliver. If your application is not so sensitive, then perhaps Open-MX’s ~10us latency on commodity 1Gb Ethernet is good enough. If your application continually sends a LOT of data around, perhaps you need the DDR 16Gb/s or QDR 20somethingGb/s bandwidth (but be sure to look at the congestion behavior of the network you plan to use!). If you don’t need the bandwidth, then perhaps that “free” network is all you need, and you can put more of your spend into RAM, or better processors, or more disk, or some other server-side improvement. Or perhaps a unified network with “simpler” administration gives you “enough” bandwidth for your applications, and you save cost by not having to re-train all your networking admins. Shrug. It’s up to your local requirements. Look at them all and figure out what’s best for you.
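To make the latency-versus-bandwidth point a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It uses the simple "time = latency + size / bandwidth" model, which is my simplification for illustration and not a benchmark: it ignores congestion, CPU overhead, and everything your application actually does between messages. The function names and the pairing of ~2us with 16Gb/s as a generic "low-latency fabric" are assumptions made for the example; the latency and bandwidth figures themselves are the approximate ones quoted above.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Estimate one-way message time in microseconds.

    Simple model: fixed latency plus serialization time (size / bandwidth).
    1 Gb/s moves 1000 bits per microsecond.
    """
    bits = size_bytes * 8
    return latency_us + bits / (bandwidth_gbps * 1000.0)


# Illustrative (latency_us, bandwidth_gbps) pairs; assumed, not measured.
NETWORKS = {
    "low-latency fabric (~2us, 16Gb/s)": (2.0, 16.0),
    "Open-MX on 1Gb Ethernet (~10us, 1Gb/s)": (10.0, 1.0),
}


def compare(size_bytes):
    """Return the estimated time per network for one message size."""
    return {
        name: transfer_time_us(size_bytes, latency, bandwidth)
        for name, (latency, bandwidth) in NETWORKS.items()
    }


if __name__ == "__main__":
    # Small messages are dominated by latency, large ones by bandwidth.
    for size in (8, 1024, 1024 * 1024):
        print(f"{size} bytes")
        for name, time_us in compare(size).items():
            print(f"  {name}: {time_us:.3f} us")
```

Plug in the message sizes your application really sends. If the gap between the two estimates is small next to the time spent computing between messages, the faster network probably won't buy you much; measuring the real application on real hardware is still the only answer that counts.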
To be clear: yes, the network is important (heck, I work for Cisco — we love networking!). But let’s not forget that it’s not all about the network. There are many other things that you need to look at in the context of your application that are just as important — if not more so — than the network. Some examples include: RAM, disk, IO, processor, cache size, and bus speed.
Avoid the marketing spin; make your purchasing analysis in the context of your local requirements to decide what is best for you, your IT environment, and your applications.
(/me steps off the soap box)