Traffic. It’s a funny thing. On my daily drive to work, I see (what appear to be) oddities and contradictions frequently. For example, although the lanes on my side of the highway are running fast and clear, the other side is all jammed up. But a half mile later, the other side is running fast and clear, and my lanes have been reduced to half-speed. A short distance further, I’m zipping along again at 55 mph (ahem).
Sometimes the reasons behind traffic congestion are obvious. For example, when you drive through a busy interchange, it’s easy to understand how lots of vehicles entering and exiting the roadway can force you to slow down. But sometimes the traffic flow issues are quite subtle; congestion may be caused by a non-obvious confluence of second- and third-order effects.
The parallels between highway traffic and networking are easy to see, but the analogy goes much deeper when you consider that modern computational clusters span multiple different networks: we’re entering an era of Non-Uniform Network Architectures (NUNAs).
Networking teachers have long used the analogy of vehicles on highways to describe network traffic. There are fun aspects of this analogy that can be quite apt, especially when extended to a NUNA type of environment:
- Large network packets can be considered 18-wheel tractor trailers.
- Small network packets can be considered “smartcars”.
- Dropped packets can be considered fender-benders — where the vehicles involved had to pull off the roadway and never got to where they were going.
- “Normal” networks, like 10G Ethernet, can be considered interstate highways. 1G Ethernet could be considered local/state highways.
- Host-specific networks (like QPI, HyperTransport, and host-side buses such as the various flavors of PCI) can be considered surface streets.
- All roads (surface streets and highways) have two directions; the traffic in one direction may or may not influence the traffic in the opposite direction on the same road.
- Messages originate and terminate at buildings on surface streets:
  - Some messages are sent to buildings on the same street and arrive at their destination quickly.
  - Other messages must travel across a few different surface streets (perhaps pausing at some intermediate stoplights) and may take a little longer.
  - Still other messages must traverse some surface streets, get on the highway, exit onto more surface streets, and finally arrive at their destination. Such messages obviously take the most time to deliver, and are the most susceptible to intermediate traffic flow/congestion issues.
Putting all of these together, it may be a bit easier to visualize and think about a NUNA environment.
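To make the three kinds of trips a bit more concrete, here is a minimal sketch that treats a message's route as a list of hops and adds up a cost for each one. The hop names, the `path_cost_us` helper, and every number in it are made up purely for illustration; they are not measurements of any real hardware, and real networks have congestion effects that a simple sum ignores.

```python
# Illustrative per-hop costs in microseconds. These values are invented
# for this sketch and do not describe any real hardware.
HOP_COST_US = {
    "surface_street": 0.1,  # a host-internal link (e.g., a bus inside the server)
    "stoplight": 0.05,      # a pause at an intermediate junction inside the host
    "interchange": 0.5,     # the NIC, where surface streets meet the highway
    "highway": 2.0,         # an inter-server network link
}


def path_cost_us(hops):
    """Sum the illustrative cost of each hop along a message's route."""
    return sum(HOP_COST_US[hop] for hop in hops)


# The three kinds of trips from the list above.
same_street = ["surface_street"]
across_town = ["surface_street", "stoplight", "surface_street"]
other_city = [
    "surface_street", "interchange", "highway", "interchange", "surface_street",
]

for name, route in [
    ("same street", same_street),
    ("across town", across_town),
    ("other city", other_city),
]:
    print(f"{name}: {len(route)} hops, {path_cost_us(route):.2f} us")
```

Even with toy numbers, the ordering matches the analogy: the trip that leaves the host crosses the most hops, so it has the most places where congestion could slow it down.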
Only a year or two ago, commodity servers had just one or two “surface streets”. But as core counts keep increasing, it won’t be long before each server represents an entire “city” of surface streets: roads that must be traversed before data can even get to the inter-server highways that most people think of as “the network.”
Here’s a fun question: as core counts go up, what will happen to the interchanges between the surface streets and the interstate highways (i.e., NICs)? Will we have to build bigger NICs, and therefore bigger buses / internal networks to feed those Big NICs? Or will we simply need to put more NICs in a single server?
My money’s on the latter: it’s a heckuva lot easier to just put a few more NICs in a server than to revamp all the internal networks/buses, create new standards for expansion cards, and then get vendors to make NICs to those new internal specs.
But hey, I’ve been wrong before. We’ll see what happens!