STP is not the problem, but FabricPath will fix it!
A few years ago, to interact with the audience, I started a Cisco Live presentation on Spanning Tree design with three questions:
- Who hates the Spanning Tree Protocol (STP)?
This one is easy. You could sell ice blocks to an Eskimo based on the ubiquitous hatred for STP. Here, I got a good 90% of the hands in the air.
- Who has a good understanding of STP?
A more personal question, but this is Cisco Live, with networking experts all over the place. Some 60-70% of the hands went up.
- Who thinks that the root bridge can block a port?
Audience stunned! Some shook their heads with a disapproving look; others suddenly realized they had an urgent email to check, or looked away. Among the more than 100 attendees, only one person in the front row was frantically raising his hand. Too bad for him, there was no prize.
I drew two conclusions from this:
- First, giving the impression that you think your audience is made of idiots is not good for your session evaluation.
- Second, network engineers don’t realize how clueless they are about STP. Oh, they’re definitely not idiots. Take the CCIE exam and you’ll see that you’re expected to know, for example, the most intimate details of OSPF. They’re Layer 3 experts. STP is just taken for granted as something simple… and bad.
Layer 2 vs. Layer 3, a classic
When Layer 3 guys think of STP, they see RIP. Sure, 10% of STP is a RIP-like distance vector protocol, but the remaining 90% of its complexity is about preventing loops, even transient ones. To give you a feeling for the problem, we’ve seen critical network conditions caused by bridging loops lasting less than 100 ms. It’s that bad, and it’s something Layer 3 guys just don’t see, because this problem does not exist in their world. In their world, when there is a power outage, the lift just stops. In the Layer 2 world, the lift falls with a hissing sound and crashes in a cloud of smoke. Not a friendly world. Where does this difference come from?
When a router receives a packet, it looks in its forwarding table to determine where to forward it. If there is no matching prefix in the forwarding table, the router does not forward the packet (i.e. it drops it). When a bridge receives a frame, it looks in its filtering database to determine where not to send that frame. If there is no entry in this filtering database, also known as the MAC address table, the frame is not filtered (i.e. it is flooded). Yes, by default your high-end switch behaves as a glorified hub, and this has nothing to do with STP.
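To make the asymmetry concrete, here is a toy Python model of the two lookups. It is a sketch, not switch or router firmware: the function names, port names and table contents are hypothetical, and a real forwarding table is a hardware structure rather than a dictionary. The only point it shows is the default behavior on a miss: the router drops, the bridge floods.

```python
import ipaddress

def router_forward(routes, dst_ip):
    """Return the egress port for dst_ip, or None (drop) if no prefix matches."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None  # no route: the packet is dropped
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return routes[best]

def bridge_forward(mac_table, all_ports, in_port, dst_mac):
    """Return the list of egress ports for a frame (hypothetical helper).

    A known destination goes out of a single port; an unknown one is not
    filtered, so it is flooded out of every port except the ingress one.
    """
    if dst_mac in mac_table:
        port = mac_table[dst_mac]
        return [] if port == in_port else [port]  # filter frames for the ingress segment
    return [p for p in all_ports if p != in_port]  # unknown destination: flood

routes = {ipaddress.ip_network("10.0.0.0/8"): "ge0/1",
          ipaddress.ip_network("10.1.0.0/16"): "ge0/2"}
print(router_forward(routes, "10.1.2.3"))   # ge0/2
print(router_forward(routes, "192.0.2.1"))  # None (dropped)

mac_table = {"00:00:5e:00:53:01": "eth1"}
ports = ["eth1", "eth2", "eth3", "eth4"]
print(bridge_forward(mac_table, ports, "eth2", "00:00:5e:00:53:01"))  # ['eth1']
print(bridge_forward(mac_table, ports, "eth2", "00:00:5e:00:53:99"))  # ['eth1', 'eth3', 'eth4']
```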
The consequence is well known. If for any reason a loop is introduced in the network, a single frame can be forwarded forever, because there is no TTL at Layer 2. Worse, the powerful ASICs of your glorified hubs will flood the frame on all their ports, instantly saturating them. Then, just because you’re lucky, the frame carries a Layer 3 broadcast that hits the CPU of every host directly and kills it immediately. Hissing sound. Wait, it’s not over! You might have some low-end switches at the access layer, with poor control plane protection. Their CPU might also be impacted by this traffic, to the point where they can no longer run their STP process. Interestingly enough, most people don’t realize that these edge switches, the furthest from the root, are the ones responsible for blocking ports in a bridged network. If they can’t run STP properly, they’re going to open even more loops… Cloud of smoke. Not only are the servers affected, but the failure condition can be maintained indefinitely. A local issue can have a global, permanent impact.
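The runaway flood can be illustrated with a small simulation. The assumptions are all hypothetical simplifications: the switches are fully meshed with no port blocked, each switch floods a frame out of every port except the one it arrived on, and hosts and link bandwidth are not modeled. The optional `ttl` argument adds a hop limit that Ethernet does not actually have, purely for comparison.

```python
def full_mesh(n):
    """Switches 0..n-1, each linked to every other: a topology full of loops."""
    return {s: [t for t in range(n) if t != s] for s in range(n)}

def flood_hop(topology, in_flight, ttl_enabled):
    """Advance every frame copy by one hop.

    A copy is (previous_switch, current_switch, ttl). The current switch floods
    it out of every port except the ingress one. If ttl_enabled, the switch first
    decrements the TTL and discards the copy when it reaches zero.
    """
    next_copies = []
    for prev, here, ttl in in_flight:
        if ttl_enabled:
            ttl -= 1
            if ttl <= 0:
                continue  # hop limit exhausted: this copy disappears
        for nxt in topology[here]:
            if nxt != prev:
                next_copies.append((here, nxt, ttl))
    return next_copies

def storm(n_switches, hops, ttl=None):
    """Copies of a single broadcast in flight after each hop (sent by a host on switch 0)."""
    topology = full_mesh(n_switches)
    # Switch 0 floods the host's broadcast toward all its neighbors.
    in_flight = [(0, nbr, ttl) for nbr in topology[0]]
    counts = [len(in_flight)]
    for _ in range(hops):
        in_flight = flood_hop(topology, in_flight, ttl_enabled=ttl is not None)
        counts.append(len(in_flight))
    return counts

print(storm(4, 5))         # [3, 6, 12, 24, 48, 96]: no TTL, doubles every hop
print(storm(4, 5, ttl=3))  # [3, 6, 12, 0, 0, 0]: a hypothetical TTL ends the loop
```

In the four-switch mesh, every hop doubles the number of copies, and only link saturation would cap the growth. Even a three-switch triangle never recovers: `storm(3, 4)` returns `[2, 2, 2, 2, 2]`, two copies circling forever.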
Meanwhile, in the Layer 3 world, link state protocols like IS-IS or OSPF commonly introduce transient loops during convergence. It does not matter. There’s no flooding at Layer 3, so a looping packet only affects the links that are part of the cycle, and the TTL in the data plane eventually gets rid of it. Again in the data plane, multidestination traffic is strictly constrained by a powerful reverse path forwarding check (RPFC). Even if the CPU of a router were affected, adjacencies would drop, routes would be removed from forwarding tables and, in the end, packets would stop being forwarded. The system would fall back to a stable state… The lift stops.
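The RPF check can be sketched in a few lines too. This is a simplified model under stated assumptions: shortest paths come from a breadth-first search with unit link costs instead of a real unicast routing table, routers are identified by numbers instead of interfaces, and, to isolate the effect of the check, every accepting router forwards out of all its other links rather than only toward interested receivers. The function names are hypothetical.

```python
from collections import deque

def rpf_table(topology, source):
    """Map each router to its neighbor on the shortest path back to `source`.

    BFS with unit link costs stands in for the unicast routing table (simplification).
    """
    toward_source = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in sorted(topology[node]):  # sorted(): deterministic tie-breaking
            if nbr not in toward_source:
                toward_source[nbr] = node   # nbr's shortest path to source goes via node
                queue.append(nbr)
    return toward_source

def multicast_with_rpf(topology, source, max_hops=10):
    """Forward a multidestination packet, dropping every copy that fails the RPF check.

    Returns (routers that accepted the packet, copies in flight after each hop).
    """
    rpf = rpf_table(topology, source)
    accepted = {source}
    in_flight = [(source, nbr) for nbr in topology[source]]
    history = [len(in_flight)]
    for _ in range(max_hops):
        next_copies = []
        for prev, here in in_flight:
            if rpf[here] != prev:
                continue  # RPF failure: not received from the neighbor toward the source
            accepted.add(here)
            next_copies.extend((here, nbr) for nbr in topology[here] if nbr != prev)
        in_flight = next_copies
        history.append(len(in_flight))
        if not in_flight:
            break
    return accepted, history

mesh = {s: [t for t in range(4) if t != s] for s in range(4)}
accepted, history = multicast_with_rpf(mesh, source=0)
print(sorted(accepted), history)  # [0, 1, 2, 3] [3, 6, 0]
```

On a four-router full mesh, every router accepts the packet exactly once, while the copies that arrive over non-shortest paths fail the check, so nothing is left in flight after two hops.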
Would replacing STP with IS-IS help Layer 2? It would not, simply because the control plane was not the problem.
Indeed, the revolutionary change that FabricPath/TRILL introduces is not IS-IS, it’s the new data plane. The MAC-in-MAC encapsulation dramatically reduces the number of MAC addresses in the network. As a result, they can be advertised by the control protocol and used as entries in a real routing table, not a filtering database. If a destination is not known, traffic is dropped. The frame also carries a TTL, and the RPF check can be applied to multidestination traffic. Frames are de facto routed within the network, even if there is still bridging at the edge. Some people consider “routing frames” a heretical statement. I don’t know why “routing” should belong to Layer 3. We route planes and interrupts, so why not frames?
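The behavior of a core switch in such a routed data plane can be sketched as follows. The header layout is a hypothetical simplification, not the actual FabricPath or TRILL frame format: the outer destination is reduced to a switch ID, so the core needs one routing entry per switch instead of one filtering entry per host MAC address. Edge bridging, where host MAC addresses are still learned, is not modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class RoutedFrame:
    """Hypothetical outer header; NOT the real FabricPath/TRILL encapsulation."""
    dst_switch: int     # egress edge switch ID, instead of the host MAC address
    src_switch: int     # ingress edge switch ID
    ttl: int            # hop limit, decremented by every core switch
    inner_frame: bytes  # original Ethernet frame, carried unchanged

def core_forward(routing_table: dict, frame: RoutedFrame) -> Optional[Tuple[str, RoutedFrame]]:
    """Route on the destination switch ID; return (egress_port, frame) or None if dropped."""
    ttl = frame.ttl - 1
    if ttl <= 0:
        return None  # hop limit exhausted: a looping frame dies here
    port = routing_table.get(frame.dst_switch)
    if port is None:
        return None  # unknown destination: dropped, not flooded
    return port, replace(frame, ttl=ttl)

# Routing table built by the control protocol: one entry per switch, not per host.
table = {11: "po1", 12: "po2"}
frame = RoutedFrame(dst_switch=12, src_switch=11, ttl=32, inner_frame=b"\x00" * 64)
port, out = core_forward(table, frame)
print(port, out.ttl)                                       # po2 31
print(core_forward(table, replace(frame, dst_switch=99)))  # None: unknown switch ID
print(core_forward(table, replace(frame, ttl=1)))          # None: TTL expired
```

The two `return None` branches are the whole difference from a bridge: a miss or an exhausted hop limit ends the frame's life instead of multiplying it.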
What about Shortest Path Bridging?
One of the most significant differences between TRILL and IEEE Shortest Path Bridging (SPB, 802.1aq) is precisely the data plane. The IEEE did not want to change the existing bridging data plane, as doing so implies new hardware. Remember that we have STP because the Layer 2 data plane needs a tree, not the other way around. As a result, SPB still builds trees, with all the synchronization mechanisms that STP was hauling: we’ve replaced the 10% RIP with 10% IS-IS, but the 90% loop prevention complexity is still there (have fun reading clause 13).
Because the IEEE guys are not idiots either, they realized they had to do something, and they split SPB into two flavors. The first, faithful to their pledge of maintaining the existing Layer 2 data plane, is called SPB-V. From my perspective, the main enhancement SPB-V provides over STP is that its name does not include “Spanning” or “Tree”. I’m sure it will not be deployed anywhere, but in a way, I’m sorry about that. It would have shown the world that the kind of problem I described earlier is just as likely with IS-IS as it is with STP.
The second flavor of 802.1aq is SPB-M. That one has a data plane closer to TRILL, thanks to its use of the IEEE 802.1ah standard. The twist is that, because 802.1ah had just been finalized, the IEEE could claim that SPB-M did not require new hardware. In reality, few data center switches, if any, support 802.1ah, so running SPB-M means replacing your hardware to go only halfway to TRILL. Finally, because it could not go further within the charter of 802.1aq, the IEEE recently initiated 802.1Qbp. It looks great and will finally introduce a new frame format, this time with a TTL. The sad thing is that it will require yet another hardware change and will probably become available 10 years after TRILL was started with very similar goals…
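Why a TTL-less Layer 2 data plane needs a tree can be checked with plain transparent-bridge flooding. In this sketch the function names are hypothetical, and a breadth-first search stands in for whatever algorithm (STP or SPB) actually computes the tree: flooding over a full mesh never stops, while the same flooding restricted to a spanning tree of that mesh dies out on its own.

```python
from collections import deque

def spanning_tree(topology, root):
    """Keep only the links of a BFS tree rooted at `root`; every other port is blocked."""
    tree = {node: [] for node in topology}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(topology[node]):
            if nbr not in seen:
                seen.add(nbr)
                tree[node].append(nbr)  # forwarding link in both directions
                tree[nbr].append(node)
                queue.append(nbr)
    return tree

def flood_until_quiet(topology, origin, max_hops=100):
    """Plain bridge flooding with no TTL; return the copies in flight after each hop."""
    in_flight = [(origin, nbr) for nbr in topology[origin]]
    history = [len(in_flight)]
    for _ in range(max_hops):
        # Each switch floods every copy out of all its forwarding ports but the ingress one.
        in_flight = [(here, nxt) for prev, here in in_flight
                     for nxt in topology[here] if nxt != prev]
        history.append(len(in_flight))
        if not in_flight:
            break
    return history

mesh = {s: [t for t in range(4) if t != s] for s in range(4)}
tree = spanning_tree(mesh, root=0)
print(flood_until_quiet(mesh, origin=3, max_hops=5))  # [3, 6, 12, 24, 48, 96]
print(flood_until_quiet(tree, origin=3))              # [1, 2, 0]
```

On the tree, every switch receives the frame once and the flood stops by itself; on the mesh, only the artificial `max_hops` limit ends the simulation.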