Understanding FCoE and TRILL, The Easy Way
Here we go again.
I’ve put it off, and put it off, and put it off, because every time I think about writing a piece about FCoE and TRILL, I ask myself, “Okay, is this really something enough people care about to make a difference?” Then one day someone pipes up about TRILL again, and the cycle begins anew.
I wonder if it’s related to the new zodiac signs or something.
Because I think in pictures, I’m going to include some pretty pictures here to make sure I don’t miss anything along the way. I’ll also try to avoid technical jargon and keep things approachable. Sometimes ya just gotta bring things back to basics.
What’s the Problem, Officer?
There’s enough material on the web that a Google search will overwhelm you, so I won’t go too in-depth here, but the idea is that the Spanning Tree Protocol (STP) was designed to keep switches that are connected together from creating loops.
As South Park’s Mr. Mackey might say, “Loops are bad, m’kay?”
I remember when I was living in England and trying to get to Stansted Airport, which meant driving on the M25 (bear with me, it’s relevant). Apparently they were doing roadworks, so they diverted (detoured) the traffic. What was frustrating was that the people in charge of the diversion managed to send me right back to where I started.
I thought I had taken a wrong turn (user error, and all that), so I went around again. As it turned out, this wasn’t my fault: the authorities in charge had actually set up diversions that looped me around in circles.
This was frustrating beyond words (your duh moment of the day), and when the same thing happens to switches in your network, the results can be catastrophic. Ethernet frames have no time-to-live field, so once they are caught in a loop they go around in circles indefinitely (broadcasts get copied again at every switch), locking down your switches as the CPUs attempt to handle the traffic.
This is a Bad Thing™.
STP prevents this by putting redundant links into a blocking state, so that only a loop-free tree of links forwards traffic. While that solves one problem, it creates others.
In particular, STP has two basic problems. First, there can be only one active path between any two switches, so the capacity of every blocked link is wasted. You’d obviously prefer to use all of your links to carry traffic, and you don’t get that when some of them are blocked.
The second problem is that when a link goes down and STP has to recalculate paths, it can take a long time: with the classic 802.1D timers, reconvergence takes roughly 30 to 50 seconds.
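To make the blocked link concrete, here’s a minimal Python sketch of what STP ends up doing in a three-switch triangle. It’s a deliberately simplified model, not 802.1D itself: it assumes every link has the same cost, picks the switch with the lowest ID as the root, breaks ties by the lowest neighbor ID, and skips BPDUs, port states and timers entirely. The switch names SW1 to SW3 are hypothetical.

```python
from collections import deque


def hop_counts(adjacency, root):
    """Breadth-first search: number of hops from the root to every switch."""
    hops = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return hops


def spanning_tree(switches, links):
    """Simplified STP: return (forwarding, blocked) sets of links.

    Assumptions: a connected topology, equal link costs, lowest ID wins
    root, ties broken by the lowest neighbor ID, no parallel links.
    """
    adjacency = {s: [] for s in switches}
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)

    root = min(switches)
    hops = hop_counts(adjacency, root)

    # Each non-root switch keeps one "root port": the link toward the
    # lowest-ID neighbor that is one hop closer to the root.
    forwarding = set()
    for switch in switches:
        if switch == root:
            continue
        upstream = min(n for n in adjacency[switch] if hops[n] == hops[switch] - 1)
        forwarding.add(frozenset((switch, upstream)))

    # Every other link would close a loop, so STP stops it from forwarding.
    blocked = {frozenset(link) for link in links} - forwarding
    return forwarding, blocked


if __name__ == "__main__":
    switches = ["SW1", "SW2", "SW3"]
    links = [("SW1", "SW2"), ("SW2", "SW3"), ("SW1", "SW3")]
    forwarding, blocked = spanning_tree(switches, links)
    print("forwarding:", sorted(tuple(sorted(link)) for link in forwarding))
    print("blocked:   ", sorted(tuple(sorted(link)) for link in blocked))
    # forwarding: [('SW1', 'SW2'), ('SW1', 'SW3')]
    # blocked:    [('SW2', 'SW3')]
```

Three links, two of them forwarding: that idle SW2-SW3 link is the wasted capacity described above. And if SW1-SW2 fails, real STP has to wait out its timers before SW2-SW3 starts forwarding.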
Routing protocols, such as TRILL (Transparent Interconnection of Lots of Links, for Ethernet) and Fabric Shortest Path First (FSPF, for Fibre Channel), address these problems by keeping multiple paths active at once and by recalculating routes with a much shorter (or no) time penalty.
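The “multiple paths” part is easiest to see in a four-switch ring, where STP would block one of the four links and leave a single path between opposite corners. A routing protocol that supports equal-cost multipath can keep both shortest paths in play. The sketch below only enumerates the equal-cost, minimum-hop paths with a breadth-first search; it illustrates the idea and is not TRILL’s or FSPF’s actual path computation. The switch names are hypothetical.

```python
from collections import deque


def equal_cost_paths(links, source, target):
    """Return every minimum-hop path from source to target."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # BFS that remembers *all* predecessors one hop closer to the source,
    # not just the first one found.
    hops = {source: 0}
    predecessors = {source: []}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                predecessors[neighbor] = [current]
                queue.append(neighbor)
            elif hops[neighbor] == hops[current] + 1:
                predecessors[neighbor].append(current)

    if target not in hops:
        return []  # unreachable

    def expand(node):
        # Walk back toward the source along every recorded predecessor.
        if node == source:
            return [[source]]
        return [path + [node] for prev in predecessors[node] for path in expand(prev)]

    return expand(target)


if __name__ == "__main__":
    ring = [("SW1", "SW2"), ("SW2", "SW3"), ("SW3", "SW4"), ("SW4", "SW1")]
    for path in equal_cost_paths(ring, "SW1", "SW3"):
        print(" -> ".join(path))
    # SW1 -> SW2 -> SW3
    # SW1 -> SW4 -> SW3
```

A multipath-capable protocol would then spread traffic across both paths (typically by hashing each flow onto one of them, so a single conversation stays on one path) instead of leaving part of the ring idle.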
Close, but No Cigar
The common misconception is that because FCoE runs over Ethernet, you abso-positively-lutely need better Ethernet forwarding in order to run FCoE. That is, if Ethernet is all screwed up, it’s going to screw up your FCoE traffic by default, right? Your FCoE traffic will suffer because of STP, right? That’s why you need TRILL, right?
Slow down there, killer.
Let’s not forget that Ethernet isn’t simply a container for traffic thrown willy-nilly about with no organization. One of the greatest misunderstandings is that FCoE traffic somehow intermingles with Ethernet traffic, or that PAUSE-ing one type of traffic affects the others.
This isn’t how it works. We separate our LAN traffic using VLANs, and we keep our FCoE traffic in a dedicated VLAN with its own 802.1p priority class. That separation is what lets us apply Priority Flow Control (PFC), which pauses one priority class without touching the others, along with other enhancements to Ethernet.
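Here’s a rough Python model of that separation. The FCoE EtherType (0x8906) and the FIP EtherType (0x8914) are the real values; the VLAN ID 1002 and priority 3 are assumptions standing in for whatever your own configuration uses, and the classifier is a simplification of what a converged switch actually does in hardware.

```python
from __future__ import annotations

from dataclasses import dataclass

FCOE_ETHERTYPE = 0x8906  # FCoE data frames
FIP_ETHERTYPE = 0x8914   # FCoE Initialization Protocol

# Hypothetical configuration: the VLAN and 802.1p priority reserved for FCoE.
FCOE_VLAN = 1002
FCOE_PRIORITY = 3


@dataclass(frozen=True)
class Frame:
    vlan_id: int    # 12-bit VLAN ID from the 802.1Q tag
    priority: int   # 3-bit Priority Code Point (802.1p) from the same tag
    ethertype: int


def forwarding_plane(frame: Frame) -> str:
    """Decide which forwarding mechanism handles the frame (simplified)."""
    if frame.vlan_id == FCOE_VLAN and frame.ethertype in (FCOE_ETHERTYPE, FIP_ETHERTYPE):
        return "fcoe"  # handed to the Fibre Channel side, routed by FSPF
    return "lan"       # ordinary Ethernet bridging, kept loop-free by STP


def is_paused(frame: Frame, paused_priorities: set[int]) -> bool:
    """PFC pauses per priority: only frames in a paused class are held back."""
    return frame.priority in paused_priorities


if __name__ == "__main__":
    storage = Frame(vlan_id=FCOE_VLAN, priority=FCOE_PRIORITY, ethertype=FCOE_ETHERTYPE)
    web = Frame(vlan_id=10, priority=0, ethertype=0x0800)  # IPv4 on a LAN VLAN
    paused = {FCOE_PRIORITY}  # the storage class has received a PFC pause
    for name, frame in (("storage", storage), ("web", web)):
        state = "paused" if is_paused(frame, paused) else "flowing"
        print(f"{name}: {forwarding_plane(frame)}, {state}")
    # storage: fcoe, paused
    # web: lan, flowing
```

Pausing priority 3 holds back the storage frames and leaves the web frame alone, and the LAN frame never gets anywhere near FSPF.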
It also means that we can apply the appropriate forwarding (i.e., routing) mechanism to each type of traffic. See, different protocols use different forwarding methods. Fibre Channel uses FSPF, a link-state protocol that calculates the best path between switches (sound familiar?) and determines alternate routes in the event of a link failure or topology change. FSPF-based fabrics are designed to preserve in-order delivery of frames, which FCoE requires and traditional LAN traffic does not.
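Because FSPF is link-state, every switch knows the whole topology and runs a shortest-path-first calculation over it. The Python sketch below is plain Dijkstra’s algorithm over an undirected graph, run once on a healthy triangle and again after a link fails, to show the “alternate route” behavior. It is not the FSPF specification: the link costs are hypothetical (real FSPF typically derives default costs from link speed), and nothing here enforces in-order delivery.

```python
import heapq


def shortest_paths(links, source):
    """Dijkstra over an undirected graph.

    links maps (switch_a, switch_b) -> positive cost.
    Returns {switch: (total_cost, first_hop_from_source)}.
    """
    adjacency = {}
    for (a, b), cost in links.items():
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))

    best = {source: (0, None)}
    heap = [(0, source, None)]
    done = set()
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in done:
            continue  # stale entry; a cheaper route was already settled
        done.add(node)
        for neighbor, link_cost in adjacency.get(node, []):
            new_cost = cost + link_cost
            # Leaving the source, the first hop is the neighbor itself.
            hop = neighbor if node == source else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return best


if __name__ == "__main__":
    fabric = {("SW1", "SW2"): 1, ("SW2", "SW3"): 1, ("SW1", "SW3"): 1}
    print(shortest_paths(fabric, "SW1")["SW3"])  # (1, 'SW3')
    del fabric[("SW1", "SW3")]                   # simulate a link failure
    print(shortest_paths(fabric, "SW1")["SW3"])  # (2, 'SW2')
```

After the failure, SW1 simply recomputes and starts sending SW3-bound traffic via SW2; no links were ever blocked to get there.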
If you look at the pretty picture below, you’ll see that within an FCoE-capable switch, the LAN traffic has its own forwarding mechanism (STP) and the FCoE traffic has its own forwarding mechanism (FSPF). Separating these traffic types onto different VLANs means that storage traffic is independent of how LAN traffic is forwarded or blocked.
Because FCoE uses FSPF to determine its own routes, it can use all available links without blocking any of them to prevent loops, and it avoids STP’s long recalculation times. FCoE already has its own system for this, so it doesn’t need TRILL.
This means that even if you have multiple traffic types running on a single Ethernet link, a different forwarding method can apply to each traffic type within that link. Now, of course, this diagram doesn’t represent an actual network topology, as I’ve ignored the servers and storage that would normally be included.
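Here’s the three-switch triangle seen from the LAN side and the FCoE side at once, as a small Python sketch. The assumption baked in is that STP on the LAN VLAN is rooted at SW1 and therefore blocks SW2-SW3 (that set is hard-coded, not computed). The FCoE side simply takes a minimum-hop route between every pair of switches, a stand-in for FSPF rather than the real thing, and records which links those routes use.

```python
from collections import deque
from itertools import combinations


def shortest_path(adjacency, source, target):
    """Return one minimum-hop path from source to target (BFS).

    Assumes target is reachable from source.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def links_used_by_routing(switches, links):
    """Union of links on the routed path between every pair of switches."""
    adjacency = {s: set() for s in switches}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    used = set()
    for src, dst in combinations(switches, 2):
        path = shortest_path(adjacency, src, dst)
        used.update(frozenset(hop) for hop in zip(path, path[1:]))
    return used


if __name__ == "__main__":
    switches = ["SW1", "SW2", "SW3"]
    links = [("SW1", "SW2"), ("SW2", "SW3"), ("SW1", "SW3")]
    # Assumption: STP on the LAN VLAN, rooted at SW1, blocks SW2-SW3.
    lan_blocked = {frozenset(("SW2", "SW3"))}
    fcoe_used = links_used_by_routing(switches, links)
    for a, b in links:
        link = frozenset((a, b))
        lan = "blocked" if link in lan_blocked else "forwarding"
        fcoe = "forwarding" if link in fcoe_used else "idle"
        print(f"{a}-{b}: LAN VLAN {lan}, FCoE VLAN {fcoe}")
    # SW1-SW2: LAN VLAN forwarding, FCoE VLAN forwarding
    # SW2-SW3: LAN VLAN blocked, FCoE VLAN forwarding
    # SW1-SW3: LAN VLAN forwarding, FCoE VLAN forwarding
```

The SW2-SW3 row is the interesting one: blocked for the LAN VLAN, yet carrying FCoE traffic, all on the same physical link.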
The point here is to discuss what happens within switches, which is exactly where TRILL, STP, FSPF, etc., are designed to work. Critics might complain that I’m only showing three switches, but the principles here are just as relevant when dealing with more.
With a modern FCoE switch, you can have more than one way to forward traffic even if you are using common technology to transport that traffic. By allowing each side to play to its strengths, you do not need to have TRILL (an Ethernet forwarding mechanism) replace FSPF (a Fibre Channel forwarding mechanism).
Ultimately, the point here is to correct the notion that you need TRILL in order to run multihop FCoE. Even when you get into more complex diagrams and topologies, all you need is an appropriate routing mechanism to get from point A to point B, and FCoE already has one in FSPF, without requiring newer technologies such as TRILL.
In the future, if there’s enough interest, I will expand on this with more technical discussion and some practical examples of how networks with FCoE can be designed.
For now, though, the key thing to remember is that you do not “need” TRILL. It is not required, and you can safely implement multihop FCoE without worrying about TRILL anywhere in your system.