Cisco Blogs

Understanding FCoE and TRILL, The Easy Way

February 11, 2011 - 4 Comments

Here we go again.

I’ve put it off, and put it off, and put it off, because every time I think about writing a piece about FCoE and TRILL I think to myself, “Okay, is this really something that enough people are going to care about to make a difference?” And then one day someone pipes up and brings up TRILL again, and thus the cycle begins anew.

I wonder if it’s related to the new zodiac signs or something.

Because I’m a person who likes to think in visual pictures, I’m going to include some pretty pictures here to help make sure I don’t miss anything along the way. I’ll also try to avoid some of the technical jargon and make it more approachable. Sometimes ya just gotta bring things back to basics.

What’s the Problem, Officer?

By this point you probably already know that TRILL (Transparent Interconnection of Lots of Links) is a working proposal to replace the Spanning Tree Protocol (STP).

There are enough materials available on the web that a Google search will overwhelm you, so I won’t go too in-depth here, but the idea is that STP was designed to prevent switches that are connected together from creating loops.

As South Park’s Mr. Mackey might say, “Loops are bad, m’kay?”

STP blocks a loop

I remember when I was living in England and trying to get to Stansted airport, which meant driving on the M25 (bear with me, it’s relevant). Apparently they were doing roadworks and so they diverted (detoured) the traffic. What was frustrating was that the people who were in charge of diverting the traffic managed to send me back around to exactly the place I started from.

I thought I had taken a wrong turn (user error, and all that), so I went around again. Sure enough, this wasn’t my fault, but rather the administration-in-charge had actually set up diversions that looped me around in circles.

This was frustrating beyond words (your duh moment of the day), and when the same thing happens in switches in your network the results can be catastrophic, with frames going around in circles, locking down your switches as the CPUs attempt to handle the traffic.

This is a Bad Thing™.
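A toy simulation makes the problem concrete. The sketch below assumes three hypothetical switches (A, B, and C) wired in a triangle, and a simplified rule that a switch re-sends a broadcast frame out of every port except the one it arrived on. Ethernet frames carry no hop count, so nothing ever retires the copies. Real switches are far more involved; this only illustrates the idea.

```python
def flood(links, origin, max_hops):
    """Flood one broadcast frame; return the number of copies in flight at each hop."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Each in-flight copy is recorded as (sending switch, receiving switch).
    in_flight = [(origin, peer) for peer in sorted(neighbors[origin])]
    history = []
    for _ in range(max_hops):
        history.append(len(in_flight))
        next_hop = []
        for sender, receiver in in_flight:
            # Re-send on every link except the one the copy arrived on.
            for peer in sorted(neighbors[receiver]):
                if peer != sender:
                    next_hop.append((receiver, peer))
        in_flight = next_hop
    return history


# Hypothetical topologies: a full triangle, and the same switches with B-C unused.
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
tree = [("A", "B"), ("A", "C")]

print(flood(triangle, "A", 10))  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(flood(tree, "A", 10))      # [2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

In the triangle, two copies of a single frame chase each other around the loop for as long as you care to watch. Take one link out of service and the copies die out after the first hop.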

STP prevents this issue by blocking one of the links. While it solves one potential issue, it creates others.

In particular, there are two basic problems with STP. First, there can only be one active path, meaning that you waste the potential capacity of your network. Having full use of all your links is obviously a preferred method of running traffic over the network, and you don’t get that with a blocked link.
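The "one active path" cost can be sketched in a few lines. This is not the real STP election, which exchanges BPDUs and compares bridge IDs and port costs; it simply assumes a hypothetical root switch A and builds a breadth-first tree over the same three-switch triangle. Whatever is left over is what ends up blocked.

```python
from collections import deque


def spanning_tree(links, root):
    """Return (active, blocked) link sets for a breadth-first tree rooted at `root`."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    active = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in visited:
                # First time we reach this switch: keep the link that got us here.
                visited.add(peer)
                active.add(frozenset((node, peer)))
                queue.append(peer)

    # Any link not needed to reach a switch would close a loop, so it is blocked.
    blocked = {frozenset(link) for link in links} - active
    return active, blocked


links = [("A", "B"), ("B", "C"), ("A", "C")]  # hypothetical three-switch triangle
active, blocked = spanning_tree(links, root="A")

print(sorted(sorted(link) for link in active))   # [['A', 'B'], ['A', 'C']]
print(sorted(sorted(link) for link in blocked))  # [['B', 'C']]
```

One of three links sits idle, so a third of the cabling you paid for carries no traffic until something else fails.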

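The second problem is that when one of the links goes down and STP needs to re-calculate paths, the time it takes can be pretty long: with classic STP's default timers, reconvergence can take tens of seconds, and traffic on the affected paths isn't forwarded in the meantime.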

Routing protocols – such as TRILL (for Ethernet) and Fabric Shortest Path First (FSPF, for Fibre Channel) – address these problems by providing mechanisms for creating multiple paths, with a shorter (or no) time penalty for recalculation.
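The multiple-paths idea can be illustrated with a plain shortest-path calculation. The sketch below is neither FSPF nor TRILL; it uses Dijkstra's algorithm as a stand-in for the kind of computation a link-state protocol performs, and it assumes a hypothetical square of four switches with a made-up cost of 1 on every link. It also assumes the target is reachable and that all costs are positive.

```python
import heapq


def equal_cost_paths(links, source, target):
    """Return (cost, paths): every lowest-cost path from source to target."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {source: 0}      # lowest known cost to each switch
    parents = {source: []}  # previous hops that achieve that cost
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best[node]:
            continue  # stale heap entry
        for peer, cost in graph[node]:
            candidate = dist + cost
            if candidate < best.get(peer, float("inf")):
                best[peer] = candidate
                parents[peer] = [node]
                heapq.heappush(heap, (candidate, peer))
            elif candidate == best[peer] and node not in parents[peer]:
                parents[peer].append(node)  # a second, equally good way in

    def walk(node):
        # Expand the parent links back to the source into full paths.
        if node == source:
            return [[source]]
        return [path + [node] for parent in parents[node] for path in walk(parent)]

    return best[target], sorted(walk(target))


# Hypothetical square topology: A-B, A-C, B-D, C-D, each with cost 1.
square = [("A", "B", 1), ("A", "C", 1), ("B", "D", 1), ("C", "D", 1)]
print(equal_cost_paths(square, "A", "D"))
# (2, [['A', 'B', 'D'], ['A', 'C', 'D']])

# If the B-D link fails, recomputing still leaves a usable path.
after_failure = [link for link in square if link[:2] != ("B", "D")]
print(equal_cost_paths(after_failure, "A", "D"))
# (2, [['A', 'C', 'D']])
```

Both paths are usable at the same time, and losing one link only removes the paths that crossed it rather than forcing the whole topology to wait.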

Close, but No Cigar

The common misconception is that because FCoE runs over Ethernet, you abso-positively-lutely need to have a better Ethernet forwarding mechanism in order to run FCoE. That is, if Ethernet is all screwed up, it’s going to screw up your FCoE traffic by default, right? Your FCoE traffic will suffer because of STP, right? That’s why you need TRILL, right?


Slow down there, killer.

Let’s not forget that Ethernet isn’t simply a container for traffic that is thrown willy-nilly about with no organization. One of the greatest misunderstandings is that FCoE traffic somehow intermingles with other Ethernet traffic, or that PAUSE-ing one type of traffic affects the others.

This isn’t how it works. We separate out our LAN traffic using VLANs, and we keep our FCoE traffic dedicated inside of a specific VLAN. That’s how we can apply Priority Flow Control (PFC) and other enhancements to Ethernet.

It also means that we can apply the appropriate forwarding (i.e., routing) mechanism to each type of traffic. See, different protocols use different forwarding methods. Fibre Channel uses FSPF, which calculates the best path between switches (sound familiar?) and determines alternate routes in the event of a link failure or topology change. FSPF is designed to guarantee in-sequence delivery of frames, which is a requirement for FCoE, but not for traditional LAN traffic.

If you look at the pretty picture below, you’ll see that within an FCoE-capable switch, the LAN traffic has its own forwarding mechanism (STP), and the FCoE traffic has its own forwarding mechanism (FSPF). The separation of these traffic types onto different VLANs means that the storage traffic is independent of how LAN traffic is forwarded or blocked.

LAN and SAN separation even on same wire
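The separation in the picture can be reduced to a single decision per frame. This is a minimal sketch, not how any particular switch is implemented: the VLAN numbers (10 for LAN, 100 for FCoE) and the `Frame` and `forwarding_mechanism` names are made up for illustration. The one value taken from the real world is 0x8906, the EtherType assigned to FCoE.

```python
from dataclasses import dataclass

FCOE_ETHERTYPE = 0x8906  # EtherType assigned to FCoE frames
IPV4_ETHERTYPE = 0x0800  # ordinary LAN traffic, for contrast


@dataclass(frozen=True)
class Frame:
    vlan: int       # VLAN tag carried by the frame
    ethertype: int  # payload type


def forwarding_mechanism(frame, fcoe_vlans):
    """Pick the forwarding mechanism from the frame's VLAN (illustrative only)."""
    # Frames in a dedicated FCoE VLAN follow the Fibre Channel side (FSPF);
    # everything else follows the LAN side (STP in this article's picture).
    return "FSPF" if frame.vlan in fcoe_vlans else "STP"


FCOE_VLANS = {100}  # hypothetical: VLAN 100 is dedicated to FCoE

storage = Frame(vlan=100, ethertype=FCOE_ETHERTYPE)
lan = Frame(vlan=10, ethertype=IPV4_ETHERTYPE)

print(forwarding_mechanism(storage, FCOE_VLANS))  # FSPF
print(forwarding_mechanism(lan, FCOE_VLANS))      # STP
```

Both frames can arrive on the same physical wire; what happens to the LAN frame has no bearing on where the storage frame goes.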

Because FCoE uses FSPF to determine its own forwarding paths, FCoE can use all the available links without blocking any of them to prevent loops, and it avoids STP's recalculation delays. Because FCoE has another system to do this, it doesn’t need TRILL.

This means that even if you have multiple traffic types running in a single Ethernet link, you can have different methods applying to the traffic types within that link. Now, of course this diagram doesn’t represent an actual network topology, as I’ve ignored the servers and storage that would normally be included.

The point here is to discuss what happens within switches, which is exactly where TRILL, STP, FSPF, etc., are designed to work. Critics might complain that I’m only showing three switches, but the principles here are just as relevant when dealing with more.

Bottom Line

With a modern FCoE switch, you can have more than one way to forward traffic even if you are using common technology to transport that traffic. By allowing each side to play to its strengths, you do not need to have TRILL (an Ethernet forwarding mechanism) replace FSPF (a Fibre Channel forwarding mechanism).

Ultimately, the point here is to address the incorrect notion that you need TRILL in order to run multihop FCoE. Even when you start getting into more complex diagrams and topologies, all you need is an appropriate routing mechanism to get from point A to point B, which FCoE already has without requiring new technologies such as TRILL.

In the future, if there’s enough interest, I will expand this to include some more technical discussions with some more practical examples of how networks with FCoE can be designed.

For now, though, the key thing to remember is that you do not “need” TRILL, it is not required, and you can safely implement multihop FCoE without worrying about TRILL anywhere in your system.



  1. I’m dubious. While I believe that you think this is true, I wonder what exact use case you are attempting to describe here? None of the material I’ve researched explains or confirms your claim. So, what have I missed?

    I’d like to see some references to design material that explains the design and use case. Hard data to put some meat on these bones.

    • Believe it or not, part of what I was attempting to do was avoid some of those designs – for the time being. Since I wrote this article, it appears that there has been some interest in learning more about the practical implementations of FCoE and TRILL-based L2 Ethernet routing (such as FabricPath), so I’ll be creating a more in-depth examination as soon as I can.

      Ultimately, my explanation here was intended to be generic, to be deliberate in showing that there are different forwarding mechanisms for the types of traffic. Once we understand that FCoE isn’t restricted by the types of problems TRILL is supposed to solve, the question can then turn (and now it can) to the use cases you ask for.

      Those use cases deserve their own focused post, however. Sadly I haven’t begun them yet, but will as soon as I can; I wanted to run this up the flagpole first to see if there was even an interest in going deeper.

  2. Hi Greg,

    Thanks for taking the time to reply to my little post.

    When referring to “we”, I was simply referring to b) “we” in the industry as it’s generally implemented. 🙂

    Re: BTW1 I’ll have to take a look at your PDF file. Thanks for passing it along.

    Re: BTW2 Thanks for raising the point about loops adding latency. You are, of course, absolutely correct and it is yet another reason why proper design for storage networks is paramount. One of the key elements of converging networks is that the basic assumptions for developing Ethernet networks are not always the same as the assumptions for storage networks. Obviously, exploring the nature of proper design, using the proper technologies appropriate to the architecture, is beyond the scope of this blog article. It is, however, worthy of a much more focused piece.

    As far as the history of FSPF, others more familiar than I with its story should probably address that question. 🙂

  3. J,

    Can you clarify a couple of points? Specifically, in the following paragraph, you use the word “We”.


    Is “We” in the context of a) how Cisco implements this within your products, b) how the industry implements it, or c) what is in approved or pending standards?

    You also mention FSPF. Care to comment on what FSPF is based upon, or is very similar to (caution, may open old wounds for some)?

    BTW1, I came across this piece that looks back at what could be the next dimension to the FCoE/DCB/TRILL discussion, which is PIM, or protocol intermix mode.

    BTW2, whether loops are good or bad, they can add latency which is not a good thing for data and storage networks. Likewise hops are great in beer, however they can add latency to networks, not to mention slowing down those who consume too much of it as well.

    Cheers gs
    Greg Schulz – Author of The Green and Virtual Data Center (CRC), Resilient Storage Networks (Elsevier) and, coming summer 2011, Cloud and Virtual Data Storage Networking (CRC)

    twitter @storageio