When Are FCoE Standards “Done?”

August 23, 2010 at 12:00 pm PST

If you think about it, no other technology has really raised concerns about standards the way FCoE has. Suddenly, people care about how complete the standards are and don’t want to make a move until they are “done.”

You’d think people were afraid of getting salmonella poisoning from using FCoE before it’s “cooked.”

So how can you tell? How do you know when standards are “done?”

 

Let’s face it, the standards bodies aren’t exactly up-front about this. It’s not that they’re not transparent; it’s just that they have their own terminology, processes, and jargon. People who just want a simple answer to a simple question are left feeling like they just had a “simple” conversation with Alan Greenspan about macroeconomics. In other words, they’re left scratching their heads and wondering what just happened.

So, let me try to break things down into Plain English™. While the terminology varies from one standards body to the next, the process pretty much breaks down into four phases, as shown in the graphic below.

[Figure: The four phases of the standards process]

It’s easy to get lost in the terminology. “Working Groups” and “Task Groups” mean different things in 802.1 and T11, and so do the various votes and ballots. For our purposes here, though, the key things to remember are that the processes are pretty consistent, and that a standard becomes technically complete at the moment the development group stops arguing.

No, seriously. When the technical group completes its discussion of the proposals and consensus is achieved in Phase 2, the standard document is no longer in technical debate, and is considered stable. From this point it is safe for companies to create products based on that technical document.

 

The “Approval” phase gives participants the opportunity to submit comments and change wording or text, but the technological aspects of the standard remain relatively stable.

[Figure: When standards are “done”]

So as far as FCoE is concerned, where are the standards, as of this writing? 

 

Let’s try to redefine the question so that it’s more useful: What problems need solving, and how far along are the standards that solve each of them?

For FCoE there are two problems that you need to solve:

 

1) Placing Fibre Channel frames onto other media (e.g., Ethernet), and

2) Making the medium lossless

 

Those are the problems that need solving in order to run FCoE. There are other standards documents that address running FCoE with other protocols at the same time (aka Converged I/O), and those relate to:

 

3) Bandwidth management, or guaranteeing bandwidth for different types of traffic, and

4) Device configuration, or creating a means where devices can talk to each other and get their settings sync’d up

 

There are other documents within DCB (Data Center Bridging) that address other problems, but those are unrelated to running FCoE. If you’re thinking about running FCoE in the Data Center and are concerned about standards, the four problems above are the ones you need solved.

Fortunately, each of those problems has been addressed by a standards document, respectively:

1) FC-BB-5 (also handles the technical problem of multi-hop FCoE!)

2) 802.1Qbb (Priority Flow Control, or PFC) 

3) 802.1Qaz (Enhanced Transmission Selection, or ETS)

4) 802.1Qaz (yes, the same document; Data Center Bridging eXchange, or DCBX)
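
The problem-to-document mapping above can be written down as a small lookup table. The Python sketch below is only an illustration: the names `Standard`, `FCOE_STANDARDS`, and `documents_needed` are made up for this example, and the data is limited to what the two lists above state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Standard:
    problem: str                 # the problem being solved, as listed above
    document: str                # the standards document that addresses it
    mechanism: str               # the mechanism name given in the list
    needed_for_basic_fcoe: bool  # True for problems 1-2, False for 3-4 (Converged I/O)


# Hypothetical table built only from the numbered lists in the post.
FCOE_STANDARDS = [
    Standard("Placing Fibre Channel frames onto other media", "FC-BB-5", "FCoE", True),
    Standard("Making the medium lossless", "802.1Qbb", "PFC", True),
    Standard("Bandwidth management", "802.1Qaz", "ETS", False),
    Standard("Device configuration", "802.1Qaz", "DCBX", False),
]


def documents_needed(converged_io: bool) -> List[str]:
    """Return the distinct documents to track, in list order.

    With converged_io=False only the two FCoE prerequisites are returned;
    with converged_io=True the Converged I/O documents are included as well.
    """
    docs: List[str] = []
    for std in FCOE_STANDARDS:
        if std.needed_for_basic_fcoe or converged_io:
            if std.document not in docs:  # 802.1Qaz covers both ETS and DCBX
                docs.append(std.document)
    return docs
```

Calling `documents_needed(True)` yields three documents rather than four, because ETS and DCBX live in the same 802.1Qaz document.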

 

Take a gander at the chart below and you can see that all of the standards documents that address issues relating to running FCoE with Converged I/O are, in fact, complete. All of them have been published, are in the publication phase, or are about to go into the publication phase.

[Figure: State of the standards]

What this means is that, with respect to FCoE, customers should now look at how the technology fits into their overall strategy from the perspective of value.

But if all you’re waiting for is the standards to be “done,” your wait is over.

 

 

10 Comments.


  1. You left out QCN… please add that to your chart.

  2. J Metz

    Ronald – QCN isn’t relevant to FCoE. QCN operates on a single Layer 2 domain, while FCoE crosses L2 domains. I left out QCN and TRILL for that very reason (they don’t interact with FCoE), but I realized that there is so much confusion surrounding those two technologies that they deserve their own posts. Again, watch this space for updates (I can only write so fast!) :)

    J

  3. Seems to me that FCoE, by its very nature, sits within a single L2 domain. Not to say bridges and routers couldn’t be built, but I haven’t seen any announcements or any relevant standards work.

     Also seems to me that Congestion Notification exists because lossless Ethernet’s congestion behavior will be backpressure and congestion spreading (think a downtown traffic jam bordering on gridlock), very much like InfiniBand, rather than Ethernet’s historic packet drop / retry storm. Given that storage traffic tends to congest, this is very relevant.

     TRILL is one of many tools for growing an L2 domain or spreading it over multiple sites. VPLS has been around quite a while, and SPB is a quite viable technical competitor to TRILL. For the most part, it’s irrelevant to FCoE.

  4. J Metz

    Steve,

    Thanks for the comment. QCN and FCoE really do need another post, because of the nature of the beast. It’s easier to see with pretty pictures, for instance.

    Effectively it’s about forwarding. When an FCoE packet is delivered to an FCF, the MAC addresses are stripped off and new MAC addresses are put in to forward it to the next FCF/destination. This breaks the ability of QCN to send messages back to the source MAC address. As a result, QCN is useless and unrelated to how FCoE operates.

    You are correct about TRILL, SPB and FCoE, of course.

  5. Seems to me that Fibre Channel has both BB credits and EE credits for a reason. Agreed the flow control mechanisms at the Fibre Channel side terminate at the FCF, as do the flow control mechanisms at the Ethernet side. This means that congestion and related performance anomalies can occur because of that boundary.

     But on the Ethernet side of an FCF, the PFC pauses take the role of BB credits. Without a complementary mechanism analogous to EE credits, which is the role Congestion Notification (QCN, etc.) is supposed to fill, when congestion occurs it will spread and performance will collapse. Not every day, just at the moment Murphy’s law dictates.

     Now there are some people in the industry who are skeptical about CN consistently working at scale as designed by the standards committee. Does this mean you’re among the skeptics? Or do we really disagree about the need for an end-to-end flow control mechanism on the Ethernet side of the FCF?

  6. J Metz

    Steve, again, you raise some very good points, but at the heart it appears that you are conflating *link* congestion with *network* (i.e., end-to-end) congestion. Unlike most Ethernet traffic, FC (and, of course, FCoE) is an “interlocked” protocol. This means that packets aren’t sent unless the destination has acknowledged that space is available. If there isn’t room at the destination, the FCoE frames simply wait until it is their turn to go. They aren’t “dropped” unless a specific Time-Out Value (TOV) is reached, which defaults to 500 ms (transfer times are in the *micro*seconds, however, so we’re talking orders of magnitude before the TOV is reached). At this point it’s handled like a normal SCSI TOV.

    However, because the packets sit and wait at an FCF, what good would congestion notification be for FCoE *beyond* that switch? The source no longer has knowledge of the packet. What would it hold on to? What would it do with a notification of congestion at that point? It’s already got PFC PAUSE notification, so there’s nothing additional that QCN would add to the picture.

    Your last question really gets at the crux of the issue: QCN is a Layer 2 domain solution, whereas from an Ethernet perspective FCoE is a Layer 3 protocol (this is easier to show with a graphic; I recommend the FC-BB-5 document, which shows this more clearly). As a result, FCoE is handled at a completely different layer and is indifferent to the underlying Layer 2 forwarding mechanisms (which is why, as you point out, TRILL is irrelevant as well).

  7. Think we’re in agreement on most points, and perhaps will just have to agree to disagree on the rest. I do tend to take a systems view of congestion management and as a result view both link and end-to-end flow control as a single complex system, for which multiple engineering approaches exist. The form of FC’s credit scheme is very different from the form TCP packet drops take, but they are both solving the same problem. FC timeouts solve a different problem and are irrelevant to the flow control discussion.

     Agreed we will not achieve a single end-to-end flow control mechanism from an FCoE CNA through Ethernet, an FCF, Fibre Channel, and into a disk array port. FCoE does not preserve the credit-based (“interlocked”) mechanism of FC; instead it relies on layer 2 for lossless delivery and flow control. So the FCoE (CEE) end-to-end terminates at the FCF, and the FC end-to-end originates there. The purpose of flow control on the FCoE side is not so much to allow a single stream to proceed through an idle path, as to prevent a bulk read (like a full table scan of a large database) from starving cross traffic in the intermediate switches. After all, the whole point of FCoE is that normal network traffic and storage traffic are coexisting on the same NICs, across the same cables, and in the same switches. CN is just the end-to-end flow control mechanism for CEE, preventing the InfiniBand-style performance collapses that certain storage workloads cause with PFC-only flow control on multi-hop paths.

     Afraid I’m out of date on Fibre Channel: E_D_TOV defaulted to 2 seconds when we got started back in 1995 or so. Has it really been cranked down to a half second?

  8. Steve, thanks for your comments. I will go through them in sequence.

     “Seems to me that FCoE, by its very nature, sits within a single L2 domain. Not to say bridges and routers couldn’t be built, but I haven’t seen any announcements or any relevant standards work.”

     CDS> FCoE stays inside a single VLAN; however, its forwarding is L3 forwarding. An ENode sends FCoE frames to the FCF with which it has established a VN_Port to VF_Port virtual link (i.e., SA = FPMA(ENode), DA = FCF-MAC). Because the frame is destined to the FCF itself, the FCF processes the next layer (layer 3 from an Ethernet perspective, which is Fibre Channel in this case) and makes a forwarding decision based on the layer 3 address (the FC D_ID). Then the encapsulated FC frame is re-encapsulated in Ethernet to go to the next hop (e.g., SA = FCF-MAC1, DA = FCF-MAC2). QCN operates by sampling these frames and sending back a CN message (a slow-down message) to the source address. But the source address is FCF-MAC1, not the effective source, which is what QCN expects. The MAC address of the ENode sourcing the FCoE frames is no longer available in an FCoE frame after it crosses an FCF. This is why QCN does not work for FCoE.

     “But on the Ethernet side of an FCF, the PFC pauses take the role of BB credits. Without a complementary mechanism analogous to EE credits, which is the role Congestion Notification (QCN, etc.) is supposed to fill, when congestion occurs it will spread and performance will collapse. Not every day, just at the moment Murphy’s law dictates.”

     CDS> EE credits exist in Fibre Channel only with Class 2. Class 2 is used today only by IBM mainframes; every other FC installation uses Class 3, which does not have any EE credits. Still, it works. The reason for that has nothing to do with congestion control; it is due to the interlocked nature of SCSI traffic (i.e., a command is sent, buffers are allocated, transfer ready is sent back, and the transfer can then begin because the receiving buffers are there). FCoE does not change in any way the nature of SCSI traffic.

     “Afraid I’m out of date on Fibre Channel: E_D_TOV defaulted to 2 seconds when we got started back in 1995 or so. Has it really been cranked down to a half second?”

     CDS> No. E_D_TOV is 2 seconds. 500 ms is the recommended maximum bridge transit delay for a lossless Ethernet bridge supporting FCoE, also applicable to FCFs. That value comes from the current practice of Fibre Channel switches.

     I hope this helps. Claudio.
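
The forwarding behavior described in this comment can be sketched as a toy model in Python. This is an illustration under stated assumptions, not an implementation of FC-BB-5 or QCN: the names `FCoEFrame`, `fcf_forward`, and `qcn_notification_target` are invented for the example, the MAC addresses are the symbolic labels used in the comment, and the D_ID value is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FCoEFrame:
    src_mac: str  # Ethernet source address (SA)
    dst_mac: str  # Ethernet destination address (DA)
    fc_d_id: str  # FC destination ID carried in the encapsulated FC frame


def fcf_forward(frame: FCoEFrame, own_mac: str, next_hop_by_d_id: Dict[str, str]) -> FCoEFrame:
    """Model one FCF hop: look up the FC D_ID, then re-encapsulate in Ethernet.

    The outgoing frame carries the FCF's own MAC as SA, so the original
    sender's MAC does not survive the hop.
    """
    if frame.dst_mac != own_mac:
        raise ValueError("frame is not addressed to this FCF")
    next_hop_mac = next_hop_by_d_id[frame.fc_d_id]  # forwarding decision on the FC address
    return FCoEFrame(src_mac=own_mac, dst_mac=next_hop_mac, fc_d_id=frame.fc_d_id)


def qcn_notification_target(sampled: FCoEFrame) -> str:
    """Per the description above, a CN message goes back to the sampled frame's SA."""
    return sampled.src_mac


# Hypothetical example values: an ENode sends to FCF 1, which forwards to FCF 2.
enode_frame = FCoEFrame(src_mac="FPMA(ENode)", dst_mac="FCF-MAC1", fc_d_id="0x010203")
forwarded = fcf_forward(enode_frame, "FCF-MAC1", {"0x010203": "FCF-MAC2"})
```

In this model, a congestion point that samples `forwarded` would address its notification to `FCF-MAC1`, while one that samples `enode_frame` before the FCF would address it to `FPMA(ENode)`. The FC D_ID is unchanged across the hop, which is the sense in which the forwarding decision is made one layer above Ethernet.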

  9. What is your stand on 802.1Qbg?

  10. Good to know that people are more aware that standards make a huge difference in any business, especially in technology, where processes and connections are very relevant.
