
Addressing the FCoE “Ready-For-Prime-Time” Question

October 20, 2011 at 7:33 am PST

My buddy Stephen Foskett wrote a blog recently that talks about FCoE and 16Gb Fibre Channel. I want to say, for the record, that I like Stephen a lot, and normally I think he has a good grasp of the realities of new SAN technologies as they emerge.

At the very least he has usually shown himself to be fair and balanced, even if not totally unbiased. In the many, many articles he has written, I have never seen him knowingly write something untrue in his examination of technologies such as FCoE… until now.

For that reason, I can’t help but feel very disappointed.

The problem is that he writes in his first paragraph, “Why use a 10 Gb Ethernet standard that remains in flux when 16Gb FC is shaping up nicely?”

The thing is, Stephen knows that the DCB standards have been finished for a while now, and that the Multihop FCoE standard has been completed for years.

What he is most upset about, it appears, is that for FCoE storage there is “no interoperability”, and therefore more work needs to be done:

“FCoE is functional as an edge-only protocol, and is gaining traction in specialized use cases like blade servers. But end-to-end FCoE requires integrated Fibre Channel Forwarding and Ethernet fabric technology that remains decidedly experimental, and interoperability is a serious question.”

Stephen apparently confuses several issues in his statements, which is most disappointing because I have sat down with him one-on-one and I know him to have a much greater understanding of where things are in terms of standards, vendor implementation, and what can be considered “prime time.”

I know he’s not the only one, as I’ve watched panels at SNW and have read reports from Interop where several people have conflated these questions into one big honking answer that doesn’t really help anyone make an informed decision -- not customers, not vendors, not analysts.

In reality there are four basic questions that need to be asked and answered individually in order to begin determining the status of FCoE development. None of them, however, address the crux of the issue: whether this technology is a tool that may be useful for solving customer problems or not.

Some people have made statements that simply are not true. For Stephen (and several others who have decided to write about FCoE from positions of uncertainty or ignorance), here they are:

Question 1. When will Multihop Standards be done?

Statement 1. The Standards for Multihop are not done (either for FCoE or Ethernet)

Stephen implies in his article that the 10Gb Ethernet standard “remains in flux.” He knows this is not true, because the key elements for running Consolidated I/O on 10Gb Ethernet have been completed for a while now. The key DCB standards -- Priority Flow Control (PFC), Enhanced Transmission Selection (ETS), and Data Center Bridging eXchange (DCBX) -- have all been approved in their final form for inclusion in the 2011 revision of 802.1Q. (Yes, that’s how it’s described -- the shorthand version, “it’s ratified,” is not actually used by the standards committee.)

(A fourth document, Quantized Congestion Notification (QCN), is often erroneously lumped into the mix for consolidated I/O, but as I’ve pointed out earlier this is a red herring and has nothing to do with FCoE or Converged Networks.)
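If you want a concrete picture of what PFC and ETS actually do on a converged 10Gb link, here is a minimal Python sketch. It is purely illustrative -- the class names, weights, and buffer numbers are made up, and it is not how any switch actually implements the standards -- but it captures the two behaviors that matter for consolidated I/O: ETS divides link bandwidth among traffic classes by configured weight, and PFC pauses only the lossless (FCoE) priority when its receive buffer crosses a threshold, leaving the other classes alone.

    # Toy model of the two DCB mechanisms discussed above (illustrative only):
    #  - ETS: share link bandwidth among traffic classes by configured weight
    #  - PFC: pause a single priority (the lossless FCoE class) without touching the others

    LINK_GBPS = 10.0
    ets_weights = {"fcoe": 50, "lan": 30, "iscsi": 20}          # percent of the link per class
    pfc_enabled = {"fcoe": True, "lan": False, "iscsi": False}  # lossless behavior only where needed


    def ets_allocation(offered_gbps):
        """Give each class its guaranteed share, then lend unused headroom to busy classes."""
        guaranteed = {c: LINK_GBPS * w / 100.0 for c, w in ets_weights.items()}
        alloc = {c: min(offered_gbps[c], guaranteed[c]) for c in ets_weights}
        spare = LINK_GBPS - sum(alloc.values())
        for c in ets_weights:
            extra = min(spare, max(offered_gbps[c] - alloc[c], 0.0))
            alloc[c] += extra
            spare -= extra
        return alloc


    def pfc_decision(priority, rx_buffer_bytes, pause_threshold_bytes):
        """Reactive per-priority pause: only the congested lossless class is told to stop."""
        if pfc_enabled[priority] and rx_buffer_bytes >= pause_threshold_bytes:
            return f"send PAUSE for priority '{priority}'"   # other priorities keep flowing
        return f"priority '{priority}' keeps flowing"


    if __name__ == "__main__":
        # A busy FCoE class borrows the headroom the idle iSCSI class is not using.
        print(ets_allocation({"fcoe": 6.0, "lan": 5.0, "iscsi": 1.0}))
        # Only the FCoE priority is paused when its buffer fills; LAN traffic is unaffected.
        print(pfc_decision("fcoe", rx_buffer_bytes=48_000, pause_threshold_bytes=40_000))
        print(pfc_decision("lan", rx_buffer_bytes=48_000, pause_threshold_bytes=40_000))

The point is simply that both behaviors are defined per priority, which is what lets storage and LAN traffic share one wire without the storage class ever having to drop frames.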

Moreover, the standard for Multihop FCoE has been completed for several years now. Stephen also knows this and has spoken about it several times on his own blogs, podcasts, and public speaking engagements. He doesn’t specifically say otherwise in his most recent article, but he heavily implies that this standard, too, is still in flux.

Question 2. When will vendors have Multihop FCoE?

Statement 2. Nobody has Multihop FCoE technologically ready/End-to-End FCoE isn’t ready for “Prime Time”

This is the most interesting one -- from my perspective, certainly -- because the statement is factually incorrect; the question, on the other hand, may simply be a matter of not knowing what is available in the marketplace.

Cisco made Multihop FCoE available on the Nexus 5000 series in December, and Director-Class Multihop FCoE in August and September on the MDS 9500 and Nexus 7000 platforms, respectively. This is released code available now, and not sitting in some testing phase.

Stephen knows this as well, as do several of Cisco’s competitors. While I can understand why a competitor would want to perpetuate this incorrect information, throwing in all kinds of nonsense about needing QCN or TRILL, Stephen’s insistence that end-to-end FCoE is not ready for “prime time” is thoroughly disappointing.

With the standard being completed and the implementation tested, working, and available, it is not clear what “prime time” we are waiting for.

Could it be…

Question 3. Will FCoE interoperate between vendors?

Statement 3. There is no interoperability between vendors

On the face of it, this is simply not true. The Fibre Channel Industry Association regularly holds plug-fests with several switch, storage, and server vendors that consistently strive to improve interoperability with new equipment.

As just one example, take a look at the new Nexus B22 Fabric Extender for HP’s BladeSystem. It fits into HP’s C-Class BladeSystem chassis, letting customers connect interoperable FCoE between vendors. This is in addition to the rack-mounted solutions that have already been available.

I mean, come on! If HP and Cisco can do fully end-to-end FCoE, then how can you claim it’s not interoperable?!

(For the record, I wrote a joint paper on FCoE and Converged Management with HP’s Rupin Mohan, published in the FCIA Solutions Guide in time for SNW Fall 2011. If it becomes available on the web I will post a link to it in an update. Not working together… yeah right).

Ah, but the question has a sinister twist. What is really being asked is whether certain vendors will work with other vendors. That is, will Cisco’s FCoE work with Brocade’s FCoE?

Before answering, let’s take a look at the merit of the question on its face. First, it places an incredible burden of fitness on one vendor’s implementation (i.e., Brocade’s). It assumes that until Brocade offers interoperability, the technology doesn’t count as “complete” -- even though Cisco (and now QLogic) already has it.

Second, given that Brocade’s FCoE solution is not a standards-based implementation of Multihop FCoE -- take a look at the specifications, which say that it requires the VCS technology in order to work -- does this mean that the entire industry hinges on one vendor’s willingness to interoperate?

Does this mean that end-to-end FCoE is only possible once Brocade catches up with everyone else?

I think not.

If that were the case, we would have to concede that end-to-end Fibre Channel implementations are not possible because Brocade actively prevents interoperability in that technology as well. Obviously, the mere suggestion of this is laughable.

So, I turn the question around. If one vendor has applied standards-based FCoE technology in its products, has released those products into the wild, and has created solutions based on that technology, should that vendor be accused of not having the solution just because “not everyone else” has it too?

Yeah, right.

Question 4. Who is using FCoE?

Statement 4. Nobody is using FCoE

This is not so much an argument as it is an excuse for not thinking.

It is, after all, the height of arrogance to assume that just because one does not see or hear of customers implementing end-to-end FCoE that no one is using end-to-end FCoE.

This is particularly curious as Multihop FCoE for Director-Class systems (the most common for Aggregation and Core deployments) has only been available for two months.

The other key thing to note is how often critics disqualify existing FCoE deployments as “not counting.” The technology is consistent regardless of where it is placed in the data center (same standards, same frame type, same protocol format), yet the only deployments critics treat as “valid” are the ones that have yet to appear in their favorite corner of the Data Center.

Truth is, I cannot reveal the customers who are working on end-to-end FCoE because, as is common in the industry, you need to have permission to do so. Fortunately we do have deployments that have gotten customers excited enough to want to participate, but these things take time to develop. And like everyone else, I will just have to wait until those Case Studies are completed. (Being patient can be very difficult for me, too.)

Ultimately, though, this is a non-argument.

Everyone’s data center is unique, and what may be a problem for one company may not be a problem for another. The fact that FCoE solves a problem for one customer in all likelihood will not be the reason why another decides to implement it.

After all, I’ve never -- ever -- said that FCoE was a panacea for the Data Center. In fact, the only people I have ever seen say that particular phrase are those who criticize FCoE vendors for saying it (in other words, they’re making it up in order to knock it down).

Let me put it this way: FCoE may be a technology to help your data center, or it may not. But how will you know unless you first understand enough about it in order to make an informed decision?

Looking at what your neighbor is doing will not suddenly change the unique nature of your data center; what solves their problems may not solve yours. I get concerned when customers rely too heavily on peer pressure to solve their own complex issues.

Bottom Line

Generally, Stephen knows that the standards are completed. So does every vendor out there who has skin in the game, despite what they may write on their blogs or (not-so-subtly) guest-write for trade magazines.

They know that the technology is stable, and they know that there are solutions out there that are interoperable between vendors.

Most importantly, they know that the more confused customers are about the technology, the more likely they are to second-guess themselves and misjudge the opportunity to solve the problems they have in their data centers.

My disappointment with Stephen’s post is that he knows all of this, we’ve spoken at length (in person) about it, and he has shown that he has a much better understanding of the issues at hand than his blogs have revealed.

(I mean, come on! FC over Token Ring? Freakin’ hilarious!)

It’s understandable that the most prevalent (dare I say “best”) defense is to simply ignore Cisco’s vision of Converged Networks or pretend that it doesn’t exist in reality. That doesn’t change the fact that it does exist, and that it’s here today.

Despite all claims to the contrary (notably by critics who have a vested interest in pushing their Mononetworks), additional features and products are on their way, interoperating with the 16G solutions and increasing the amount of choice for customers.

Cisco is dedicated to giving customers more and more choices so they can use the right tool for their Data Centers, without forcing them to pick one technology over another. Moreover, we’re doing it with tools and technologies that are available today.

No one is asking Stephen to endorse or promote one technology over another, nor am I pushing for him to support FCoE (or Cisco). That’s not his purpose, and it’s not my place (I would never do it anyway).

It would be far better, however, if in these conversations we were more deliberate with the terminology. It’s important to help reduce any risk of confusion by providing accurate answers to the questions at hand.

Finally, an honest and genuine examination of the realities of converged networks would clearly show that end-to-end converged networking with FCoE is, in fact, “Ready for Prime Time.”



11 Comments.


  1. The problem with FCoE is that we’ve spent all this time, money, and R&D on making FC work on Ethernet (or rather making Ethernet look like an FC network). I think all this work going into flattening layer 2 networks, lossless Ethernet, and management has really complicated the Ethernet world.

    You can argue that Ethernet has needed these features for a long time and I would agree that Ethernet networks have and will continue to benefit from these additions.

    But in the end, people look at the requirements for an FCoE network and then at their existing FC infrastructure or other protocols like iSCSI and NFS (in some cases), which are not only around and widely deployed but also DESIGNED to work on an IP network -- and I think people are not seeing the value in FCoE.

    I think FCoE’s ROI and longevity are questionable. So the statements above are just a way to dodge the bullet and aren’t the true issue.


  2. J Metz

    Thanks for taking the time to read this article and comment.

    I believe you’re addressing a completely different question, however. Your comment about the requirements for FCoE does not jibe with your immediately subsequent comment about iSCSI and NFS, which are designed to work on IP. Since FCoE and IP are completely divergent solutions, they address different traffic engineering needs.

    Nevertheless, the ROI/TCO conversation is a valid one, just not one that I was addressing here. In order to have *that* conversation you must first agree on the nature of the solution before determining its cost or return.

    J


  3. (A fourth document, Quantized Congestion Notification (QCN), is often erroneously lumped into the mix for consolidated I/O, but as I’ve pointed out earlier this is a red herring and has nothing to do with FCoE or Converged Networks.)

    That is your opinion, not a fact. And this is where opinions diverge.

    That is why many of us feel multihop isn’t ready. It’s nothing really to do with multihop itself, but with the flow behavior multihop creates.

    Either you don’t understand the risk of multiple flows all coming into a shared path without end-to-end flow control or you just don’t agree that there is risk.

    As to other things that are making engineers nervous:

    Per-priority pause frames are a _reactive_ method. B2B credits are a _proactive_ method.

    Zones & A/B Fabric best practices are still evolving

    Is that to say FCoE is evil? No of course not. It’s coming along very well. It is however to say that FC people are a paranoid group. It takes time.

    Ultimately without QCN (which you don’t agree with I know) FCoE will be seen by many as not done. It’s an opinion. 


    • J Metz

      @Des:

      “Either you don’t understand the risk of multiple flows all coming into a shared path without end-to-end flow control or you just don’t agree that there is risk.”

      No. As I point out in the original article, QCN has particular applications that are orthogonal to FCoE implementations. The mechanism of QCN as it relates to the mechanism of FCoE is not a matter of opinion; it’s a matter of how FCoE is forwarded across an FCF.

      I understand the risk of multiple flows quite well. I also, however, understand that QCN is not the mechanism by which that risk can be alleviated in an FCoE environment. If that is an FCoE problem you wish to solve, QCN is not the solution you are looking for.

      I, at least, explained the mechanisms of how FCFs work and why QCN’s mechanisms do not address these issues. If you have the opinion that QCN *will* solve these problems, I welcome your description of the mechanics of how this may be the case.


      • Actually I’m very concerned CN will not fix the problem. I still hold hope it will at least reduce the risk to a more manageable level.

        All in all I’m rather disappointed in how much of DCB has come together. It certainly hasn’t been all smiles and roses.

        The reactive nature of PFC is a bummer, for example. I much prefer buffer-to-buffer credits with R_RDYs. I like proactive management.

        Here is the end game for me. I don’t work in a lab. I’m one of those who can be woken up at 3am to drive into a Datacenter to fix things when risk becomes reality. One of those who have to write the post mortem explaining what choices were made to create the situation that allowed the failure, and what we are going to do to ensure it won’t happen again.

        As a result I’m very paranoid. I don’t like risk. I like to sleep.


        • Hi Des, J’s point about CN not working across an FCF is a valid one. I would also point out that CN could theoretically work across a network of FIP Snooping Bridges (a.k.a. Transit switches) but my other two points would still need to be dealt with.

          In regards to PFC’s reactive nature versus BB_Credit’s proactive nature, I agree that PFC has an issue today when the actual physical link length exceeds the link length supported by a particular implementation. However, this is only an issue if you are using some kind of link extension device as all of the current implementations assume the maximum distance allowed by the physical media in use.

          During the last T11 meeting, Brocade brought this issue up for consideration and I agree with the idea that some method for dynamically determining the link length should be defined. If this is done, PFC will effectively be as robust as BB_Credit flow control in variable distance applications.

          This isn’t to say that BB_Credit is perfect. As I explain in the “BB_Credit loss” section of Networked Storage Concepts and Protocols (http://www.emc.com/collateral/hardware/technical-documentation/h4331-networked-storage-cncpts-prtcls-sol-gde.pdf) BB_Credit flow control is very sensitive to bit errors. There are ways to mitigate this sensitivity through the use of BB_Credit recovery but this mechanism is not yet widely deployed.


          • Hey Erik,

            Sorry for the delay, was a rather long doc.

            But as someone who used to teach fibre channel protocol classes, it’s a very nice doc.

            In your section BB_Credit you describe how such a sensitivity can impact performance. I’m fine with that. What I’m concerned with is data integrity.

            You then talk about long distance, which isn’t relevant here as PFC is not meant for such things.

            So then, if you (and Mr Metz) agree that PFC has issues, agree there is risk, agree that multiple flows from multiple initiators converging on an uplink carry risk, and you (if not Mr Metz) agree proactive would be better, then when do you think FCoE will be safe? I don’t mean safe as in perfectly flawless, but safe as in as safe as modern Fibre Channel is today.

            Would you, for example, run a fighter craft’s weapon release system over FCoE? Would you tell Visa that all credit card authorizations can now rely on end-to-end FCoE with no more risk than exists today with their FC environment?

            I suppose one needs to quantify “ready for prime time”.

            I’m all for server-to-switch FCoE. QA, maybe build environments. Maybe. However, as things are today, putting my job on the line -- or putting someone’s credit, or sometimes life, on the line -- with end-to-end FCoE is much more risk than reward. I assume in, say, 5 years this will dramatically improve.


          • Hi Des, thanks! Writing it has been a lot of fun / hard work, so it’s nice to hear when someone appreciates it…

            I agree that FCoE is not currently intended to be used over distance. My point is even BB_Credit has known issues that have taken many customers down over the past 10 years and as an industry we are only now addressing them.

            If you compare the risk of running FC over distance without a BB_Credit recovery mechanism versus the risk of using PFC over shortwave distances without a mechanism for automatically detecting the actual link length, I’ll take PFC over BB_Credit all day long.

            A VERY important point to note is that in either case, there is no additional risk of data corruption. With FC and bit errors you will have a performance roll off until you effectively cannot pass data between the two sites. With PFC, at worst, if you overrun the rx buffer and you happen to lose the first frame in an exchange, you’ll need to wait 30 – 60 seconds for the ULP to re-drive the I/O.

            In regards to actual applications, I don’t know anything about the requirements for fighter jet aircraft weapons release systems and as a result I have a hard time speculating on what is/is not appropriate for such an environment. That having been said, I do know a thing or two about the requirements for a credit card environment and I have recommended FCoE for such an environment in the past and I would do so again without hesitation.

            Your concerns about CN have been answered in two ways:
            1. As J pointed out, CN won’t work across multiple FCFs and in my opinion should never have been made a “requirement” for FCoE.
            2. Even if it did work across FCFs, you do not need it any more or less than it is needed with traditional FC.

            I have also been following Stephen and J’s discussion on the meaning of “ready for prime time”. If you google the phrase “not ready for prime time”, you will find a number of definitions that contain an amount of variation sufficient for us to conclude that the term has absolutely no value in a technical discussion. This is the problem with using cliché in technical conversations; it rarely leads to anything productive. This leads to my next point…

            The topic of risk versus reward was also discussed by Stephen and J. My opinion is that no technology is a perfect fit for every application / environment. If you want to avoid as much risk in your production environment as possible, then you need to do your own due diligence. In an extreme case, this could include setting up a large test environment and testing each technology in a configuration that represents your work flows, applications, data sets, and topology -- effectively qualifying everything as a system. However, since this type of qualification can be prohibitively expensive, and since the number of problems found with FC on a release-by-release basis is approaching zero, more and more customers are choosing to rely on vendors such as EMC / E-Lab to perform this type of work (BTW, we are VERY happy to help minimize the amount of risk our customers are exposed to).

            The problem with relying on a third party, though, is that it makes it more difficult to evaluate a new technology such as FCoE and maximize potential reward. If I were in this position, one of the only sources of information available to me would be what is publicly being said about a given technology by the people who supposedly know it best. This is where my HUGE problem with the “not ready for prime time” *position* begins, because no technical reasons are provided in support of it. In fact, if you listen to FCoE’s biggest detractors, the only argument they consistently make is that “nobody is doing it.” I think if there were an actual TECHNICAL argument to be made against FCoE, it would have been made, and made LOUDLY, by one company in particular -- yet here we are talking about how many people have deployed.

            In terms of rewards with FCoE, I can create a converged topology that (at least on paper) costs half as much to purchase, uses only 70% of the power and provides more bandwidth than a non-converged topology (assuming 8G FC). I think this is a good enough reason to at least look at the technology.

            All of this having been said, I don’t agree with any of the nonsense I’ve read recently that states XaaS / Cloud-based technologies are going to make FC completely obsolete in 5 years. This just doesn’t jibe with any technology adoption trends that we have ever observed. Look at the number of applications deployed on a VM versus a stand-alone server and you’ll see an example of what I mean. My point is that if you are an FC customer and you need additional bandwidth, you shouldn’t feel pressured to choose an Ethernet-based technology (iSCSI or FCoE) in order to future-proof your environment. You can confidently roll forward with plans to move to 16G FC knowing that it will be around for a while.


  4. Before I respond in detail, let me clarify regarding the “or (not-so-subtly) guest-write for trade magazines” statement: IBM had nothing at all to do with that post. I wrote it entirely on my own with no input or editing by The Storage Community, let alone IBM, which sponsors that site. In no way should anyone misconstrue that post as representative of IBM’s opinion or strategy.

    Also, let me clarify that I wrote that piece before our conversation at SNW.


  5. Des, I agree with J’s opinion regarding the need (or lack thereof) for Congestion Notification. Following the same logic, you could also say it’s only our opinion that PFC or ETS are needed… The bottom line is that at some point the behaviors we observe during testing and general use for that matter allow us to determine what’s required and what’s not.

    We require PFC and ETS because not having them negatively impacts system performance. I can demonstrate this in the lab all day long.

    CN, on the other hand, has always been a nice-to-have feature (at least theoretically), but ultimately it has been proven to be unnecessary, just as the same feature was proven to be unnecessary in FC fabrics… I can also think of a couple of technical reasons why CN doesn’t work so well with block I/O in practice.

    The first has to do with the bursty nature of block I/O and the second is how to handle the case where you have n ports all transmitting at a rate of bandwidth/n. In the first case, by the time you detect the “hot source” it will probably not be the “hot source” any longer and in the second case, which flow can you stomp on and still be fair?

    In the end, all of this is still just my opinion but it is mostly based on direct experience…


  6. I agree that CN has issues.

    However, something needs to handle flow control end to end. Otherwise, in multihop environments with large fan-in/fan-out ratios, a single per-priority pause can impact many, many flows.

    I’d rather fix CN than just cross my fingers and hope for the best.

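For anyone trying to picture the proactive-versus-reactive distinction Des and Erik are debating above, here is a small Python sketch. It is purely illustrative -- the frame size, buffer sizes, and thresholds are invented -- but it shows the basic difference: a BB_Credit-style sender transmits only when it holds a credit, so it cannot overrun the receiver, while a PFC-style receiver lets frames arrive until a buffer threshold is crossed and only then signals pause, which is why the headroom above that threshold has to cover whatever is already in flight on the wire.

    # Illustrative-only contrast of the two flow-control styles in the thread above.
    # Frame size, buffer sizes, and thresholds are invented for the sketch.

    FRAME_BYTES = 2_112   # rough FC/FCoE payload size, used only to make the toy model concrete


    def bb_credit_sender(credits_available, frames_queued):
        """Proactive: a frame goes out only when a buffer-to-buffer credit is in hand."""
        sent = min(credits_available, frames_queued)
        return {"sent": sent, "credits_left": credits_available - sent}  # credits return via R_RDY


    def pfc_receiver(buffer_used, buffer_size, pause_threshold, frames_in_flight):
        """Reactive: frames keep arriving until the buffer crosses a threshold and PAUSE is sent.

        If the pause arrives too late for the frames already in flight, the buffer overruns
        and the upper-layer protocol has to re-drive the lost I/O.
        """
        paused, dropped = False, 0
        for _ in range(frames_in_flight):
            if buffer_used + FRAME_BYTES > buffer_size:
                dropped += 1                      # overrun: ULP recovery needed
                continue
            buffer_used += FRAME_BYTES
            if buffer_used >= pause_threshold:
                paused = True                     # per-priority PAUSE goes back to the sender
        return {"paused": paused, "dropped": dropped, "buffer_used": buffer_used}


    if __name__ == "__main__":
        # Proactive side: four credits in hand, ten frames queued, so only four go out now.
        print(bb_credit_sender(credits_available=4, frames_queued=10))
        # Reactive side: twenty frames already heading toward a nearly full buffer.
        print(pfc_receiver(buffer_used=30_000, buffer_size=64_000,
                           pause_threshold=40_000, frames_in_flight=20))

Neither style is modeled as better or worse here; the sketch just shows why the safety margin on the PFC side depends on buffer headroom and link length, which is the point behind the T11 discussion about dynamically determining link length mentioned earlier in the thread.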