FCoE: Can’t we all just get along?

March 6, 2012 - 6 Comments

I was sitting in a room with a client the other day. Normally in these conference rooms, with the mahogany tables and high-back leather chairs*, you have Cisco on one side of the table and the client on the other. That wasn’t the case here, as the table was formica and the chairs were folding. Also in the room were two groups that had never spoken before except in rare cases: “The network is down!” or “Our hosts can’t see their storage!” Yes my friends, it was the LAN and SAN folks in the room. The topic of FCoE was in front of us, and the question was around their soon-to-be-deployed Nexus 5000 switching infrastructure. The discussion between the two parties over who would manage the Nexus 5000 reminded me of a scene from Ghostbusters…

Peter: This city is headed for a disaster of biblical proportions.

Mayor: What do you mean, “biblical”?

Ray: What he means is Old Testament, Mr. Mayor, real wrath of God type stuff.

Peter: Exactly.

Ray: Fire and brimstone coming down from the skies! Rivers and seas boiling!

Egon: Forty years of darkness! Earthquakes, volcanoes…

Winston: The dead rising from the grave!

Peter: Human sacrifice, dogs and cats living together… mass hysteria!

As I watched the exchange between the LAN and SAN folks, I realized that, while they work 50 ft from each other, they had never actually talked. I heard exchanges along the lines of:

What do you guys know about block-based storage and zoning or Inter-VSAN Routing? We do core-edge with large directors and, y’know, RAID is about disks, not killing roaches!

Routing? We invented routing. We don’t need you messing around with our routing tables, vPCs or QoS markers. We do core – distribution – access. Go back to your spinning iron record players…

After about 10 minutes of this, I think both teams wore themselves out and realized that they’d have to work together and, dare I say it, compromise on a solution that works for both of them. I knew the SAN folks were overworked, and even mentioning the switch from 300-port directors to a top-of-rack design was making them age before my eyes.

I’m going to spare you the lengthy debates about who was busier and cut to the chase. The topology was going to be a simple architecture: FCoE for host access, with connectivity to the MDS 9513 directors (“A” and “B” fabrics) over native FC. From a LAN standpoint, it would be the standard vPC connectivity:

Figure: Sample Nexus 5000 FCoE Topology

The topology was simple and easy to digest, but the management aspect required a bit of indigestion (er, discussion). To alleviate the SAN team’s anxiety about having to manage dozens more FC domains, they agreed that the switch should be put into NPV, or N_Port Virtualization, mode. This meant that the FC component of the switch would act just like a host. It would not run any switching protocols (zoning, FSPF, name server, etc.) and would log into the upstream MDS 9513 as if it were a server.

Second, we would still configure the upstream FC ports into a Port-Channel to provide redundancy and to trunk whatever VSANs were required. These SAN Port-Channels should not be confused with the Port-Channels that connect to the upstream Nexus 7000 infrastructure.

So what does this leave us with? Ah yes: since, from a SAN standpoint, the Nexus 5000 running in NPV mode is pretty dumb, the only SAN-related tasks left are:

  1. Creating and maintaining the upstream Nexus-to-MDS FC Port-Channel (one time, modified rarely)
  2. Configuring QoS mappings for the FCoE VLANs (one-time configuration)
  3. Creating the VSANs (done rarely)
  4. Mapping the FCoE VLAN to the VSAN (again, done rarely)
  5. Assigning the VLANs to the Ethernet interfaces connecting to the hosts (done for every host, but doesn’t really need two groups for this)
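
To make the point about frequency concrete, the list above can be modeled as data. This is only an illustrative sketch in Python: the class name, the function name and the frequency labels ("one-time", "rare", "per-host") are my own shorthand for the parenthetical notes, not anything that exists on the switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SanTask:
    """One SAN-related task on an NPV-mode Nexus 5000 (hypothetical model)."""
    description: str
    frequency: str  # assumed labels: "one-time", "rare", or "per-host"


# The five tasks from the list above; frequencies paraphrase the notes in parentheses.
NPV_SAN_TASKS = [
    SanTask("Create and maintain the Nexus-to-MDS FC Port-Channel", "one-time"),
    SanTask("Configure QoS mappings for the FCoE VLANs", "one-time"),
    SanTask("Create the VSANs", "rare"),
    SanTask("Map the FCoE VLAN to the VSAN", "rare"),
    SanTask("Assign the VLANs to host-facing Ethernet interfaces", "per-host"),
]


def recurring_tasks(tasks):
    """Return only the tasks that must be repeated for every new host."""
    return [task for task in tasks if task.frequency == "per-host"]


if __name__ == "__main__":
    for task in recurring_tasks(NPV_SAN_TASKS):
        print(task.description)
```

Filtering this way leaves a single recurring task, the per-host VLAN assignment, which is the one the LAN team would be touching anyway.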

So, there’s not a whole lot that is done on a regular basis from a SAN perspective on the Nexus 5000 when it’s in NPV mode. No zoning, no device aliases, no CFS-based features. And since 100% of the devices that connect to the Nexus 5000 will require LAN access, it makes sense for the SAN team to provide a written, clearly documented procedure for how the LAN team should perform the SAN-related tasks on the Nexus 5000.

At this point I expected to be tossed out of the room, never to see the fine mahogany (er, formica) table again. But the SAN team was actually a bit relieved to realize that a) they’d have more time to focus on their core job, which was managing storage arrays, backups, disk replication and the MDS SAN, and b) what needed to be done on the Nexus wasn’t that complex to start with. They still had control of the storage environment, as the intelligence in the SAN remained in the MDS 9500 core infrastructure. The SAN team was also given access to the Nexus 5000s to aid with troubleshooting or assist with configuration if required.

This left one 800 lb gorilla in the room (cue ominous music): firmware upgrades.

In light of this compromise to have the LAN team perform the daily Nexus 5000 configuration work, the two teams agreed to the following: while the LAN team would drive the upgrade process based upon their requirements for more features, the SAN team would be given sufficient notice of a potential upgrade as part of the change management process, and would be given ample time to:

  1. Review any open caveats/bugs in the release notes
  2. Test it out in their lab
  3. Provide written feedback to the combined team on their findings or discoveries
  4. “Veto” the firmware version if there were issues that would impact the SAN
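
For readers who like to see a process as logic, here is a minimal Python sketch of that agreement as a change-management gate. Everything in it is hypothetical: the class, the field names and the version string are made up for illustration, and the rule that all three review steps must be completed before the upgrade proceeds is my assumption, since the teams only agreed that the SAN team would be given time for them.

```python
from dataclasses import dataclass, field


@dataclass
class UpgradeProposal:
    """A firmware upgrade proposed by the LAN team (hypothetical model)."""
    version: str                                   # made-up version label
    caveats_reviewed: bool = False                 # step 1: release notes reviewed
    lab_tested: bool = False                       # step 2: tested in the SAN lab
    feedback: list = field(default_factory=list)   # step 3: written findings
    vetoed: bool = False                           # step 4: SAN team veto


def can_proceed(proposal):
    """Allow the upgrade only after the SAN review is complete and not vetoed.

    Assumption: every review step must be done first; the post does not say so.
    """
    review_done = (
        proposal.caveats_reviewed
        and proposal.lab_tested
        and len(proposal.feedback) > 0
    )
    return review_done and not proposal.vetoed


if __name__ == "__main__":
    proposal = UpgradeProposal(version="example-release")
    proposal.caveats_reviewed = True
    proposal.lab_tested = True
    proposal.feedback.append("No SAN-impacting caveats found in lab testing.")
    print(can_proceed(proposal))  # True
    proposal.vetoed = True
    print(can_proceed(proposal))  # False
```

The design point is that the LAN team still initiates the change, while the veto sits with the SAN team and overrides everything else.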

While there were no parades, high fives or celebratory actions, it did seem that both parties came away with a feeling that things seemed fair to start with and that refinement would be required…

I know what you folks are thinking out there: “What a load of low nitrate, bovine fertilizer!” Well, I didn’t prepare for this meeting with standards, CCIE skills or a debugger; I had 911 on speed dial and a couple of Gracies on standby. So, what are you folks seeing out there? Anybody had this conversation? Lived to tell about it?


  1. Thanks for bringing this up. Everything that you describe above, and more, is happening as enterprises try to take cost out of the infrastructure. FCoE is an enabler, but it also requires some evolution of the organization.

    At Cisco Live (http://www.ciscolive.com/us/), we will be sharing more of those examples and Operational Models for FCoE Deployments. For those interested and attending Cisco Live US, please register for BRKSAN-2282 – Operational Models for FCoE Deployments – Best Practices and Examples.


  2. This is certainly a common discussion with medium and large customers. Actually, I would also say the firmware question is a gorilla in the room for the storage vendors too. I have seen many customers see a lot of benefit in using an FCoE Transit Switch (FIP Snooping Bridge) in the server rack and FCoE-enabling their SAN, rather than putting a gateway at the top of rack. This massively reduces the admin overlap. For the LAN guy, the FCoE-enabled SAN is just another node on his network, and the SAN guy only needs monitoring of the transit switch anyway.

  3. We just use RBAC with AAA, and give the storage guys the storage role, while the switch is “owned” by the network team. Network team members, depending on their level, get either network level config role (no SAN config access) or super-user.

    This is running in full FC switch mode.


  4. Hi Seth,

    Again, I totally agree with you. Sometimes it is just a matter of “infrastructure awareness” for customers to know what’s best and what’s not to get the most optimized network running.

    Great post!


  5. Very realistic scenario, Seth. Something else which I have often noticed crop up during discussions with customers about FCIP-related deployments is compression. Many customers are under the impression that Cisco MDS has compression “on” by default, which I think is not the case (please correct me if I am wrong, or if things have changed and I have not kept myself up to date). When they realise it is off, this sometimes causes them to turn compression ON (because they think that’s the default but it was not ON due to some “reason”), thereby denying themselves the compression benefits of whatever WAN/application acceleration equipment they may be using.

    Just my 0 cents worth
    Best Regards

    • Santanu,

      Thanks for stopping by. Having compression on is not required, as many customers would rather compress the data at a WAN acceleration appliance (WAAS), but enabling FCIP compression does give you the ability to reduce the number of GigE connections coming out of the MDS. That isn’t expensive on the MDS side, since one FCIP license enables 4 GigE ports; the cost can be on the LAN switch side. Getting 2-3x compression is very common, and if you can reduce the number of GigE connections from 4 to 2 and still maintain solid throughput, the savings in LAN equipment, while not large, are part of the solution. This is more apparent when the MDS is connected into distribution layer switches like the Nexus 7000.
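
      The 4-to-2 arithmetic in the reply above can be sketched in a few lines of Python. This is a back-of-the-envelope model only: the function name is made up, and it assumes each GigE link can be filled to its nominal 1 Gbps and ignores protocol overhead, neither of which is claimed in the reply.

```python
import math


def links_needed(uncompressed_gbps, compression_ratio, link_gbps=1.0):
    """Estimate how many links carry a given load after compression.

    Assumptions (not from the post): links run at full nominal rate and
    protocol overhead is ignored.
    """
    if compression_ratio <= 0 or link_gbps <= 0:
        raise ValueError("compression_ratio and link_gbps must be positive")
    on_wire_gbps = uncompressed_gbps / compression_ratio
    return math.ceil(on_wire_gbps / link_gbps)


if __name__ == "__main__":
    # 4 Gbps of replication traffic, i.e. four full GigE links uncompressed.
    for ratio in (1.0, 2.0, 3.0):
        print(ratio, links_needed(4.0, ratio))
```

      Under these assumptions, 4 Gbps of uncompressed traffic needs four GigE links with no compression and two links at either 2x or 3x, which matches the "from 4 to 2" figure quoted above.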