FCoE. Can’t we all just get along?
I was sitting in a room with a client the other day, and normally in these conference rooms with the mahogany tables and high-back leather chairs*, you have Cisco on one side of the table and the client on the other. This wasn’t the case here, though: the table was formica and the chairs were folding. Also in the room were two groups that had never spoken before except in rare cases: “The network is down!” or “Our hosts can’t see their storage!” Yes my friends, it was the LAN and SAN folks in the room. The topic of FCoE was in front of us, and the question was around their soon-to-be-deployed Nexus 5000 switching infrastructure. The discussion between the two parties over who would manage the Nexus 5000 reminded me of a scene from Ghostbusters…
Peter: This city is headed for a disaster of biblical proportions.
Mayor: What do you mean, “biblical”?
Ray: What he means is Old Testament, Mr. Mayor, real wrath of God type stuff.
Ray: Fire and brimstone coming down from the skies! Rivers and seas boiling!
Egon: Forty years of darkness! Earthquakes, volcanoes…
Winston: The dead rising from the grave!
Peter: Human sacrifice, dogs and cats living together… mass hysteria!
As I watched the exchange between the LAN and SAN folks, I realized that while they work 50ft from each other, they’ve never actually talked. I heard exchanges along the lines of:
What do you guys know about block based storage and zoning or InterVSAN routing? We do core-edge with large directors and y’know RAID is about disks, not killing roaches!
Routing? We invented routing. We don’t need you messing around with our routing tables, vPCs or QoS markers. We do core – distribution – access. Go back to your spinning iron record players…
After about 10 minutes of this, I think both teams had worn themselves out and realized they’d have to work together and, dare I say it, compromise on a solution that worked for both of them. I knew the SAN folks were overworked, and even mentioning the switch from 300-port directors to a top-of-rack design was making them age before my eyes.
I’m going to spare you the lengthy debates about who was busier and cut to the chase. The topology was going to be a simple architecture: FCoE for host access, connectivity to the MDS 9513 directors (“A” and “B” fabrics) over native FC, and, from a LAN standpoint, standard vPC connectivity:
The topology was simple and easy to digest, but the management aspect required a bit of indigestion-inducing discussion. To alleviate the SAN team’s anxiety about having to manage dozens more FC domains, both sides agreed that the switch should be put into NPV, or N_Port Virtualization, mode. This meant that the FC component of the switch would act just like a host: it would not run any switching protocols (zoning, FSPF, name server, etc.) and would log into the upstream MDS 9513 as if it were a server.
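For the curious, flipping the 5000 into NPV mode is basically a one-liner, though a disruptive one. The commands below are a sketch based on NX-OS conventions, not the client’s actual config:

```
! On the Nexus 5000 -- heads up: enabling NPV erases the running
! configuration and reloads the switch, so do this first
switch(config)# feature npv

! On the upstream MDS 9513, NPIV must be enabled so the director
! will accept multiple fabric logins through a single F_Port
mds(config)# feature npiv
```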
Second, we would still bundle the upstream FC ports into a Port-Channel to provide redundancy, and trunk whatever VSANs were required over those SAN Port-Channels (not to be confused with the Port-Channels that will connect to the upstream Nexus 7000 infrastructure).
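A sketch of what that SAN Port-Channel might look like on the Nexus 5000 side (the interface and VSAN numbers here are made up for illustration):

```
! Bundle two NP uplinks toward the MDS and trunk only the
! VSAN(s) this fabric carries
switch(config)# interface san-port-channel 100
switch(config-if)# switchport mode NP
switch(config-if)# switchport trunk allowed vsan 10
switch(config)# interface fc2/1-2
switch(config-if)# switchport mode NP
switch(config-if)# channel-group 100
switch(config-if)# no shutdown
```

The matching MDS ports run in F mode; trunking a Port-Channel down to an NPV device also means enabling the fport-channel-trunk feature on the MDS side.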
So what does this leave us with? Ah yes: since, from a SAN standpoint, the Nexus 5000 running in NPV is pretty dumb, the only SAN-related tasks left are:
- Creating and maintaining the upstream Nexus-to-MDS FC Port-Channel (one time, modified rarely)
- Configuring QoS mappings for FCoE VLANs (one-time configuration)
- Creating the VSANs (done rarely)
- Mapping the FCoE VLAN to the VSAN (again done rarely)
- Assigning the VLANs to the Ethernet interfaces connecting to the hosts (done for every host, but this doesn’t really need two groups)
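To put that list in perspective, the handful of commands involved looks something like this (VLAN 100, VSAN 10 and the interface names are illustrative, not the client’s actual numbering):

```
! Create the VSAN and map an FCoE VLAN to it (done rarely)
switch(config)# vsan database
switch(config-vsan-db)# vsan 10
switch(config)# vlan 100
switch(config-vlan)# fcoe vsan 10

! Per host: trunk the FCoE VLAN alongside the data VLANs
switch(config)# interface ethernet 1/1
switch(config-if)# switchport mode trunk
switch(config-if)# switchport trunk allowed vlan 1,100

! Per host: a virtual FC interface bound to the Ethernet port,
! dropped into the right VSAN
switch(config)# interface vfc 11
switch(config-if)# bind interface ethernet 1/1
switch(config-if)# no shutdown
switch(config)# vsan database
switch(config-vsan-db)# vsan 10 interface vfc 11
```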
So there’s not a whole lot that’s done on a regular basis from a SAN perspective on the Nexus 5000 when it’s in NPV mode: no zoning, no device aliases, no CFS-based features. And since 100% of the devices that connect to the Nexus 5000 will require LAN access, it made sense for the SAN team to provide a written, clearly documented procedure for how the LAN team should perform the SAN-related tasks on the Nexus 5000.
At this point I expected to be tossed out of the room, never to see the fine mahogany formica table again. But the SAN team was actually a bit relieved, realizing a) they’d have more time to focus on their core job of managing storage arrays, backups, disk replication and the MDS SAN, and b) what needed to be done on the Nexus wasn’t that complex to start with. They still had control of the storage environment, as the intelligence in the SAN remained in the MDS 9500 core infrastructure. The SAN team was given access to the Nexus 5000s to aid with troubleshooting or assist with configuration if required.
This left one 800lb gorilla in the room (cue ominous music): Firmware Upgrades.
In light of this compromise to have the LAN team perform the daily Nexus 5000 configuration work, the SAN team agreed to the following: while the LAN team would drive the upgrade process based upon their requirements for more features, the SAN team would be given sufficient notice of a potential upgrade as part of the change management process, along with ample time to:
- Review any open caveats/bugs in the release notes
- Test the release out in their lab
- Provide written feedback to the combined team on their findings
- “Veto” the firmware version if there were issues that would impact the SAN
While there were no parades, high fives or celebratory actions, it did seem that both parties came away with a feeling that things seemed fair to start with and that refinement would be required…
I know what you folks out there are thinking: “What a load of low-nitrate, bovine fertilizer!” Well, I didn’t prepare for this meeting with standards, CCIE skills or a debugger; I had 911 on speed dial and a couple of Gracies on standby. So, what are you folks seeing out there? Anybody had this conversation? Live to tell about it?