I spent two weeks over at the Ask the Expert forums, and I came to the realization that our customers are often bombarded with facts, figures, speeds, feeds, features, buzzwords, comparisons and functionalities, and they’re not sure which ones they must have, which are merely conveniences, and which they can live without. So I figured I’d toss out what I think are the top features for building an MDS Storage Area Network. Some may be obvious; at others you might shake your head or light up the torches. They’re not in any particular order, as your mileage varies from mine. I’ll probably skip the obvious ones like “hot swap power supplies” and other oh-so-exciting abilities…
The first set I usually refer to as the holy trinity of features, as they constitute the foundation of the connectivity: VSANs, Port-Channels and TE Ports. They’ve been around literally forever on the platform, and for good reason: they’ve been part of the hardware’s DNA since its inception. Additionally, if you walk down the hall to the folks who manage your LAN, you’ll find out that they’re using pretty much the same concepts and features as you (VLANs, Port/Ether-Channels and Trunking, or 802.1Q). So, if those folks are managing hundreds or thousands of switches and routers, there’s probably something worthwhile here. There’s also a pretty good chance that they are using them for the very same reasons that you are:
- VSANs: Isolation of fault domains.
- Port-Channels: High availability and load-balancing of Inter-Switch Links (ISLs).
- TE_Ports: The ability to run multiple VSANs over the same ISL, leveraging frames tagged with the VSAN ID and enforced in hardware.
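To make the trinity concrete, here’s a minimal NX-OS sketch of all three on an MDS switch. The VSAN numbers, names and interface numbers are illustrative, not from any particular design, so adjust them to your fabric:

```
! Create a VSAN and place a host-facing port in it
vsan database
  vsan 10 name PROD_A
  vsan 10 interface fc1/1

! Bundle two ISLs into one logical link; trunk mode makes it a TE port
interface port-channel 1
  switchport mode E
  switchport trunk mode on
  switchport trunk allowed vsan 10
  switchport trunk allowed vsan add 20

interface fc1/13-14
  channel-group 1 force
  no shutdown
```

With that in place, a single port-channel carries VSANs 10 and 20, each frame tagged with its VSAN ID in hardware, and losing one member link degrades bandwidth instead of connectivity.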
Next on my list is NPV Mode, aka N_Port Virtualization. I grew up in the era of 16-port SAN switches, and like rabbits they multiplied, and so did their domains, and don’t get me started on the upgrades… You had top-of-rack designs that involved dozens of small switches, and this tsunami of small switches was slowed by the emergence of the high-density directors with hundreds of ports: first 128, then 256, now over 500. Lots of small switches met their demise.
However, this architecture has come roaring back today, especially as we see companies deploy FCoE with the Nexus 5000 at the edge/access layer. This time, though, Cisco got smart and implemented NPV mode, which solved quite a few problems that were killing us back in the day. The NPV switch didn’t take up a domain ID, didn’t maintain a copy of the zoneset, didn’t process FSPF or the name server, and didn’t maintain E_Ports. It became, for all intents and purposes, a pretty dumb device, thereby solving the last headache: the upgrades. If the switch itself doesn’t require all the intelligence, there are fewer reasons for you to upgrade it. Fewer features means fewer bugs. Fewer bugs means fewer upgrades. Fewer upgrades means fewer maintenance windows at 2am on a Saturday night. And I don’t think there’s anybody out there who relishes the thought of doing an upgrade compared with anything else you could be doing at 2am on a Saturday night.
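The configuration is pleasantly short, which is rather the point. A rough sketch (interface numbers are illustrative; note that enabling NPV on the edge switch triggers a reload and wipes its configuration, so plan accordingly):

```
! On the core director: allow multiple FC logins on a single F_Port
feature npiv

! On the edge switch: switch it into NPV mode (reload + config erase)
feature npv

! Edge uplinks toward the core become NP ports
interface fc1/1
  switchport mode NP
  no shutdown
```

After this, hosts log in through the edge switch, which simply proxies them to the core; the edge consumes no domain ID and holds no zoning or FSPF state of its own.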
Next on my list I’d put Remote AAA, or Authentication, Authorization and Accounting, though you might classify it as “store my user-account information somewhere else”. There was a time when you set up a switch and then had every user set their own password, or, more commonly, you just set up some common admin account and the SAN team used the same one everywhere. Leveraging the existing AAA infrastructure (TACACS+, RADIUS, LDAP or Active Directory) means one less thing the SAN admin has to maintain, and it lets you grant users access and determine what they can do from a central location.
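As a hedged sketch of what this looks like on NX-OS using TACACS+ (the server address, group name and key below are placeholders, not real values):

```
! Enable TACACS+ and point at the central server
feature tacacs+
tacacs-server host 10.1.1.100 key MySharedKey

! Group the server(s) and use them for login authentication
aaa group server tacacs+ SAN-TACACS
  server 10.1.1.100

aaa authentication login default group SAN-TACACS
aaa accounting default group SAN-TACACS
```

From that point on, who can log in, what role they get, and what they did are all controlled and recorded centrally rather than per switch.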
Since we’re on management, I’m going to add Event Notification and Management. I’ve seen environments where one guy’s sole responsibility is to watch a screen of SAN switches and wait for “something to happen.” Always wondered: what happens if he’s at lunch and something happens? Better be a short lunch. What we’re really looking for is for the switch to detect a problem and then notify us, or the maintainer, to resolve it. My favorite example of this comes from the disk array companies. A drive fails in your disk array at 2am, the array phones home and opens up a case saying “disk drive has failed.” An engineer is dispatched within a few hours, they show up at your datacenter, replace the failed disk, and business continues on as normal. You’ve enjoyed your full 8 hours of sleep, and when you come in in the morning, you’ve got a nice email from the storage company telling you that they replaced a failed drive for you. So what features contribute to this? You can start with syslog, Call Home emails (which can go to you or your service maintainer) or SNMP-based notifications. Any of these messages can be processed automatically, and with a bit of intelligence in the receiving software the appropriate action can be taken. The key part is that the switch informed you of a problem. Could be a failed fan, or it could be that an ISL is running at 90% utilization.
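All three notification paths can be sketched in a few lines of NX-OS. The IP addresses, email addresses and community string here are placeholders for your own collectors:

```
! Ship syslog messages to a central collector (severity 6 and below)
logging server 10.1.1.50 6

! Call Home: email the NOC (or your maintainer) on faults
callhome
  email-contact sanadmin@example.com
  destination-profile full_txt email-addr noc@example.com
  transport email smtp-server 10.1.1.25
  enable

! SNMP traps to a management station
snmp-server host 10.1.1.60 traps version 2c public
```

Whichever transport you pick, the goal is the same: the switch tells the receiving software about the failed fan or the hot ISL, and the software (or the maintainer) acts on it without a human staring at a screen.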
Looking at my stack of cards here, next up at bat is Enhanced Zoning. This standards-based (FC-GS-4 and FC-SW-3) zoning method solved two key problems with the most common activity in SAN management, both of which can be considered career-limiting moves. The first is the ability for the switch (not your GUI) to provide you with a preview of what zones and zonesets you are about to put into production, and then either go forward or abort the changes. The second is the elimination of the practice of having a “master zoning switch”. This was an arbitrary switch on which you performed all of your zoning-related work. Why? Because it supposedly held the only complete copy of your zoneset database, and if this switch went down, you were in a world of hurt. Enhanced zoning fixed this by guaranteeing that all switches in the fabric (or VSAN on the MDS) had an identical copy of the full zone database (this is the one you edit and then activate). Third, it also ensured that you and the SAN admin down the hall were never editing the zoneset at the same time. I know I mentioned two career-limiting moves, but this one’s a doozy. The ability to acquire a lock on the zoneset ensures that nobody is editing a stale zoneset database.
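Here’s roughly what that workflow looks like on the CLI. This is a sketch under my usual caveat that the zone and alias names are made up; check your NX-OS release notes for exact syntax:

```
! Flip a VSAN to enhanced zoning (fabric-wide, done once per VSAN)
zone mode enhanced vsan 10

! Edits now go into a pending copy, under a fabric-wide lock
zone name host1_array1 vsan 10
  member device-alias host1
  member device-alias array1

! Preview pending changes before anything goes live
show zone pending vsan 10

! Commit distributes the full zone database to every switch in the VSAN...
zone commit vsan 10
! ...or abort, discarding the pending changes and releasing the lock:
! no zone commit vsan 10
```

The lock plus the pending/commit model is what kills both career-limiting moves: you see exactly what you’re about to activate, and nobody else can be editing underneath you.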
Since I’m on zoning, I think I’ll segue into Device-Aliases. The easiest way to explain these is with a simple example from the LAN world. Imagine for a second if you implemented DNS on a per-subnet basis, so that www.cisco.com could resolve to a different IP address on subnet 192.168.0.0 than on 172.22.0.0. To avoid exactly that, the networking folks made DNS independent of the underlying infrastructure. This is where Device-Aliases improve upon the age-old method of providing pWWN-to-plaintext mapping via fcAliases. The problem with fcAliases is that they are tied to zoning. That’s like tying DNS to FTP applications only. What if you want to use something that crosses VSAN boundaries, like IVR? fcAliases don’t guarantee that pwwn1 has the same fcAlias defined in both or all VSANs; or you could have host1 defined two different ways in two or more VSANs. Device-Aliases, working independently of zoning, guarantee a 1:1 mapping of pWWN to plain-text name, thereby eliminating the shortcomings of fcAliases.
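A quick sketch of device-aliases in practice; the names and pWWNs below are invented for illustration, and the commit step applies when device-alias distribution runs in enhanced mode:

```
! Define fabric-wide, VSAN-independent name-to-pWWN mappings
device-alias database
  device-alias name host1  pwwn 21:00:00:e0:8b:01:02:03
  device-alias name array1 pwwn 50:06:01:60:aa:bb:cc:dd
device-alias commit

! The alias can then be used anywhere a pWWN is accepted, e.g. zoning
zone name host1_array1 vsan 10
  member device-alias host1
  member device-alias array1
```

Because the alias lives outside any VSAN, host1 means the same pWWN everywhere, whether you’re zoning in VSAN 10, VSAN 20, or configuring IVR across both.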
So here we are, and I’m not sure how to end this, but since my editor is probably looking for me to write some flowing prose in the style of some 18th-century Norwegian poet, I’ll just leave you with this nugget. While your SAN fabric may have many requirements due to business drivers, there is a set of features that every Cisco SAN fabric should have. These are the foundation features upon which we build our SANs. Sure, there are others, such as performance monitoring or SAN Extension, which get built on top of these, but if you start with these and wrap them with solid processes, you’ll be much better off than most.
So which of these features do you use or can’t live without and which do you abhor? I’m always interested in what people are actually deploying. Until next time, watch out for the open floor tile…