Balancing the MDS Scales in Your Favor

One of the main themes I’ve been running into a lot lately is the sense of scale. For a while the term actually lost much of its meaning because it has been used to describe any number of systems that happen to be large.

Scale-up.

Scale-out.

At scale.

See what I mean? The term is predominantly used as a synonym for growth, and while that has merit it does tend to gloss over some of the nuances of what happens when systems “get big.”

I’ve talked about this before in terms of different storage architectures (such as Spine/Leaf with Dynamic FCoE) as well as SDN and Programmability, but that doesn’t mean that native Fibre Channel systems are being left out.

In fact, there have been some major improvements in the way that the MDS 9700 series of switches have been able to accommodate bigger scale. And by bigger, I mean by the thousands. This is, of course, just one of the major announcements that the MDS group has made, and I encourage you to read the blogs by Tony Antony and Prashant Jain about their new developments as well.

Raising the Stakes

It’s important to remember that Storage Area Networks (SANs) are designed to provide reliable storage connectivity to servers. Fibre Channel SANs are highly planned to ensure that servers (hosts) have predictable performance when attempting to attach to their storage (targets).

What’s happened, though, is that the industry has laid a serious smack-down on the needs of storage environments. Virtualization (and the subsequent mobility of those virtualized machines) has combined with ever-increasing processing power to enable incredible feats of Data Center density.

Fibre Channel environments, however, work on the basis of fabrics. That is, a SAN is a fabric which “knows” everything about all of those hosts and all of those targets. Obviously the more hosts and the more targets, the more likely you are to reach the fabric’s limits.

Back in the day when one port meant one physical server, you could get away with relatively small limits. With virtualization, however, it becomes easier for larger Data Center environments to blow through those restrictions (granted, to be fair we’re talking very large systems here). With 10G FCoE and 16GFC, though, we are asking every port on the switches to handle more connections coming from virtual devices, so with the greater capabilities of the 9700 series MDS switches, there needs to be some give on these limitations.

Understanding What’s Important

Any Fibre Channel fabric (including FCoE fabrics, by the way) has certain metrics that designers need to keep in mind. To keep things simple, they are:

Zones – When you connect devices inside of an FC/FCoE SAN, you create a zone within which they can communicate. Zoning effectively cordons off devices so that they cannot communicate with one another when they shouldn’t. Storage arrays, for example, rarely need to communicate with each other in normal day-to-day operation (though they may during replication processes, etc.). Best practice is often to have a single host-target pair per zone, so the zone count can add up quickly as virtual hosts and virtual targets are put onto the network.

Logins – When a host or a target connects to the SAN, it must log in to the fabric. This is how the fabric keeps track of the storage network, and the number of logins is limited per fabric, per switch, and per line card. For example, you may be able to have 10,000 logins in a fabric, but you can’t load them all up on one line card on one switch!
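
To make these two metrics a little more concrete, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption for illustration: the function and class names are made up, and the limits are placeholder numbers (only the 10,000-login fabric figure echoes the example above). None of them are published MDS limits.

```python
from __future__ import annotations

from dataclasses import dataclass


def single_pair_zone_count(hosts_to_targets: dict[str, list[str]]) -> int:
    """Count zones when every host-target pair gets its own zone."""
    return sum(len(targets) for targets in hosts_to_targets.values())


@dataclass
class LoginLimits:
    # Hypothetical limits for illustration only, not real MDS numbers.
    per_fabric: int
    per_switch: int
    per_line_card: int


def login_violations(
    logins: dict[tuple[str, str], int], limits: LoginLimits
) -> list[str]:
    """Check login counts against each level of the hierarchy.

    `logins` maps (switch, line_card) to the number of fabric logins
    landing on that line card.
    """
    problems = []
    per_switch: dict[str, int] = {}
    for (switch, card), count in logins.items():
        per_switch[switch] = per_switch.get(switch, 0) + count
        if count > limits.per_line_card:
            problems.append(
                f"line card {switch}/{card}: {count} > {limits.per_line_card}"
            )
    for switch, count in per_switch.items():
        if count > limits.per_switch:
            problems.append(f"switch {switch}: {count} > {limits.per_switch}")
    total = sum(per_switch.values())
    if total > limits.per_fabric:
        problems.append(f"fabric: {total} > {limits.per_fabric}")
    return problems


# 200 virtual hosts, each zoned to 4 targets, one pair per zone.
zoning = {f"vm{i}": ["t1", "t2", "t3", "t4"] for i in range(200)}
print(single_pair_zone_count(zoning))  # 800

# The fabric has plenty of room, but one line card is overloaded.
limits = LoginLimits(per_fabric=10_000, per_switch=4_000, per_line_card=1_000)
print(login_violations({("sw1", "lc1"): 1_200, ("sw1", "lc2"): 300}, limits))
```

The point of the second check is the one made above: a fabric-wide total can look healthy while a single line card has already hit its ceiling.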

Upping the Dosage

With the release of NX-OS 6.2.9, the MDS 9700 series significantly raises the limits for each of these categories:

As you can see, the numbers are a significant boost to previous limits, but I think that the numbers here – while impressive – really fail to capture why this is such a big deal.

What’s It All Mean, Charlie?

I think of this as a chain reaction, mostly in the vein of what you don’t have to do. For example:

You don’t have to add line cards because of login limitations when you still have ports open.
You don’t have to increase the number of switches because you’ve exhausted the number of zones.
You don’t have to pay for the floor space for those additional switches, nor the cooling.

Or, think of this in another way. You’ve got extra capacity on each link (10G FCoE or 16G FC). Your applications are only pushing a steady state of, say for the sake of argument, 3-4Gbps, which leaves you a lot of extra bandwidth to play with on each link. So you virtualize. Then you virtualize some more. Soon, you still have more bandwidth than you need, but only a small number of virtual machines can log into a line card or switch because of the login restrictions.
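
A rough back-of-the-envelope sketch of that situation in Python follows. The port count, per-VM bandwidth, and login limits are hypothetical numbers picked purely for illustration, and the sketch assumes one fabric login per virtual machine and treats 16G FC as a nominal 16,000 Mbps.

```python
def binding_constraint(ports, link_mbps, per_vm_mbps, login_limit):
    """Return (max_vms, constraint) for a single line card.

    Compares how many VMs the bandwidth could carry with how many
    the (hypothetical) per-line-card login limit allows.
    """
    vms_by_bandwidth = ports * (link_mbps // per_vm_mbps)
    if login_limit < vms_by_bandwidth:
        return login_limit, "logins"
    return vms_by_bandwidth, "bandwidth"


# 48 ports of nominal 16G, each VM averaging 500 Mbps.
print(binding_constraint(ports=48, link_mbps=16_000, per_vm_mbps=500, login_limit=1_000))
# Same card with a higher (still hypothetical) login limit.
print(binding_constraint(ports=48, link_mbps=16_000, per_vm_mbps=500, login_limit=4_000))
```

With the lower limit the card runs out of logins long before it runs out of bandwidth; raising the limit moves the bottleneck back to the links, which is where you would rather have it.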

Think of blade server environments, and how many vHBAs you can create on a single system. Then, how many blade systems can attach to a single Director-Class MDS 9700. Suddenly the scaling limitations don’t seem quite so theoretical.
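
The blade math can be sketched the same way. The vHBA count, blades per chassis, and login limit below are assumed values for illustration, not figures for any particular blade system or for the MDS 9700.

```python
def blade_logins(vhbas_per_blade, blades_per_chassis, chassis):
    """Total fabric logins if every vHBA logs in once."""
    return vhbas_per_blade * blades_per_chassis * chassis


def max_chassis(vhbas_per_blade, blades_per_chassis, login_limit):
    """How many full chassis fit under a given login limit."""
    return login_limit // (vhbas_per_blade * blades_per_chassis)


# Hypothetical: 16 vHBAs per blade, 8 blades per chassis.
print(blade_logins(16, 8, chassis=100))        # 12800 logins
print(max_chassis(16, 8, login_limit=10_000))  # 78 chassis
```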

Bottom Line

We’re entering into some significant changes in Data Center activity, whether it be in architectures, topologies, or software. There’s no question in my mind, at least, that ensuring that the stars align is key to avoiding “gotchas.”

This is one of those moments where a few numbers on a table can have a profound impact on storage network designs, and deliver major savings in actual implementation.

More Information:

You can find out additional information about the latest release of MDS hardware and software: