To Tell the Truth: Multihop FCoE
In the old television show, “To Tell the Truth,” celebrity panelists would attempt to decipher the true identity of one of three people, two of whom were impostors.
Sure, they looked alike, sounded alike, and had very similar characteristics, but there was only one genuine article.
As we move deeper into the capabilities of FCoE, it’s becoming obvious that people are growing more sophisticated about what happens beyond the first switch. Not surprisingly, we are starting to see a lot of possibilities that look and sound alike, but are not really bona fide multihop FCoE.
Let me start by making one thing as clear as I can: there is nothing inherently wrong with any of these multitiering solutions. The only real question is whether the solution is appropriate for your data center.
It’s a little long as blogs go, so I apologize in advance; if I get a little too geeky, at least there are pretty pictures. Grab a cup of coffee, and I promise that by the end it will make much more sense. (For you FC people, this may seem a bit remedial at times, so please bear with me as I try to reach out to our non-storage brothers and sisters.)
What is Multitiering?
Put simply, for our purposes multitiering means that the data center is broken down into layers, and each layer is a tier. For example, the layer where the servers connect to the first switch (often referred to as the “access” or “edge” layer) makes one tier. In turn, those switches connect farther back into the data center, to an “aggregation” or “core” layer, or “tier.”
Each layer has specific design considerations and plays a different role in the data center. This is true whether we’re talking about Ethernet or storage networks.
So, when we talk about moving between these layers, or tiers, we are talking about multitiering.
What gets confusing is that from both an Ethernet/LAN perspective and a Fibre Channel/SAN perspective, we often refer to these switches as “hops.” The catch is that when you get into the nuts and bolts, the term doesn’t mean exactly the same thing in both worlds.
What is Multihop?
While it would be valuable to go into all the possible meanings of what a “hop” is in data center terms, space and respect for your time prevents me from doing this (as it is, I may be pushing it a bit).
However, because FCoE is Fibre Channel, it follows Fibre Channel rules as to what a “hop” is. For that reason, there are two ways to determine whether you have an FCoE “hop,” and what would constitute a “multihop” scenario.
First, in Fibre Channel a “hop” is what happens when you move from switch to switch and the Domain ID changes. If you’re used to the Ethernet way of understanding hops and aren’t familiar with Fibre Channel, here is the key point: each FC switch is (generally speaking, and by default) a single domain, and its Domain ID identifies that switch in the FC fabric.
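To make the Domain ID a little more concrete: every port that logs in to a Fibre Channel fabric is assigned a 24-bit Fibre Channel ID (FCID) made of three 8-bit fields (Domain, Area and Port), and the Domain field is the Domain ID of the switch that assigned the address. The short Python sketch below simply splits an FCID into those fields; `parse_fcid` is a hypothetical helper and the example address is made up for illustration.

```python
from typing import NamedTuple


class FCID(NamedTuple):
    """A 24-bit Fibre Channel ID split into its three 8-bit fields."""
    domain: int  # bits 23-16: Domain ID of the switch that assigned the address
    area: int    # bits 15-8
    port: int    # bits 7-0


def parse_fcid(value: int) -> FCID:
    """Split a 24-bit FCID into its Domain, Area and Port fields."""
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("an FCID is a 24-bit value")
    return FCID(domain=(value >> 16) & 0xFF, area=(value >> 8) & 0xFF, port=value & 0xFF)


# Hypothetical address handed out by the switch whose Domain ID is 0x0A.
print(parse_fcid(0x0A1B2C))  # FCID(domain=10, area=27, port=44)
```

Ports whose FCIDs carry the same domain field belong to the same switch domain; reaching a different domain means crossing at least one ISL.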
For the storage administrator, this is important because the Domain ID is critical to providing proper security (e.g., zoning) and to running Fibre Channel services for each device connected to the switch.
In FCoE, an actual Fibre Channel switch (called a Fibre Channel Forwarder, or FCF) exists inside the Ethernet switch. This means that each FCF has its own Domain ID, giving the storage admin control of the SAN wherever there is an FCF.
When FC switches talk to each other, they are connected by an Inter-Switch Link (ISL), and each time you connect two Domain IDs in FC, you have what’s called a “hop.” Therefore, in FCoE (which is FC), every time you connect two Domain IDs, you have a “hop.”
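Under this definition, counting hops comes down to counting Domain ID changes along a path. The Python sketch below is a deliberately simplified model under stated assumptions: `Switch` and `count_fc_hops` are hypothetical names, a path is an ordered list of devices between host and storage, and any device that is not an FCF (for example, a plain Ethernet switch in between) is given `domain_id=None`, because it has no Domain ID and cannot terminate an ISL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Switch:
    name: str
    domain_id: Optional[int]  # None for devices that are not FCFs (no FC domain)


def count_fc_hops(path: list[Switch]) -> int:
    """Count FC hops along a path: one per change of Domain ID between FCFs.

    Devices without a Domain ID are skipped, since they never terminate an ISL.
    """
    domains = [s.domain_id for s in path if s.domain_id is not None]
    return sum(1 for a, b in zip(domains, domains[1:]) if a != b)


edge = Switch("edge-fcf", 0x0A)
core = Switch("core-fcf", 0x0B)
bridge = Switch("ethernet-bridge", None)

print(count_fc_hops([edge, core]))          # 1
print(count_fc_hops([bridge, edge, core]))  # 1: the bridge adds no hop
```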
The second way to determine whether a switch falls into a “multihop” scenario is how much visibility the switch has into the FC portion of the payload when making forwarding decisions. In other words, to preserve storage traffic engineering, an FCoE “multihop” switch needs to maintain the forwarding mechanisms used by Fibre Channel (such as FSPF, the Fabric Shortest Path First routing protocol).
Generally speaking then, a “Multihop FCoE” switch is one that continues to subscribe to the appropriate FC traffic engineering, domain creation, and forwarding.
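FSPF is a link-state protocol: each switch learns the topology of ISLs between domains and computes the shortest path to every other domain. The Python sketch below shows only that path-computation step, as a plain Dijkstra search over Domain IDs. It leaves out FSPF's link-state flooding and equal-cost multipathing, `fspf_next_hops` is a hypothetical name, and the link costs in the example are arbitrary values chosen for illustration.

```python
import heapq


def fspf_next_hops(links: dict[int, dict[int, int]], source: int) -> dict[int, int]:
    """Return {destination domain: neighbor domain to forward to} for `source`.

    `links[a][b]` is the cost of the ISL from domain a to domain b.
    """
    best = {source: 0}
    next_hop: dict[int, int] = {}
    done: set[int] = set()
    heap = [(0, source, source)]  # (path cost, domain, first neighbor on the path)
    while heap:
        cost, domain, first = heapq.heappop(heap)
        if domain in done:
            continue  # stale entry: a cheaper path was already settled
        done.add(domain)
        if domain != source:
            next_hop[domain] = first
        for neighbor, link_cost in links.get(domain, {}).items():
            new_cost = cost + link_cost
            if neighbor not in done and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                # Leaving the source, the first hop is the neighbor itself.
                hop = neighbor if domain == source else first
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return next_hop


# Three domains in a triangle; the direct 1->3 ISL is assumed to be "expensive".
links = {1: {2: 1, 3: 4}, 2: {1: 1, 3: 1}, 3: {1: 4, 2: 1}}
print(fspf_next_hops(links, source=1))  # {2: 2, 3: 2}
```

Only a switch that participates in FC routing can make this decision; a device that forwards purely on Ethernet MAC addresses never sees the domain graph at all.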
A Port in Any Storm
In Fibre Channel, whether something is a hop is determined not just by what is connected, but by how it is connected.
At its most basic, a host/server uses a “Node Port,” or “N_Port,” to connect to a Fibre Channel switch’s “Fabric Port,” or “F_Port.”
In the same way, an FCoE host uses what’s called a “Virtual N_Port” (“VN_Port”) connected to a “Virtual F_Port” (“VF_Port”) on an FCoE switch. (They are called “virtual” because the physical port itself can be used for multiple purposes.)
When two switches communicate, they use what’s called an “Expansion Port,” or “E_Port.” In FCoE – you guessed it – those ports are called “VE_Ports.”
It’s the presence of these “VE_Ports” that makes a “hop”, because it’s what makes an ISL between two Fibre Channel Domain IDs.
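The port pairings described above fit in a small lookup table. In this Python sketch, `PORT_PAIRINGS` and `classify_link` are hypothetical names, and the table also includes the VNP_Port used by FCoE NPV switches. Only an E_Port-to-E_Port or VE_Port-to-VE_Port link counts as a hop; every other legal pairing is a login link.

```python
# Legal port pairings -> (what the link is, whether it creates an FC hop).
# A one-element set stands for a link with the same port type on both ends.
PORT_PAIRINGS = {
    frozenset({"N_Port", "F_Port"}): ("node login, native FC", False),
    frozenset({"VN_Port", "VF_Port"}): ("node login, FCoE", False),
    frozenset({"VNP_Port", "VF_Port"}): ("FCoE NPV uplink", False),
    frozenset({"E_Port"}): ("inter-switch link (ISL), native FC", True),
    frozenset({"VE_Port"}): ("inter-switch link (ISL), FCoE", True),
}


def classify_link(port_a: str, port_b: str) -> tuple[str, bool]:
    """Describe the link formed by two port types; the order does not matter."""
    try:
        return PORT_PAIRINGS[frozenset({port_a, port_b})]
    except KeyError:
        raise ValueError(f"{port_a} does not pair with {port_b}") from None


print(classify_link("VF_Port", "VN_Port"))  # ('node login, FCoE', False)
print(classify_link("VE_Port", "VE_Port"))  # ('inter-switch link (ISL), FCoE', True)
```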
Confusion arises when you put something between the FCFs, which you can do by placing Ethernet switches between them.
In Fibre Channel, an ISL is one of the ways a “hop” is made.
So, with respect to “To Tell the Truth,” let’s meet our contestants, each of whom claims: “I am a Multihop FCoE switch.”
Contestant #1: DCB Lossless Switch
From an Ethernet perspective, it “ain’t nuttin’ but a thang” to add additional switches, which can help provide more access to hosts and servers. Building on our simplified models, it looks something like this:
Generally speaking, this type of design tunnels FCoE traffic through “SAN unaware” switches that have no knowledge or understanding of the packet type. In other words, no Fibre Channel protocol-related activity is applied to the traffic between the host and the FCF fabric.
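To see what “SAN unaware” means in practice, here is a minimal Python sketch of the forwarding decision a DCB lossless bridge makes for FCoE. It assumes the bridge's learned MAC table is a simple dictionary and that FCoE (EtherType 0x8906) and FIP (EtherType 0x8914) traffic is classified into a no-drop class; `dcb_forward` is a hypothetical name. The point is what the function ignores: the encapsulated FC frame is never parsed, so no Domain ID, zoning, or FSPF decision is ever made.

```python
from typing import Optional

FCOE_ETHERTYPE = 0x8906  # Ethernet frame carrying an encapsulated FC frame
FIP_ETHERTYPE = 0x8914   # FCoE Initialization Protocol (discovery and login)


def dcb_forward(mac_table: dict[str, int], dst_mac: str,
                ethertype: int, payload: bytes) -> tuple[Optional[int], bool]:
    """Return (egress port, lossless?) for one frame.

    The FC payload is deliberately never inspected: the decision uses only the
    destination MAC address and the EtherType.
    """
    egress = mac_table.get(dst_mac)  # None = unknown MAC (a real bridge would flood)
    lossless = ethertype in (FCOE_ETHERTYPE, FIP_ETHERTYPE)  # assumed no-drop classes
    return egress, lossless


table = {"0e:fc:00:0a:1b:2c": 7}
print(dcb_forward(table, "0e:fc:00:0a:1b:2c", FCOE_ETHERTYPE, b"<FC frame>"))  # (7, True)
```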
Of course, this is perfectly “legal” according to the FC-BB-5 standard document. Hosts get access to FCoE storage, and if the FCF resides in a switch that can talk directly to FC storage, hosts can get access to those as well. One of the advantages of this is that it expands the fabric size without expanding Domain IDs.
From a storage/SAN perspective, though, there are some significant drawbacks, particularly around security. Because the SAN admin has no visibility into what goes on in the DCB lossless switch, the design can become susceptible to “man-in-the-middle” attacks, where a rogue server pretends to be an FC switch (FCF) and inserts itself into the FC fabric where it’s not wanted.
This method also rules out FC forwarding and deterministic multipathing, so the design must rely on Ethernet Layer 2 solutions. This can further complicate troubleshooting and load balancing in the storage network, especially as the number of DCB switches between the VN_Ports and VF_Ports increases.
If you happen to be a storage admin, this approach completely prevents the standard practice of “SAN A/SAN B” separation for redundancy. For some vendors, the solution to this is to create an entirely separate Ethernet fabric, but LAN admins will tell you that isolating Ethernet traffic from half of the data center may not be a workable solution.
Really? Creating two Ethernet fabrics that can’t talk to each other in order to preserve storage SAN separations? Perhaps I should simply leave the possibility open for the sake of having options, but I have a hard time imagining this being put into practice.
Not surprisingly, as Cisco works to support SAN operations end-to-end in the Data Center, this type of solution is not something we recommend from a storage-centric perspective.
Contestant #2: FIP Snooping Bridges
Unlike the lossless DCB bridge, which has no knowledge whatsoever about the FCoE traffic flowing on its links, the FIP snooping bridge offers the ability to assist the FC fabric by helping with the login process (as servers come online, they need to “login” to the FC fabric).
While the switch doesn’t apply FC protocol services (such as multipathing) to the traffic, it does inspect the packets and apply the routing policies to those frames. The FIP snooping bridge uses dynamic ACLs to enforce the FC rules within the DCB network. While a deep-dive is beyond the scope of this blog, let me point you to Joe Onisick’s fantastic exploration of the subject.
Generally, this improves upon the DCB lossless switch design because it prevents nodes from seeing or communicating with other nodes without first going through an FCF. As a result, it enhances FCoE security, prevents FCoE MAC spoofing, and creates natural FC point-to-point links within the Ethernet LAN. Another advantage is that it expands the fabric size without expanding the number of Domain IDs.
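The dynamic ACL idea can be sketched in a few lines of Python. This is a toy model under several assumptions, not any vendor's implementation: `FipSnoopingBridge` and its methods are hypothetical, the MAC address a host uses after login is taken to be a Fabric Provided MAC Address (the default FC-MAP prefix 0E-FC-00 followed by the 24-bit FCID), and the bridge is told up front which of its ports face an FCF. FCF advertisements arriving on any other port are refused, which blocks a rogue server posing as an FCF, and FCoE frames are permitted only between a logged-in host and the FCF that accepted its login.

```python
DEFAULT_FC_MAP = 0x0EFC00  # default FC-MAP prefix for Fabric Provided MAC Addresses


def fpma(fcid: int, fc_map: int = DEFAULT_FC_MAP) -> str:
    """Build a Fabric Provided MAC Address: FC-MAP (24 bits) + FCID (24 bits)."""
    value = (fc_map << 24) | fcid
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


class FipSnoopingBridge:
    def __init__(self, fcf_facing_ports: set[int]):
        self.fcf_facing_ports = fcf_facing_ports
        self.acl: set[tuple[str, str]] = set()  # permitted (host MAC, FCF MAC) pairs

    def accept_fcf_advertisement(self, ingress_port: int) -> bool:
        # Only ports facing a real FCF may advertise one.
        return ingress_port in self.fcf_facing_ports

    def on_login_accepted(self, fcf_mac: str, assigned_fcid: int) -> None:
        # Snooped from the FCF's FIP login reply: open a path for this host only.
        self.acl.add((fpma(assigned_fcid), fcf_mac))

    def permit_fcoe(self, src_mac: str, dst_mac: str) -> bool:
        # Host<->FCF is allowed in either direction; host<->host is not.
        return (src_mac, dst_mac) in self.acl or (dst_mac, src_mac) in self.acl


bridge = FipSnoopingBridge(fcf_facing_ports={1})
bridge.on_login_accepted(fcf_mac="00:00:5e:00:53:01", assigned_fcid=0x0A1B2C)
print(bridge.accept_fcf_advertisement(5))                            # False
print(bridge.permit_fcoe("0e:fc:00:0a:1b:2c", "00:00:5e:00:53:01"))  # True
print(bridge.permit_fcoe("0e:fc:00:0a:1b:2c", "0e:fc:00:0a:1b:2d"))  # False
```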
Cisco’s Nexus 4000 Series switches work under this principle.
Now, this is where people often get confused about whether this is a “hop” or not – and let’s call a spade a spade: even Cisco has introduced this as a “multihop” environment, when in fact it’s a multitiering environment.
While it’s understandable on one level (after all, how many people have you heard try to introduce another concept, multitiering, in order to simplify a conversation!?), it really is misleading, because it doesn’t address the definition of Fibre Channel ISLs, VE_Port-to-VE_Port links, or visibility into the FC payload. So, from a design perspective, it’s not actually multihop.
Why? Because the SAN admin doesn’t have total visibility into the fabric. Typically, Fibre Channel tools don’t see FIP snooping bridges, and FIP snooping bridges don’t track discovery attempts or login failures.
When a CNA failure occurs, admins must rely on CNA tools for troubleshooting, and there are potential load-balancing and SAN A/SAN B separation issues when these deployments start to scale.
All of this means that while FIP-snooping bridges can be a good idea for some designs, there may be other considerations at play. It also means that when you add FIP snooping bridges to your data center they are not FCoE “hops.”
Contestant #3: Multihop FCoE – NPV Switches
Now, even for those of you with some FCoE experience, FCoE NPV switches might be new. It’s an enhanced FCoE pass-through switch that acts like a server, performing multiple logins to the FCF Fabric:
(Actually, technically speaking the port facing the FCF is called a VNP_Port, but that’s not the point nor really important right now).
In this case, the switch behaves the same way a server does, performing multiple logins into the fabric. The advantage here is that it provides load balancing and traffic engineering while maintaining FCoE security and the Fibre Channel operational SAN model.
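The “acts like a server” behavior can be sketched as follows. In NPV mode, the edge switch performs its own fabric login (FLOGI) on each uplink and then relays each server's login upstream as an NPIV-style FDISC, so servers receive addresses from the upstream FCF's domain and the NPV switch never needs a Domain ID of its own. The `NpvSwitch` class is hypothetical, and round-robin assignment of servers to uplinks is an assumed load-balancing policy; real switches may use other policies.

```python
import itertools


class NpvSwitch:
    """Toy model of an NPV edge switch: logs in like a host, holds no Domain ID."""

    def __init__(self, uplinks: list[str]):
        # The switch first logs each uplink (a VNP_Port) in to the upstream FCF.
        self.uplink_logins = {uplink: "FLOGI" for uplink in uplinks}
        self._uplink_cycle = itertools.cycle(uplinks)
        self.server_logins: dict[str, tuple[str, str]] = {}

    def server_flogi(self, server_wwpn: str) -> tuple[str, str]:
        """Relay a server's FLOGI upstream as an FDISC on the next uplink."""
        uplink = next(self._uplink_cycle)  # assumed round-robin load balancing
        self.server_logins[server_wwpn] = (uplink, "FDISC")
        return uplink, "FDISC"


npv = NpvSwitch(uplinks=["vnp-1", "vnp-2"])
for wwpn in ["20:00:00:00:00:00:00:01", "20:00:00:00:00:00:00:02", "20:00:00:00:00:00:00:03"]:
    print(wwpn, "->", npv.server_flogi(wwpn))
```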
Moreover, it addresses FC Domain ID “sprawl,” which is something that larger deployments always have to contend with.
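A little arithmetic shows why Domain ID sprawl matters. A Fibre Channel fabric has at most 239 Domain IDs to hand out (vendors may support fewer in practice), and in a full-FCF design every edge switch consumes one, whereas NPV edge switches consume none. The Python sketch below compares the two for a hypothetical fabric; `domain_ids_needed` is a hypothetical name and the switch counts are made-up inputs.

```python
MAX_DOMAIN_IDS = 239  # Domain IDs 1-239 are available to switches in one fabric


def domain_ids_needed(core_fcfs: int, edge_switches: int, edge_mode: str) -> int:
    """Domain IDs one fabric consumes for a core/edge design.

    edge_mode is "fcf" (each edge switch is a full FCF with VE_Port ISLs) or
    "npv" (edge switches log in like servers and take no Domain ID).
    """
    if edge_mode == "fcf":
        return core_fcfs + edge_switches
    if edge_mode == "npv":
        return core_fcfs
    raise ValueError("edge_mode must be 'fcf' or 'npv'")


for mode in ("fcf", "npv"):
    used = domain_ids_needed(core_fcfs=2, edge_switches=60, edge_mode=mode)
    print(f"{mode}: {used} of {MAX_DOMAIN_IDS} Domain IDs")
```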
NPV is used with great success in the Fibre Channel SAN world and is quite popular as data centers grow. It gives SAN admins very familiar management and troubleshooting, and provides the same benefits as FC switches doing NPIV logins.
Not only does it not use a Domain ID, but it doesn’t lose visibility into the Fibre Channel domains and keeps the zoning infrastructure intact.
Overall, when compared to FIP-snooping, it offers much greater traffic engineering options for the storage administrator.
Contestant #4: Multihop FCoE – VE_Port ISL Switches
So, now we have our final contestant. Using the exact same model as Fibre Channel networks today, switches that communicate with each other as peers, using FCF-to-FCF (switch-to-switch) communication, meets all our requirements for a Fibre Channel “hop:”
Since we’re looking at this from a Fibre Channel perspective, it’s important to make some observations here.
First, this gives SAN admins the most control and most visibility into all aspects of SAN traffic.
Second, as you can see there is no extra “special sauce” needed in order to run multihop FCoE. You don’t need TRILL, or any other Ethernet Layer 2 technology.
Will the Real Multihop FCoE Please Stand Up?
There it is, plain as day, a complete storage solution, providing SAN A/B separation, fully standardized (and published!), and consistent with the existing models of storage networks that exist today. This makes it much easier to “bolt-on” FCoE technology into existing environments and maintain current best practices.
It’s important to note that I am not saying – nor have I ever said – that any one solution is better than any other. Because each of the designs I’ve mentioned here is built from the same building blocks, you may find yourself in an environment where your traffic engineering needs mean a little of this, a little of that, a little of something else.
What’s key is that you understand what each of these terms means. That way, when someone says they have “multihop FCoE,” you will be able to tell whether they are talking from a storage perspective:
- Does it have “VE_Ports”? No? Then it doesn’t maintain consistency with well-understood FC Inter-Switch Links (ISLs).
- Does it have visibility into the FC payload to make routing/forwarding decisions? No? Then it misses the other criterion for making an FCoE hop.
- Does the switch make the traffic invisible to FC tools for troubleshooting purposes? Yes? Then it breaks the FC storage model.
- Does it provide SAN A/B separation while maintaining LAN coherence? No? Then it isn’t a truly converged network.
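The four questions above can be written down as a simple checklist. In this Python sketch, `SwitchProfile` and `multihop_checklist` are hypothetical names, and the two example profiles are filled in from the descriptions of Contestant #1 (DCB lossless switch) and Contestant #4 (VE_Port ISL switch); an empty result means the switch passes every storage-side test for a true FCoE hop.

```python
from dataclasses import dataclass


@dataclass
class SwitchProfile:
    has_ve_ports: bool                 # forms VE_Port-to-VE_Port ISLs
    forwards_on_fc_payload: bool       # uses FC information (e.g. FSPF) to forward
    hides_traffic_from_fc_tools: bool  # FC troubleshooting tools cannot see it
    san_ab_with_lan_coherence: bool    # SAN A/B separation without splitting the LAN


def multihop_checklist(p: SwitchProfile) -> list[str]:
    """Return the storage-side criteria the switch fails (empty list = passes)."""
    failures = []
    if not p.has_ve_ports:
        failures.append("no VE_Ports: not a standard FC ISL")
    if not p.forwards_on_fc_payload:
        failures.append("no FC payload visibility for forwarding decisions")
    if p.hides_traffic_from_fc_tools:
        failures.append("traffic invisible to FC tools: breaks the FC storage model")
    if not p.san_ab_with_lan_coherence:
        failures.append("no SAN A/B separation with LAN coherence: not truly converged")
    return failures


ve_port_isl_switch = SwitchProfile(True, True, False, True)
dcb_lossless_switch = SwitchProfile(False, False, True, False)
print(multihop_checklist(ve_port_isl_switch))        # []
print(len(multihop_checklist(dcb_lossless_switch)))  # 4
```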
Again, depending on the purpose of the implementation, these may be desirable outcomes. But how can you possibly know unless you first understand what the differences between them are?
If you made it this far, congratulations! With luck, this (extremely long) blog has cleared up some of the confusion regarding “Multihop FCoE,” and you now have a better understanding of how to examine the products that make the claim.