Last week, here, I started my 2 part blog on some of the top SAN design and deployment challenges we see. As I mentioned, I put this together with help from my SAN expert colleagues, Barbara Ledda and Wolfgang Lang. We are all part of the Cisco Services professional services team, where we experience first-hand the challenges of adopting new technologies including SAN.
Last week, we discussed the following challenges:
#1 Don’t assume that your server multi-pathing software is installed or working, or even licensed, or installed but never used/tested by your server team!
#2 Tendency to significantly over-estimate utilization on the SAN network.
Lets’ now discuss our challenges #3 – #5, which will discuss interoperability, expertise and architectural details respectively. Architectural details as it turns out is a key concern of some of you reading Part 1 of this blog – a few questions came in and you can view the discussion here (and thanks to my colleagues Venkat Kirishnamurthyi and Jing Luo for contributing to this discussion). With this in mind, you may find the Cisco MDS architecture discussion video here also useful.
Read More »
Tags: architectural approach, architecture, cisco_services, data center networking, interoperability, Large Scale SAN Design, networking, SAN design, storage area networks
Look after your SAN experts!
One of the aspects I really enjoy about my job is that I get to learn from some of the world’s top network and data center design engineers, and I get to hear about technology adoption challenges across the world. If there is a complex network or data center design being worked by our customers, if our customers are under time pressure, or if our customers are facing key business or technical challenges, Cisco Services’ consultants are often called in to help. Globally then, they experience first hand the challenges of deploying advanced technologies. In this blog, in the same spirit as my OpenStack Deployment Challenges blog, I’d like to share their experiences on some of the most common challenges and misconceptions faced by our customers when building Storage Area Networks (SAN). I’ll publish this in 2 parts – so look out for the concluding part next week.
Before continuing, I’d like to thank two of our SAN expert consultants, Barbara Ledda and Wolfgang Lang, for sharing their experiences and challenges.
Read More »
Tags: architectural approach, architecture, cisco_services, data center networking, interoperability, Large Scale SAN Design, MDS, MDS SAN, networking, SAN design, storage area networks
MDS 9500 family has supported customers for more than a decade helping them through FC speed transitions from 1G, 2G, 4G, 8G and 8G advanced without forklift upgrades. But as we look in the future the MDS 9700 makes more sense for a lot of data center designs. Top four reasons for customers to upgrade are
- End of Support Milestones
- Storage Consolidation
- Improved Capabilities
- Foundation for Future Growth
So lets look at each in some detail.
- End of Support Milestones
MDS 4G parts are going End of Support on Feb 28th 2015. Impacted part numbers are DS-X9112, DS-X9124, DS-X9148. You can use the MDS 9500 Advance 8G Cards or MDS 9700 based design. Few advantages MDS 9700 offers over any other existing options are
a. Investment Protection – For any new Data Center design based on MDS 9700 will have much longer life than MDS 9500 product family. This will avoid EOL concerns or upgrades in near future. Thus any MDS 9700 based design will provide strong investment protection and will also ensure that the architecture is relevant for evolving data center needs for more than a decade.
b. EOL Planning – With MDS 9700 based design you control when you need to add any additional blades but with MDS 9500, you will have to either fill up the chassis within 6 months (End of life announcement to End of Sales) or leave the slots empty forever after End of Sale date.
c. Simplify Design – MDS 9700 will allow single skew, S/W version, consistent design across the whole fabric which will simplify the management. MDS 9700 massive performance allows for consolidation and thus reducing footprint and management burden.
d. Rich Feature Set – Finally as we will see later MDS provides host of features and capabilities above and beyond MDS 9500 and that enhancement list will continue to grow.
- Storage Consolidation
MDS 9700 provides unprecedented consolidation compared to the existing solutions in the industry. As an example with MDS 9710 customers can use the 16G Line Rate ports to support massively virtualized workload and consolidate the server install base. Secondly with 9148S as Top of Rack switch and MDS 9700 at Core, you can design massively scalable networks supporting consistent latency and 16G throughput independent of the number of links and traffic profile and will allow customers to Scale Up or Scale Out much more easily than legacy based designs or any other architecture in the industry.
Moreover as shown in figure above for customers with MDS 9500 based designs MDS 9710 offers higher number of line rate ports in smaller footprint and much more economical way to design SANs. It also enables consolidation with higher performance as well as much higher availability.
- Improved Capabilities
MDS 9700 design provides more enhanced capabilities above and beyond MDS 9500 and many more capabilities will be added in future. Some examples that are top of mind are detailed below
Availability: MDS 9700 based design improves the reliability due to enhancements on many fronts as well as simplifying the overall architecture and management.
- MDS 9710 introduced host of features to improve reliability like industry’s first N+1 Fabric redundancy, smaller failure domains and hardware based slow drain detection and recovery.
- Its well understood that reliability of any network comes from proper design, regular maintenance and support. It is imperative that Data Center is on the recommended releases and supported hardware. As an example data center outage where there are unsupported hardware or software version failure are exponentially more catastrophic as the time to fix those issues means new procurement and live insertion with no change management window. Cost of an outage in an Data Center is extremely high so it is important to keep the fabric upgraded and on the latest release with all supported components. Thus for new designs it makes sense that it is based on the latest MDS 9700 directors, as an example, rather than MDS 9513 Gen-2 line cards because they will fall of the support on Feb 28, 2015. Also a lot of times having different versions of the hardware and different software versions add complexity to the maintenance and upkeep and thus has a direct impact on the availability of the network as well as operational complexity.
With massive amounts of virtualization the user impact is much higher for any downtime or even performance degradation. Similarly with the data center consolidation and higher speeds available in the edge to core connectivity more and more host edge ports are connected through the same core switches and thus higher number of apps are dependent on consistent end to end performance to provide reliable user experience. MDS 9700 provides industries highest performance with 24Tbps switching capability. The Director class switch is based on Crossbar architecture with Central Arbitration and Virtual Output Queuing which ensures consistent line rate 16G throughput independent of the traffic profile with all 384 ports operating at 16G speeds and without using crutches like local switching (muck akin to emulating independent fixed fabric switches within a director), oversubscription (can cause intermittent performance issues) or bandwidth allocation.
MDS Directors are store and forward switches this is needed as it makes sure that corrupted frames are not traversing everywhere in the network and end devices don’t waste precious CPU cycles dealing with corrupted traffic. This additional latency hit is OK as it protects end devices and preserves integrity of the whole fabric. Since all the ports are line rate and customers don’t have to use local switching. This again adds a small latency but results in flexible scalable design which is resilient and doesn’t breakdown in future. These 2 basic design requirements result in a latency number that is slightly higher but results in scalable design and guarantees predictable performance in any traffic profile and provides much higher fabric resiliency .
Consistent Latency: For MDS directors latency is same for the 16G flow to when there are 384 16G flows going through the system. Crossbar based switch design, Central arbitration and Virtual Output Queuing guarantees that. Having a variable latency which goes from few us to a high number is extremely dangerous. So first thing you need to make sure is that director could provide consistent and predictable latency.
End to End latency: Performance of any application or solution is dependent on end to end latency. Just focusing on SAN fabric alone is myopic as major portion of the latency is contributed by end devices. As an example spinning targets latency is of the order of ms. In this design few us is orders of magnitude less and hence not even observable. With SSD the latency is of the order of 100 to 200 us. Assuming 150 us the contribution of SAN fabric for edge core is still less than 10%. Majority (90%) of the latency is end devices and saving couple of us in SAN Fabric will hardly impact the overall application performance but the architectural advantage of CRC based error drops and scalable fabric design will make provided reliable operations and scalable design.
For larger Enterprises scalability has been a challenge due to massive amount of host virtualization. As more and more VMs are logging into the fabric the requirement from the fabric to support higher flogins, Zones. Domains is increasing. MDS 9700 has industries highest scalability numbers as its powered by supervisor that has 4 times the memory and compute capability of the predecessor. This translates to support for higher scalability and at the same time provides room for future growth.
Foundation for Future Growth:
MDS 9700 provides a strong foundation to meet the performance and scalability needs for the Data Center requirements but the massive switching capability and compute and memory will cover your needs for more than a decade.
It will allow you to go to 32G FC speeds without forklift upgrade or changing Fabric Cards (rather you will need 3 more of the same Fabric card to get line rate throughput through all the 384 ports on MDS 9710 (and 192 on MDS 9706).
MDS 9700 allow customers to deploy 10G FCoE solution today and upgrade without forklift upgrade again to 40G FCoE.
MDS 9700 is again unique such that customers can mix and match FC and FCoE line cards any way they want without any limitations or constraints.
Most importantly customers don’t have to make FC vs FCoE decision. Whether you want to continue with FC and have plans for 32G FC or beyond or if you are looking to converge two networks into single network tomorrow or few years down the road MDS 9700 will provide consistent capabilities in both architectures.
In summary SAN Directors are critical element of any Data Center. Going back in time the basic reason for having a separate SAN was to provide unprecedented performance, reliability and high availability. Data Center design architecture has to keep up with the requirements of new generation of application, virtualization of even the highest performance apps like databases, new design requirements introduced by solutions like VDI, ever increasing Solid State drive usage, and device proliferation. At the same time when networks are getting increasingly complex the basic necessity is to simplify the configuration, provisioning, resource management and upkeep. These are exact design paradigms that MDS 9700 is designed to solve more elegantly than any existing solution.
Although I am biased in saying that but it seems that you have voted us with your acceptance. Please see some more details here.
Live as if you were to die tomorrow. Learn as if you were to live forever.
Tags: 16 Gigabit, 16Gb, 16Gb Fibre Channel, 9500, 9710, architecture, availability, best practices, Cisco, cloud, Cloud Computing, Consolidation, convergence, data center, Data Mobility Manager, DCNM, design, Director, dmm, FCIP, FCoE, Fibre Channel, Fibre Channel over Ethernet, IO accelerator, it-as-a-service, MDS, MDS design, nexus, NX-OS, reliability, SAN, Storage, storage area networks, switch, switching, Unified Data Center, Unified Fabric, upgrade, virtualization
This is the final part on the High Performance Data Center Design. We will look at how high performance, high availability and flexibility allows customers to scale up or scale out over time without any disruption to the existing infrastructure. MDS 9710 capabilities are field proved with the wide adoption and steep ramp within first year of the introduction. Some of the customer use cases regarding MDS 9710 are detailed here. Furthermore Cisco has not only established itself as a strong player in the SAN space with so many industry’s first innovations like VSAN, IVR, FCoE, Unified Ports that we introduced in last 12 years, but also has the leading market share in SAN.
Before we look at some architecture examples lets start with basic tenants any director class switch should support when it coms to scalability and supporting future customer needs
Design should be flexible to Scale Up (increase performance) or Scale Out (add more port)
The process should not be disruptive to the current installation for cabling, performance impact or downtime
The design principals like oversubscription ratio, latency, throughput predictability (as an example from host edge to core) shouldn’t be compromised at port level and fabric level
Lets take a scale out example, where customer wants to increase 16G ports down the road. For this example I have used a core edge design with 4 Edge MDS 9710 and 2 Core MDS 9710. There are 768 hosts at 8Gbps and 640 hosts running at 16Gbps connected to 4 edge MDS 9710 with total of 16 Tbps connectivity. With 8:1 oversubscription ratio from edge to core design requires 2 Tbps edge to core connectivity. The 2 core systems are connected to edge and targets using 128 target ports running at 16Gbps in each direction. The picture below shows the connectivity.
Down the road data center requires 188 more ports running at 16G. These 188 ports are added to the new edge director (or open slots in the existing directors) which is then connected to the core switches with 24 additional edge to core connections. This is repeated with 24 additional 16G targets ports. The fact that this scale up is not disruptive to existing infrastructure is extremely important. In any of the scale out or scale up cases there is minimal impact, if any, on existing chassis layout, data path, cabling, throughput, latency. As an example if customer doesn’t want to string additional cables between the core and edge directors then they can upgrade to higher speed cards (32G FC or 40G FCoE with BiDi ) and get double the bandwidth on the on the existing cable plant.
Lets look at another example where customer wants to scale up (i.e. increase the performance of the connections). Lets use a edge core edge design for this example. There are 6144 hosts running at 8Gbps distributed over 10 edge MDS 9710s resulting in a total of 49 Tbps edge bandwidth. Lets assume that this data center is using a oversubscription ratio of 16:1 from edge into the core. To satisfy that requirement administrator designed DC with 2 core switches 192 ports each running at 3Tbps. Lets assume at initial design customer connected 768 Storage Ports running at 8G.
Few years down the road customer may wants to add additional 6,144 8G ports and keep the same oversubscription ratios. This has to be implemented in non disruptive manner, without any performance degradation on the existing infrastructure (either in throughput or in latency) and without any constraints regarding protocol, optics and connectivity. In this scenario the host edge connectivity doubles and the edge to core bandwidth increases to 98G. Data Center admin have multiple options for addressing the increase core bandwidth to 6 Tbps. Data Center admin can choose to add more 16G ports (192 more ports to be precise) or preserve the cabling and use 32G connectivity for host edge to core and core to target edge connectivity on the same chassis. Data Center admin can as easily use the 40G FCoE at that time to meet the bandwidth needs in the core of the network without any forklift.
Or on the other hand customer may wants to upgrade to 16G connectivity on hosts and follow the same oversubscription ratios. . For 16G connectivity the host edge bandwidth increases to 98G and data center administrator has the same flexibility regarding protocol, cabling and speeds.
For either option the disruption is minimal. In real life there will be mix of requirements on the same fabric some scale out and some scale up. In those circumstances data center admins have the same flexibility and options. With chassis life of more than a decade it allows customers to upgrade to higher speeds when they need to without disruption and with maximum flexibility. The figure below shows how easily customers can Scale UP or Scale Out.
As these examples show Cisco MDS solution provides ability for customers to Scale Up or Scale out in flexible, non disruptive way.
“Good design doesn’t date. Bad design does.”
Tags: 16 Gigabit, 16Gb, 16Gb Fibre Channel, 9710, architecture, availability, best practices, Cisco, cloud, Cloud Computing, Consolidation, convergence, data center, Data Mobility Manager, DCNM, design, Director, dmm, FCIP, FCoE, Fibre Channel, Fibre Channel over Ethernet, IO accelerator, it-as-a-service, MDS, MDS design, nexus, NX-OS, reliability, SAN, Storage, storage area networks, switch, switching, Unified Data Center, Unified Fabric, virtualization
Note: This is the third of a three-part series on Next Generation Data Center Design with MDS 9700; learn how customers can deploy scalable SAN networks that allow them to Scale Up or Scale Out in a non disruptive way. Part 1 | Part 2 ]
This week is exciting, had opportunity to sit on round table with Cisco’s largest customers on an open ended architecture discussion and their take on past, present and future. More on that some other time let’s pick up last critical aspect of High Performance Data Center design namely flexibility. Customers need flexibility to adapt to changing requirements over time as well as to support diverse requirements of their users. Flexibility is not just about protocol, although protocol is very important aspect, but it is also about making sure customers have choice to design, grow and adapt their DC according to their needs. As an example if customers want to utilize the time to market advantage and ubiquity of Ethernet they can by adopt FCoE.
Moreover flexibility has to be complemented by seamless integration where customers can not only mix and match the architectures/protocols/speeds but also evolve from one to other over time with minimal disruption and without forklift upgrades. Investment protection of more than a decade on Cisco director switches allows customer to move to higher speeds, or adopt new protocols using the existing chassis and fabric cards. Finally any solution should allow scalability over time with minimal disruptions and common management model. As an example on MDS 9710 or MDS 9706 customers can choose to use 2/4/8 G FC, 4/8/16G FC, 10G FC or 10G FCoE at each hop.
Let’s review each aspect of flexibility at a time.
Cisco SAN product family is designed to support Architecture flexibility. From smallest to the largest customers and everything in-between. Customers can grow from 12 16G ports to 48 ports on a single 9148S. They can grow from 48 16G Line Rate Ports to 192 16G Line Rate with MDS 9710 and upto 384 ports on MDS 9710. Finally having seamless FC and FCoE capability allows customers to use these directors as edge or core switches . With the industry leading scalability numbers, customers can scale up or scale out as per their needs. Two examples show how customers can use Director class switches (9513, 9506, 9710 or 9706) based Architecture for End of Row designs. Similarly customers can orchestrate Top of Rack designs using Nexus fixed family or MDS 9148S.
If they want to continue with FC for foreseeable future or have sizable FC infrastructure that they want to leverage (and have option to go to FCOE) then MDS serves their needs. Similarly they can support edge core designs, and edge core edge designs or even collapsed cores if so desired.
If customers need converged switch then Nexus 2K, 5K and 6K provides the flexibility, ability to collapse two networks, simplify management as shown in the picture below.
Customers can mix and match the FC speeds 2G/4G/8G, 4G/8G/16G on the latest MDS 9148S, and MDS 9700 product family. With all the major optics supported, customers can pick and choose optics for the smallest distance to long distance CWDM and DWDM solutions in addition to SW, LW and ER optics choices. In addition MDS 9700 supports 10GE optics running 10G FC traffic for ease of implementing 10G DWDM solutions based on ubiquitous 10GE circuits.
FC is a dominant protocol with DC but at the same time a lot of customers are adopting FCoE to improve ROI, simplify the network or simply to have higher speeds and agility. Irrespective of the needs and timeline MDS solution allows customer to adopt FCoE today or down the road without forklift upgrades on the existing MDS 9700 platforms while leveraging the existing FC install base.
The diagram above shows how customers can collapse LAN and SAN networks on the edge into one network. The advantage of FEX include reduced TCO, simplified operations (Parent switch provides a single point of management and policy enforcement and Plug-and-play management includes auto-configuration).
Another example to allow non transition less disruptive for customers Cisco has supported the BiDi optics on the Nexus product family. This allows customers to use the the same same OM2, OM3 and OM4 fabrics for 40G FCoE connectivity and still don;t have to rip and replace cabling plant.
For customer who are not ready to converge networks but want to achieve faster time to market, higher performance, Ethernet scale economies can use separate LAN and SAN network and use FCoE for that dedicated SAN .
Coupled with broad Cisco product portfolio means that customers have the maximum flexibility to tune the architecture precisely to their needs. Cisco product portfolio is tightly integrated, all the SAN switches use same NxOS and DCNM provides seamless manageability across LAN, SAN, Converged infrastructure to Fabric Interconnects on UCS.
From the last 3 blogs lets quickly capture what are the unique characteristics of MDS 9700 that allows for High Performance Scalable Data Center Design.
24 Tbps Switching capacity, line rate 16g FC ports, No Oversubscription, local switching or bandwidth allocation.
Redundancy for every critical component in the chassis including Fabric Card. Data Resiliency with CRC check and Forward Error Correction. Multiple level of CRC checks, smaller failure domains.
In next few days lets put this all together to see how customers can deploy scalable networks that allow them to Scale Up or Scale Out in a non disruptive way.
To learn more about the MDS 9148S please join us for a webinar.
“In business, words are words; explanations are explanations, promises are promises, but only performance is reality.”
Harold S. Geneen
Tags: 16 Gigabit, 16G FC, 16Gb, 16Gb Fibre Channel, 192 Port, 9148S, 9706, 9710, architecture, availability, best practices, Cisco, cloud, Cloud Computing, Consolidation, convergence, data center, Data Mobility Manager, DCNM, design, Director, dmm, FCIP, FCoE, Fibre Channel, Fibre Channel over Ethernet, IO accelerator, it-as-a-service, MDS, MDS design, nexus, NX-OS, reliability, SAN, Storage, storage area networks, switch, switching, Unified Data Center, Unified Fabric, virtualization