Sorry, I did not mean to steal the title of Hillary Clinton’s book. It so happened that we faced “hard choices” of our own when we had to decide on the management approach for our new M-Series platform. In the first blog of the UCS M-Series Modular Servers journey series, Arnab briefly alluded to the value our customers placed on UCS Manager.
As we had more customer conversations, we recognized a clear demarcation when it came to infrastructure management. One group of customers would not take any offering from us that is not managed by UCS Manager. On the other hand, a few customers who had built their own management frameworks were more enamored of the disaggregated server offering that we intended to build. Among this second set of customers, there was a strong perception that UCS Manager did not add much value to their operations. We were faced with a very difficult choice: release the platform with UCS Manager or provide standalone management.
After multiple rounds of discussions, we made a conscious decision to launch M-Series as a UCS Manager managed platform only. Ironically enough, it was one such customer discussion that vindicated our decision. This customer was deploying large cloud-scale applications and did not care much for UCS Manager. During the conversation, they talked about some BIOS issues in their very large web farm that had surfaced a couple of years back. Almost two years later, they were still rolling out the BIOS updates!
UCS Manager is the industry’s first tool to elegantly break down the operational silos in the datacenter by introducing policy-based management of disparate infrastructure elements. This was made possible by the concept of Service Profiles, which eased the rapid adoption of converged infrastructure. Service Profiles abstract all elements associated with a server’s identity, rendering the underlying servers pretty much stateless. This enables rapid server re-purposing and workload mobility, and makes it easy to enforce operational policies like firmware updates. The whole offering is built on a foundation of XML APIs, which makes it extremely easy to integrate with other datacenter management, automation and orchestration tools. You can learn more about UCS Manager by clicking here.
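To give a feel for what building on XML APIs means in practice, here is a minimal sketch that builds a login request and reads the session cookie out of a reply. It assumes the `aaaLogin` method with its `inName`, `inPassword` and `outCookie` attributes as we recall them from the UCS Manager XML API guide; the credentials and the sample response are made up, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def build_login_request(username: str, password: str) -> str:
    """Return the XML body of a login request (attribute names are assumed)."""
    elem = ET.Element("aaaLogin", {"inName": username, "inPassword": password})
    return ET.tostring(elem, encoding="unicode")


def extract_cookie(response_xml: str) -> str:
    """Pull the session cookie out of a login response."""
    return ET.fromstring(response_xml).attrib["outCookie"]


# Placeholder credentials, for illustration only.
request_body = build_login_request("admin", "example-password")

# Hand-written sample reply; a real response carries more attributes.
sample_response = '<aaaLogin response="yes" outCookie="example-cookie" />'
cookie = extract_cookie(sample_response)

print(request_body)
print(cookie)
```

An orchestration tool would post such a body to the UCS Manager endpoint over HTTPS and reuse the cookie on later calls; check the XML API guide for the exact methods and attributes before relying on this.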
UCS M-Series Modular Servers are the latest addition to the infrastructure that can be managed by UCS Manager. M-Series is targeted at cloud-scale applications, which will be deployed on thousands, if not tens of thousands, of nodes. At that scale, automating policy enforcement matters even more than it does in traditional datacenter deployments. Managing groups of compute elements as a single entity, fault aggregation, BIOS updates and firmware upgrades are a few key features of UCS Manager that kept surfacing during customer conversations. That was one of the primary drivers in our decision to release this platform with UCS Manager.
In the cloud-scale space, the ability to deploy lots of servers almost instantaneously is a critical requirement. The nodes are also deployed as pretty much identical compute elements, so standardizing configurations across all of the servers is essential. UCS Manager makes it extremely easy to create service profile templates ahead of time (making use of the UCS Manager emulator) and to create any number of service profile clones literally at the push of a button. Associating the service profiles with the underlying infrastructure also takes just a couple of clicks. Net-net: you rack, stack, and cable once, then re-provision and re-deploy to meet your workload needs without having to make any physical changes to your infrastructure.
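The template-and-clone idea can be pictured with a small conceptual model. This is only a sketch: the class names, fields and naming scheme below are hypothetical and are not the UCS Manager object model or SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfileTemplate:
    """Hypothetical template: one definition shared by every clone."""
    name: str
    bios_policy: str
    firmware_package: str
    vnic_count: int

    def instantiate(self, prefix: str, count: int) -> list[ServiceProfile]:
        # Every clone points back at the same template, so the
        # configuration is identical across all generated profiles.
        return [
            ServiceProfile(name=f"{prefix}-{i:04d}", template=self)
            for i in range(1, count + 1)
        ]


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical clone: a name plus a reference to its template."""
    name: str
    template: ServiceProfileTemplate


# Illustrative values only.
template = ServiceProfileTemplate(
    name="web-node",
    bios_policy="cloud-bios",
    firmware_package="fw-bundle-a",
    vnic_count=2,
)
profiles = template.instantiate("web", 1000)
print(len(profiles), profiles[0].name, profiles[-1].name)
```

Because every clone refers to one template, a policy change is made once in the template instead of once per server.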
Storage Profiles is the most notable enhancement to UCS Manager in support of M-Series. This feature allows our customers to slice and dice the SSDs in the M-Series chassis into smaller virtual disks. Each of these virtual disks is then served up as if it were a local PCIe device to the server nodes within the compute cartridges plugged into the chassis. Steve explained that concept in detail in the previous blog. In the next edition, we will go into more detail about Storage Profiles and other pertinent UCS Manager features for the M-Series.
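As a rough mental model of the slicing described above, the sketch below treats the chassis SSDs as one capacity pool and carves a fixed-size virtual disk for each node. The class, the capacities and the node count are illustrative assumptions, not the actual Storage Profiles implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DiskPool:
    """Hypothetical pool of shared SSD capacity in one chassis."""
    capacity_gb: int
    allocations: dict = field(default_factory=dict)  # node id -> size in GB

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def carve(self, node_id: str, size_gb: int) -> None:
        """Reserve a virtual disk for one node, or fail if it cannot fit."""
        if node_id in self.allocations:
            raise ValueError(f"{node_id} already has a virtual disk")
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB left in the pool")
        self.allocations[node_id] = size_gb


# Assumed figures: 4 SSDs of 400 GB each, shared by 16 server nodes.
pool = DiskPool(capacity_gb=4 * 400)
for n in range(1, 17):
    pool.carve(f"node-{n}", 100)

print(pool.free_gb)
```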
In the same year Cisco was founded, Kate Bush recorded the hypnotic Cloudbusting, one of her most iconic songs and music videos. Conceived by Terry Gilliam and featuring Donald Sutherland, there is a strikingly poignant moment in the video where Bush’s character is ‘cloudbusting’ with her father and she first realizes that adults are fallible.
Ready to savor tapas, Gaudi and the most vibrant community of IT professionals in the industry? You must be headed to Barcelona for Microsoft TechEd Europe, 28-31, October. Cisco will be there as well. We’ll be showcasing integrated solutions from Cisco and Microsoft for Windows Server 2003 migrations, cloud and SQL Server.
Cisco and Microsoft have worked closely to integrate Cisco UCS with Windows Server 2012 R2, Hyper-V and System Center 2012 R2, to provide the optimal platform for your Microsoft clouds and applications. Listen to what Microsoft Corporate Vice President Brad Anderson has to say about the Cisco and Microsoft relationship.
Make sure to stop by stand #207 to speak with a Cisco solution expert and take in a demo on:
This is the final part of the High Performance Data Center Design series. We will look at how high performance, high availability and flexibility allow customers to scale up or scale out over time without any disruption to the existing infrastructure. MDS 9710 capabilities are field proven, with wide adoption and a steep ramp within the first year of its introduction. Some of the customer use cases for the MDS 9710 are detailed here. Furthermore, Cisco has not only established itself as a strong player in the SAN space, with industry-first innovations like VSAN, IVR, FCoE and Unified Ports introduced over the last 12 years, but also holds the leading market share in SAN.
Before we look at some architecture examples, let’s start with the basic tenets any director-class switch should support when it comes to scalability and future customer needs:
The design should be flexible enough to Scale Up (increase performance) or Scale Out (add more ports)
The process should not disrupt the current installation in terms of cabling, performance impact or downtime
Design principles like oversubscription ratio, latency and throughput predictability (for example, from host edge to core) shouldn’t be compromised at the port level or the fabric level
Let’s take a scale-out example, where the customer wants to add 16G ports down the road. For this example I have used a core-edge design with 4 edge MDS 9710s and 2 core MDS 9710s. There are 768 hosts at 8Gbps and 640 hosts at 16Gbps connected to the 4 edge MDS 9710s, for a total of 16 Tbps of connectivity. With an 8:1 oversubscription ratio from edge to core, the design requires 2 Tbps of edge-to-core connectivity. The 2 core systems are connected to the edge and to the targets using 128 target ports running at 16Gbps in each direction. The picture below shows the connectivity.
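The sizing in this example is simple arithmetic, and it can be checked with a few lines of Python. The variable names are mine; the figures are the ones quoted in the example.

```python
# Figures from the scale-out example.
HOSTS_8G = 768
HOSTS_16G = 640
OVERSUBSCRIPTION = 8     # edge-to-core ratio of 8:1
LINK_SPEED_GBPS = 16     # speed of each edge-to-core or target port

# Total bandwidth offered to hosts at the edge.
edge_gbps = HOSTS_8G * 8 + HOSTS_16G * 16

# Bandwidth needed from edge to core at the chosen ratio.
core_gbps = edge_gbps // OVERSUBSCRIPTION

# Number of 16G ports needed to carry that bandwidth.
ports_needed = core_gbps // LINK_SPEED_GBPS

print(f"host edge bandwidth: {edge_gbps} Gbps")
print(f"edge-to-core bandwidth: {core_gbps} Gbps")
print(f"16G ports needed: {ports_needed}")
```

The results, 16,384 Gbps at the edge, 2,048 Gbps into the core and 128 ports, line up with the rounded 16 Tbps, 2 Tbps and 128 ports quoted in the example.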
Down the road, the data center requires 188 more ports running at 16G. These 188 ports are added to a new edge director (or to open slots in the existing directors), which is then connected to the core switches with 24 additional edge-to-core connections. This is matched with 24 additional 16G target ports. The fact that this scale out is not disruptive to the existing infrastructure is extremely important. In any of the scale-out or scale-up cases there is minimal impact, if any, on the existing chassis layout, data path, cabling, throughput and latency. As an example, if the customer doesn’t want to string additional cables between the core and edge directors, they can upgrade to higher speed cards (32G FC or 40G FCoE with BiDi) and get double the bandwidth on the existing cable plant.
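The 24 additional connections follow from the same 8:1 ratio. Here is a small sketch of that calculation; the helper name is mine, and the 32G case simply assumes the same bandwidth carried on faster links.

```python
import math


def extra_links(new_ports: int, port_gbps: int, ratio: int, link_gbps: int) -> int:
    """Edge-to-core links needed for newly added host ports."""
    added_edge_gbps = new_ports * port_gbps
    added_core_gbps = added_edge_gbps / ratio
    # Round up: a fraction of a link still needs a whole port.
    return math.ceil(added_core_gbps / link_gbps)


links_16g = extra_links(new_ports=188, port_gbps=16, ratio=8, link_gbps=16)
links_32g = extra_links(new_ports=188, port_gbps=16, ratio=8, link_gbps=32)

print(links_16g)
print(links_32g)
```

The 188 new ports add 3,008 Gbps at the edge, which is 376 Gbps at 8:1, or 23.5 links at 16G, rounded up to 24. On 32G links the same bandwidth needs 12.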
Let’s look at another example, where the customer wants to scale up (i.e. increase the performance of the connections). Let’s use an edge-core-edge design for this example. There are 6144 hosts running at 8Gbps distributed over 10 edge MDS 9710s, resulting in a total of 49 Tbps of edge bandwidth. Let’s assume that this data center uses an oversubscription ratio of 16:1 from the edge into the core. To satisfy that requirement, the administrator designed the DC with 2 core switches of 192 ports each, running at 3 Tbps. Let’s assume that in the initial design the customer connected 768 storage ports running at 8G.
A few years down the road, the customer may want to add 6,144 more 8G ports and keep the same oversubscription ratios. This has to be implemented in a non-disruptive manner, without any performance degradation on the existing infrastructure (either in throughput or in latency) and without any constraints regarding protocol, optics and connectivity. In this scenario the host edge connectivity doubles and the host edge bandwidth increases to 98 Tbps. The data center admin has multiple options for addressing the increase in core bandwidth to 6 Tbps: add more 16G ports (192 more ports, to be precise), or preserve the cabling and use 32G connectivity for host-edge-to-core and core-to-target-edge connectivity on the same chassis. The admin can just as easily use 40G FCoE at that time to meet the bandwidth needs in the core of the network without any forklift upgrade.
Or, on the other hand, the customer may want to upgrade to 16G connectivity on the hosts and follow the same oversubscription ratios. With 16G connectivity the host edge bandwidth increases to 98 Tbps, and the data center administrator has the same flexibility regarding protocol, cabling and speeds.
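Both growth paths lead to the same numbers, which a quick calculation makes visible. The variable names are mine, and the sketch assumes the 16:1 ratio is held constant in both cases.

```python
HOSTS = 6144
RATIO = 16   # edge-to-core oversubscription of 16:1

# Initial design: 6144 hosts at 8G.
initial_edge_gbps = HOSTS * 8
initial_core_gbps = initial_edge_gbps // RATIO

# Option 1: add 6144 more 8G hosts.
more_hosts_edge_gbps = (HOSTS * 2) * 8

# Option 2: keep the host count and move every host to 16G.
faster_hosts_edge_gbps = HOSTS * 16

# Either way the core has to carry twice the bandwidth.
new_core_gbps = more_hosts_edge_gbps // RATIO
extra_core_gbps = new_core_gbps - initial_core_gbps

extra_16g_ports = extra_core_gbps // 16
extra_32g_ports = extra_core_gbps // 32

print(initial_edge_gbps, initial_core_gbps)
print(more_hosts_edge_gbps, faster_hosts_edge_gbps, new_core_gbps)
print(extra_16g_ports, extra_32g_ports)
```

The edge grows from about 49 Tbps to about 98 Tbps in both options, and the core from about 3 Tbps to about 6 Tbps. The extra 3,072 Gbps is 192 ports at 16G, or half as many at 32G, which is why moving to faster links preserves the existing cabling.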
For either option the disruption is minimal. In real life there will be a mix of requirements on the same fabric, some scale out and some scale up. In those circumstances data center admins have the same flexibility and options. With a chassis life of more than a decade, the platform allows customers to upgrade to higher speeds when they need to, without disruption and with maximum flexibility. The figure below shows how easily customers can Scale Up or Scale Out.
As these examples show, the Cisco MDS solution gives customers the ability to Scale Up or Scale Out in a flexible, non-disruptive way.
“Good design doesn’t date. Bad design does.” Paul Rand
It almost feels like this blog entry should start with: Once upon a time…. Because it captures a journey of a young emerging technology and the powerful infrastructure tool it has become. The Cisco UCS journey starts with the tale of Unified Fabric and the Converged Network Adapter (CNA).
Most people think of Unified Fabric as the ability to put both Fiber Channel and Ethernet on the same wire between the server and the Fabric Interconnect or upstream FCoE switches. That is part of the story, but that part is as simple as putting a Fiber Channel frame inside of an Ethernet frame. What is the magic that makes this happen at the server level? Doesn’t FCoE imply that the operating system itself would have to know how to present a Fiber Channel device in software and then encapsulate and send the frame across the Ethernet port? Possibly, but that would require OS FCoE software support, which would add CPU overhead and require end users to qualify these new software drivers and compare their performance against existing hardware FC HBAs.
For UCS, the success of converged infrastructure was due in large part to the very first Converged Network Adapters that were released. These adapters presented existing PCIe Fiber Channel and Ethernet endpoints to the operating system. This required no new drivers or new qualification from the perspective of the operating system and users. However, at the heart of this adapter was a Cisco ASIC that provided two key functions:
1.) Present the physical functions for existing PCIe devices to the operating system without the penalty of PCIe switching.
2.) Encapsulate Fiber Channel frames into an Ethernet frame as they are sent to the northbound switch.
Converged Network Adapter
It is the second function that we often focus on because that’s the cool networking portion that many of us at Cisco like to talk about. But how exactly do we convince the operating system that it is communicating with an Intel dual-port Ethernet NIC and a dual-port 4Gb Qlogic Fiber Channel HBA? I mean, these are the exact same drivers that we use for the actual Intel and Qlogic cards, so there’s got to be some magic there, right?
Well, yes and no. Let’s start with the no. Presenting different physical functions (PCIe endpoints) on a physical PCIe card is nothing new. It’s as simple as putting a PCIe switch between the bus and the endpoints. But like all switching technologies, a PCIe switch incurs latency, and it cannot encapsulate an FC frame into an Ethernet frame. So that’s where the magic comes into play. The original Converged Network Adapter contained a Cisco ASIC that sits on the PCIe bus between the Intel and Qlogic physical functions. From the operating system’s perspective the ASIC “looks” like a PCIe switch providing direct access to the Ethernet and Fiber Channel endpoints, but in reality it has the ability to move I/O in and out of the physical functions without incurring the latency of a switch. The ASIC also provides a mechanism for encapsulating the FC frames into a specific Ethernet frame type to provide FCoE connectivity upstream.
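To make the encapsulation step concrete, here is a simplified software sketch of wrapping a Fiber Channel frame in an Ethernet frame. The ASIC does this in hardware; the Python below only illustrates the frame layout. The EtherType 0x8906 is the registered FCoE value. The header and trailer sizes and the SOF/EOF code points are assumptions based on our reading of the FC-BB-5 frame format, and the MAC addresses and payload are placeholders.

```python
import struct

FCOE_ETHERTYPE = 0x8906   # registered EtherType for FCoE
SOF_I3 = 0x2E             # assumed start-of-frame code point
EOF_T = 0x42              # assumed end-of-frame code point


def encapsulate_fcoe(dst_mac: bytes, src_mac: bytes, fc_frame: bytes) -> bytes:
    """Wrap an FC frame in an Ethernet frame (VLAN tag and FCS omitted)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    # FCoE header: version and reserved bits (13 zero bytes), then SOF.
    fcoe_header = bytes(13) + bytes([SOF_I3])
    # FCoE trailer: EOF followed by 3 reserved bytes.
    fcoe_trailer = bytes([EOF_T]) + bytes(3)
    return eth_header + fcoe_header + fc_frame + fcoe_trailer


# Placeholder addresses and a dummy 36-byte FC frame.
dst = bytes.fromhex("020000000001")
src = bytes.fromhex("020000000002")
fc_frame = bytes(range(36))

frame = encapsulate_fcoe(dst, src, fc_frame)
print(len(frame), frame[12:14].hex())
```

The operating system never sees any of this: it hands native FC I/O to what it believes is an ordinary HBA, and the wrapping happens below it.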
The pure beauty of this ASIC is that we have evolved it from the CNA to the Virtual Interface Card (VIC). Traditional CNAs have a limited number of Ethernet and FC ports available to the system (2 each), based on the chipsets installed on the card. The Cisco VIC allows a variety of vNICs and vHBAs to be created on the card. The VIC not only virtualizes the PCIe switch, it virtualizes the I/O endpoint.
Cisco Virtual Interface Card
So in essence what we have created with the Cisco ASIC, that drives the VIC, is a device that can provide a standard PCIe mechanism to present an end device directly to the operating system. This ASIC also provides a hardware mechanism designed to receive native I/O from the operating system and encapsulate and translate where necessary without the need for OS stack dependencies, for example native Fiber Channel encapsulated into Ethernet.
At the heart of the UCS M-Series servers is the System Link Technology. It is this specific component that provides access to the shared I/O resources in the chassis to the compute nodes. System Link Technology is the 3rd Generation technology behind the VIC and the 4th Generation technology for Unified Fabric within the construct of Unified Computing. The key function of the System Link Technology is the creation of a new PCIe physical function called the SCSI NIC (sNIC) that presents a virtual storage controller to the operating system and maps drive resources to a specific service profile within Cisco UCS.
System Link Technology
It is this innovative technology that provides a mechanism for each compute node within UCS M-Series to have its own specific virtual drive carved out of the available physical drives within the chassis. This is accomplished using standard PCIe and not MR-IOV. Therefore, the operating system needs no special knowledge of any change in the PCIe frame format.
For a more detailed look at System Link Technology in the M-Series check out the following white paper.
The important thing to remember is that hardware infrastructure is only part of the overall architectural design for UCS M-Series. The other component that is key to UCS is the ability to manage the virtual instantiations of the system components. In the next segment on UCS M-Series Mahesh will discuss how UCS Manager rounds out the architectural design.