Last week we announced the UCS M-series Modular Servers. The launch represented the culmination of an exciting journey for us that started two years ago.
In mid-2012, just as UCS B-series blade servers were taking off in a big way, we noticed a group of our customers using our core technology very differently from customers in our primary market, enterprise IT. In our primary market, customers loved UCS’s stateless computing model, virtualization benefits and the converged offerings with our partners EMC and NetApp. In this other category, customers did not consider those same benefits nearly as important. However, UCS Manager’s powerful policy engine got them really excited. UCS Manager gave them a programmatic interface to manage thousands of nodes across dozens of sites globally.
Curious, I started to visit some of these customers. During one such visit, I was walking through the aisles of their data center when I noticed something I had never seen at any of our enterprise IT customers’ data centers. This customer had all UCS chassis single-homed to a single Fabric Interconnect. I stopped in my tracks – really? Isn’t that kind of dangerous? What happens if there’s a failure? Or you have to upgrade? The customer explained to me how a combination of their application architecture and their application instance placement strategy made sure that outages at the rack level could be handled without service disruption. Wow! So we had engineered all kinds of resiliency: dual-ported adapters, dual IOMs, dual chassis controllers, clustered Fabric Interconnects … lots and lots of hard engineering work to make our product robust and resilient, and this customer had thrown it all away with one toss… that really hurt. 🙁
Maybe the customer noticed the pain in my eyes, but he spent the next hour telling me about the things he loved about UCS. He told me how seemingly simple activities like BIOS settings, firmware management and VLAN renaming had been nightmarish in the past, and how UCS made his life so … so much better. Eventually I realized that even though he had set aside so much of our good engineering, there was something deep in the product that he absolutely valued.
Over the next couple of months we got better at asking the right questions. How do you measure the resiliency of your service? How do you relate your failure domain to service resiliency? How do you scale your service? What’s your application instance scaling unit? What’s your infrastructure scaling unit? What are your performance metrics? What are your TCO metrics? And boy, did we learn! Patterns started to emerge. The first and most common pattern was that these customers were in the “internet business”. It didn’t matter whether they were from traditional industries like financial services, travel and retail, or emerging ones like service providers, online content, gaming, social media, SaaS and ecommerce – they were in the “internet business”.
These customers had to architect their services to handle a scale and elasticity of demand never seen before within enterprises. It is much harder to predict and plan for success on the internet. Within enterprise IT, you know that if you are successful, at best 100% of the employees will come to your service, but that’s still a finite number. On the internet you have no such comfort 🙂 We also learned that these customers had substantial flexibility when it came to sizing their application instances. For instance, they could decide how many games, travel planning sessions, online banking sessions or shopping carts should be serviced by a single node. In short, services on the internet had to scale elastically, were designed to be resilient, and had good sizing flexibility. We started to call these services “Cloud Scale”.
Over the next few weeks, my colleagues and I would love to take you through our journey from that painful “you wasted our engineering” moment to the product we just announced – UCS M-series Modular Servers. We will cover design philosophy, architectural choices, product details, suitable workloads, early customer feedback and TCO discussions. Next week, I’ll talk about the design philosophy that guided the product development.
Before I sign off, I just want to thank the dozen or so customers who guided us with great patience AND passion over the last 2 years. Our jobs are more delightful and fulfilling because of you. You deserve as much credit for this product as the product team here at Cisco.
This is an outstanding blog on the “Origin of the Species”.
It really explains how Cisco got to the M-Series and, to be honest, the process closely aligns with the ‘origin’ of UCS. Unified Computing System, B-Series blades, started with a basic understanding of what customers really needed, how they used their servers and operated in their data centers, and what was missing in the ‘offer du jour’ back in the pre-2009, pre-UCS days.
M-Series is just another example of how Cisco’s thinking is in a real rut. Seems like the only thing we care about is our customers and delivering “What They Need to Succeed”.
Great job!!!
Great article, Arnab. Just to clarify a little something, as I want to refer our audience to your entry here.
A cut and paste: ” We also learned that these customers had substantial flexibility when it came to sizing their application instance. For instance, they could decide how many games, travel planning sessions; online banking sessions, shopping carts should be serviced by a single node? ”
Is it correct to say that these customers NEEDED substantial flexibility… and that they COULD NOT decide (or predict) how many games, travel planning sessions… etc.?
Just want to make sure I am reading the sentiment correctly here.
Thanks!
Robb