In mid-2012, just as UCS B-series blade servers were taking off in a big way, we noticed a group of our customers using our core technology very differently from customers in our primary market, enterprise IT. In our primary market, customers loved UCS’s stateless computing model, its virtualization benefits, and the converged offerings with our partners EMC and NetApp. Customers in this other category did not consider those benefits nearly as important. What got them really excited was UCS Manager’s powerful policy engine, which gave them a programmatic interface to manage thousands of nodes across dozens of sites globally.
Curious, I started to visit some of these customers. During one such visit, I was walking through the aisles of the customer’s data center when I noticed something I had never seen in any of our enterprise IT customers’ data centers. This customer had every UCS chassis single-homed to a single Fabric Interconnect. I stopped in my tracks. Really? Isn’t that kind of dangerous? What happens if there’s a failure? Or you have to upgrade? The customer explained how a combination of their application architecture and their application instance placement strategy made sure that outages at the rack level could be handled without service disruption. Wow! We had engineered all kinds of resiliency: dual-ported adapters, dual IOMs, dual chassis controllers, clustered Fabric Interconnects … lots and lots of hard engineering work to make our product robust and resilient, and this customer had thrown it all away with one toss. That really hurt. 🙁
Maybe the customer noticed the pain in my eyes, because he spent the next hour telling me about the things he loved about UCS. He told me how seemingly simple activities like BIOS settings, firmware management, and VLAN renaming had been nightmarish in the past, and how UCS made his life so … so much better. Eventually I realized that even though he had set aside much of our good engineering, there was something deep in the product that he absolutely valued.
Over the next couple of months we got better at asking the right questions. How do you measure the resiliency of your service? How do you relate your failure domain to service resiliency? How do you scale your service? What’s your application instance scaling unit? What’s your infrastructure scaling unit? What are your performance metrics? What are your TCO metrics? And boy, did we learn! Patterns started to emerge. The first and most common pattern was that these customers were in the “internet business”. It didn’t matter whether they came from traditional industries like financial services, travel, and retail, or from emerging ones like service providers, online content, gaming, social media, SaaS, and e-commerce: they were all in the “internet business”.
These customers had to architect their services to handle a scale and elasticity of demand never before seen within enterprises. It is much harder to predict and plan for success on the internet. Within enterprise IT, you know that if you are successful, at most 100% of the employees will come to your service, and that is still a finite number. On the internet you have no such comfort 🙂 We also learned that these customers had substantial flexibility when it came to sizing their application instances. For instance, they could decide how many games, travel planning sessions, online banking sessions, or shopping carts a single node should service. In short, services on the internet had to scale elastically, were designed to be resilient, and had good sizing flexibility. We started to call these services “Cloud Scale”.
Over the next few weeks, my colleagues and I would love to take you through our journey from that painful “you wasted our engineering” moment to the product we just announced, the UCS M-series Modular Servers. We will cover design philosophy, architectural choices, product details, suitable workloads, early customer feedback, and TCO discussions. Next week, I’ll talk about the design philosophy that guided the product development.
Before I sign off, I just want to thank the dozen or so customers who guided us with great patience AND passion over the last two years. Our jobs are more delightful and fulfilling because of you. You deserve as much credit for this product as the product team here at Cisco.