The Value of Orchestration: What Did Captain Kirk Know That Scotty Didn’t? & The Roach Motel Infrastructure Issue

November 7, 2011

Recently, a customer asked me what the value of using automation to operate a private cloud was. It was a good question. Working in the middle of the reality distortion field of the cloud industry, I take it for granted that everyone knows automation’s benefits.

Fundamentally, automation tools help to reduce labor costs, rationalize consumption and increase utilization.

Costs are lower because the labor required to configure and deploy is eliminated. This automation is made possible by creating standard infrastructure offerings. Standard infrastructure offerings make a new operational model possible: moving from the artisanal approach to delivering infrastructure, where every system and configuration is unique, to the industrialized approach, which ensures repeatability, quality and agility. It’s the difference between custom tailoring and standardized sizes at The Gap. Both have their place, but one costs more.

The Gap has a self-service catalog. The British bespoke tailor doesn’t. Self-service should not be an afterthought; it’s the beginning of the cloud journey. Self-service drives the standardization of offerings and reduces the labor costs that arise from designing, specifying, ordering, procuring and configuring computing, storage and network resources on a custom basis.

This standardization and automation also apply to application components and to security and network services such as LDAP, DNS and load balancers.

Standardization also provides second-order benefits by reducing maintenance, procurement and support costs. After all, if the cloud has one management interface and all compute blades are the same, then the whole break/fix cycle gets simplified and procurement is streamlined.

Automation is all about saving labor. First, the customer saves provisioning labor. Additional savings come from reducing errors, which means less time fighting fires (more labor). Finally, ongoing operations (day 2), like applying patches, also take less labor.

The interesting metrics here are the number of servers per administrator and the number of errors before and after automation is deployed. As you know, every incident or fire causes a hailstorm of work for a bunch of people. Imagine if automation detected the error and fixed it before it became a problem. Well, that’s what Cisco Intelligent Automation does for compute, network and even SAP.

Utilization: What did Captain Kirk know that Scotty didn’t?

The third source of cost reduction, increasing utilization, is probably the biggest opportunity for cloud.

Yet it is the hardest for IT professionals to achieve because it requires applying a combination of psychology, market dynamics and technology.

Two big costs in the datacenter are the over-provisioning of capacity and the hoarding of infrastructure by application teams. Introducing service tiers with different capabilities, costs and delivery options, underpinned by automation, can be very effective in changing user behavior.

Over-provisioning happens because, without the ability to quickly and automatically provision services, it’s safer for both IT and the user to deploy all of the requested capacity up front. The result is that a lot of VMs sit idle at 3% utilization, and the operations staff can’t intelligently manage consumption because they don’t have the original request specification and the service tier that was committed.
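To make that concrete, here is a minimal Python sketch of how operations staff might flag idle VMs once the original request and service tier are kept on record. The `VM` record, its field names, the sample fleet and the 5% threshold are all assumptions invented for this example; they are not taken from any Cisco product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    name: str
    avg_cpu_percent: float        # average CPU utilization over some sampling window
    service_tier: Optional[str]   # tier committed at request time, if it was recorded


def find_idle(vms: List[VM], threshold_percent: float = 5.0) -> List[VM]:
    """Return the VMs whose average utilization is below the threshold."""
    return [vm for vm in vms if vm.avg_cpu_percent < threshold_percent]


# Hypothetical fleet, invented for illustration.
fleet = [
    VM("qa-build-01", 3.0, "bronze"),
    VM("erp-prod-01", 62.0, "gold"),
    VM("old-project-07", 2.5, None),
]

idle = find_idle(fleet)
for vm in idle:
    tier = vm.service_tier or "unknown tier (no request on file)"
    print(f"{vm.name}: {vm.avg_cpu_percent}% CPU, {tier}")
```

The VM with no tier on file is the hard case: without the original request, nobody can say whether 2.5% is waste or a deliberate standby.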

As for what Captain Kirk knew that Scotty didn’t: he knew how to balance engine utilization and the mission. Sometimes, Scotty, she can take Warp 10.

In fact, a lot of the upfront friction in designing, estimating and specifying happens because the infrastructure teams are trying to ascertain (guess) the real needs of the application, i.e. the mission, so they don’t over- or under-provision.

Meanwhile, the application teams don’t want to get caught short because a) they’ll get yelled at by the customer if the app is a great success, and b) it takes so long to deploy additional capacity that being wrong is exactly like an outage.

That’s how workloads become gold-plated. Everyone’s happy but the CFO. The CFO then makes the CIO suffer, and that’s how IT and the business get out of alignment. Automation is the infrastructure chiropractor; it helps bring IT operations and the customer back into alignment.

Self-service and automation start to change this situation. Introducing a few service offerings with standardized service tiers, backed by self-service and automation, changes the dynamic of the conversation.

A bronze-level service for QA and development, available in a few minutes with minimum support, might be better than a whole production stack that takes 8 weeks. (It is: that’s why people go to Amazon!)

Over the last ten years, newScale, now part of Cisco, worked with hundreds of customers, and we learned that psychology and market forces can be used to shape demand. Getting something NOW, and cheaper, turns out to be a better choice for many users than “perfect later.” Introducing quotas and leases makes the customer a more engaged partner in managing capacity. Showing costs and prices, even in the absence of chargeback, decreases demand. Why? Because human beings are wired that way.

And the technology has changed. Converged infrastructure such as the Cisco Unified Computing platform, VCE’s vBlock or NetApp’s FlexPod, when paired with automation and the service catalog, keeps the mission and the service level “unified” from the customer view to the operations view down to the actual hardware. This is new, and it’s getting wide adoption by infrastructure groups. The ability to

Hoarding: The roach motel of infrastructure management

Hoarding has similar dynamics. When it’s difficult and burdensome to obtain infrastructure resources, and the process takes months, there’s a tendency to want to keep the resources forever, because the customer knows they might need those servers again. So, like the proverbial roach motel, resources go out to customers but never come back.

Automation changes this dynamic: once customers see that resources can be provisioned in minutes, the psychology changes. A user now starts to think: “Why spend time renewing a lease for an app I’m not using if I know I can get it back in minutes the next time I need it? Plus, my boss will get the bill, and she is going to ask me if I’m still working on that old project.”
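The lease idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Lease` class, the 30-day default renewal and the sample data are hypothetical, and a real service catalog would track far more than this.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Lease:
    resource: str
    owner: str
    expires: date

    def renew(self, today: date, days: int = 30) -> None:
        """Extend the lease from today; the owner has to act to keep the resource."""
        self.expires = today + timedelta(days=days)


def due_for_reclaim(leases: List[Lease], today: date) -> List[Lease]:
    """Leases that have lapsed, so their resources can go back to the pool."""
    return [lease for lease in leases if lease.expires < today]


# Hypothetical leases, invented for illustration.
today = date(2011, 11, 7)
leases = [
    Lease("vm-dev-12", "alice", date(2011, 10, 31)),  # lapsed a week ago
    Lease("vm-qa-03", "bob", date(2011, 11, 30)),     # still active
]

reclaim = due_for_reclaim(leases, today)
print([lease.resource for lease in reclaim])
```

The design choice that matters is the default: doing nothing returns the resource, so keeping it requires a deliberate renewal.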

Show them the money!

The final cost savings come from introducing metering and charging. This changes customer behavior and reduces over-provisioning to begin with. With dedicated virtual or physical infrastructure, the business paid for it up front through capital expenditures; the equipment was used, and now it sits there idle, wasting power, space, licenses and labor.

And from the users’ point of view, infrastructure is “free,” so why do anything about it? Plus, what could be done? Nothing. In a capital expenditure model driven by business unit P&L, there really wasn’t much that IT people could do.

Subscriptions upend this model. A cloud subscription becomes an ongoing operational expense that goes on forever unless it’s turned off. This changes the psychology and behavior of organizations.
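A back-of-the-envelope sketch of that meter, in Python, shows how a forgotten subscription keeps adding to a team’s bill. The team names, monthly rates and durations below are invented purely for illustration, and the `showback` function is a hypothetical helper, not a feature of any particular portal.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (team, resource, monthly rate in dollars, months running) -- all invented.
Subscription = Tuple[str, str, float, int]


def showback(subs: List[Subscription]) -> Dict[str, float]:
    """Total cost to date per team: rate times months, summed over resources."""
    totals: Dict[str, float] = defaultdict(float)
    for team, _resource, monthly_rate, months in subs:
        totals[team] += monthly_rate * months
    return dict(totals)


subs = [
    ("web", "vm-web-01", 100.0, 6),
    ("web", "vm-old-demo", 100.0, 18),     # nobody turned this one off
    ("analytics", "vm-etl-01", 250.0, 3),
]

report = showback(subs)
for team, cost in sorted(report.items()):
    print(f"{team}: ${cost:,.2f} to date")
```

In this made-up data, the idle demo machine accounts for three quarters of the web team’s total, which is exactly the kind of line item a boss asks about.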

To get a sense of the emotions unleashed by showing the money, please click here.

(On a personal note, this forever subscription would sometimes cause my colleagues and me to go on hunting expeditions on the Amazon cloud, looking for machines to terminate. Someday, I’ll make a game out of it.)

To end, over-provisioning and hoarding are holding a large chunk of a datacenter’s resources hostage. When users have confidence they can get infrastructure on demand, they tend to hoard less, and when leases, quotas and lifecycle visibility are available, more infrastructure is freed up.

When the Cisco Cloud Portal shows the user exactly what they are consuming, shows the boss what the team is consuming, and the cost shows up in the P&L, the behavior changes. Quickly.
