API Compatibility: From cruising at self-service speed to “we gotta have a meeting”
Today’s announcement that Citrix is dropping support for OpenStack has reverberated through the clouderati sphere like a new Justin Bieber song through my niece’s third grade class. Super important but will not matter much when the next idol arrives.
In any case, a lot of smart people have written about it. I’ll leave them to explain the whole thing.
But the post that most caught my attention came from Thorsten at Rightscale. We have something in common: we both build products that connect to cloud APIs, including those from vendors whose APIs claim to be compatible with EC2. This experience, I think, provides a useful point of view when thinking about API compatibility. Not to mention it creates a jaundiced view of the human soul.
I’ve said it many times and I’ll repeat it again: it’s the semantics of the resources in the cloud that matter, not the syntax of the API. This means that “API compatibility” has to reach very, very deep to be meaningful. Let me give you a couple of examples around EC2.
George Reese at Enstratus has written about this as well. Thorsten refers to several areas where semantics trump compatibility: instance sizes, the use of storage, the usage of multiple zones, and differentiation of services. These are by no means the only ones.
Let’s start with instance sizes. Amazon attaches very definite meanings to its instance sizes, and these are not trivial to replicate in a data center. Or, as Thorsten says, you may not want to replicate them in your datacenter because your economics might be better than Amazon’s.
That is, while certain instance sizes make economic sense to the seller of cloud services, they may not make sense in a private cloud. For example, why does a small Linux instance come with 160 GB of disk? It always seemed excessive to me.
But if we change instance sizes, this has many implications for the operation of an application, from performance to monitoring to SLAs.
In simpler terms, a developer goes from cruising at self-service API speed to “we got to have meetings”. All the code that uses the EC2 APIs makes assumptions about the underlying semantics, and those semantics are expressions of operating models, organizational structures and business policies.
Change the semantics, and you have to have meetings to gather requirements, design, spec, build, test, deploy, etc. And that’s just if we change the machine instance size.
By the way, I recently lived this journey. My team took our Cisco Cloud Portal and Cisco Process Orchestrator, which were working on top of the vCloud APIs, and moved them to manage an OpenStack cloud using the Amazon EC2 APIs.
This instance size issue showed up immediately. vCloud has no concept of sizes, and OpenStack’s Amazon APIs have no concept of configuring VMs. Much hilarity ensued, but I’ll save that story for another post.
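To make the instance size point concrete, here is a minimal sketch of how two clouds can accept the same size name and still mean different things by it. Everything in it is an assumption for illustration: the catalogs, the field names, and every number except the 160 GB disk mentioned above. It is not how any particular cloud reports its sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSize:
    vcpus: int
    memory_gb: float
    disk_gb: int


# Hypothetical catalogs: both clouds accept the name "m1.small".
# Only the 160 GB disk figure comes from the text; the rest is made up.
PUBLIC_CLOUD = {"m1.small": InstanceSize(vcpus=1, memory_gb=1.7, disk_gb=160)}
PRIVATE_CLOUD = {"m1.small": InstanceSize(vcpus=1, memory_gb=2.0, disk_gb=20)}


def semantic_gaps(name, reference, candidate):
    """Return the fields where two clouds disagree about what `name` means."""
    ref, cand = reference[name], candidate[name]
    return {
        field: (getattr(ref, field), getattr(cand, field))
        for field in ("vcpus", "memory_gb", "disk_gb")
        if getattr(ref, field) != getattr(cand, field)
    }


def can_stage_dataset(size, dataset_gb):
    """An assumption application code often bakes in: local disk is big enough."""
    return size.disk_gb >= dataset_gb


if __name__ == "__main__":
    print(semantic_gaps("m1.small", PUBLIC_CLOUD, PRIVATE_CLOUD))
    print(can_stage_dataset(PUBLIC_CLOUD["m1.small"], 100))
    print(can_stage_dataset(PRIVATE_CLOUD["m1.small"], 100))
```

The API call that launches "m1.small" would look identical against both catalogs. The code that quietly assumed 100 GB of scratch space is the part that sends you to a meeting.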
Storage. There’s something wrong with Kevin’s storage
When we add storage services like Elastic Block Store (EBS), the road gets long and winding indeed. Here is what Thorsten writes about EBS:
#2: EC2 EBS block storage devices have quite peculiar performance characteristics (that are not universally liked…) both for regular I/O as well as for snapshots. It would seem rather crazy to try to duplicate those characteristics and not benefit from improvements that are possible in a smaller purpose-built private cloud. But by doing so the operating procedures for deployments may change rapidly making the notion of “compatibility” questionable. Put differently, should one retain compatibility if compatibility is worse?
Thorsten is being too kind to EBS. For many workloads, particularly traditional SQL databases, you can’t depend on EBS delivering consistent performance. So Amazon users have learned to do all kinds of architectural gymnastics to get stable, predictable storage performance to an application.
In other words, developers fixed the issue by writing new code. This option is often not available to private cloud users, users of commercial applications and platforms, and people with a life outside work. Also, it requires even more meetings.
But why would one re-write an app when self-administered root canals are so much more pleasant? And if the private cloud has an EMC or NetApp storage system with service levels, backup, dedup, etc. that provides stable, predictable I/O, why not just use that?
The answer is we would, but that’s a change in the deployment, so API compatibility matters less.
Zoning out at the speed of light
There are other issues, such as how workloads work across zones and datacenters in Amazon. For example, if one wants to run the same instance in both the Virginia datacenter and the EU-region datacenter, it’s necessary to copy the template image (AMI) between these sites.
Two issues arise. First, you now have to keep the same image synchronized. This is an error-prone pain. Literal quote from a meeting: “Has anyone seen my image?” Ok. So automation, like what Cisco provides, can help with this.
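As a rough sketch of the kind of check such automation might run, the snippet below compares the checksum each region reports for an image against the one we expect. The function name, the checksum strings and the way they are obtained are all assumptions for illustration. This is plain Python, not a call into any real cloud API.

```python
def find_image_drift(expected_checksum, region_checksums):
    """Return regions whose copy of the image is missing or out of date.

    region_checksums maps a region name to the checksum reported there,
    or to None when the image has not been copied to that region at all.
    """
    drift = {}
    for region, checksum in region_checksums.items():
        if checksum is None:
            drift[region] = "missing"
        elif checksum != expected_checksum:
            drift[region] = "stale"
    return drift


if __name__ == "__main__":
    # Hypothetical values: the region names are only labels here.
    reported = {
        "virginia": "abc123",
        "eu": "9f00aa",      # an older build is still registered here
        "asia": None,        # never copied
    }
    print(find_image_drift("abc123", reported))
```

A check like this only tells you that the images have drifted apart. Getting them back in sync still means moving the bits.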
The second is tougher as it involves speed of light issues. A year ago, moving a 6-gigabyte image between the US and EU was a three-day process. It took 6 hours (a workday) just to move the parts. Any corruption or failure required re-starting the process the next day. I’ve spent many an hour counting file parts to ensure they all made it across the Atlantic safe and sound.
This means that we had to build a three-day buffer into deploying a new image to all the Amazon datacenters. This is but one example of the operational changes that technology enables and disables. If the transfer took only a minute, my operational model would be different.
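For a sense of scale, here is a small back-of-the-envelope calculation using only the figures above: a 6 gigabyte image, a 6 hour transfer, and the wished-for one minute transfer. It assumes a gigabyte is 10^9 bytes and ignores protocol overhead and retries, so treat the results as rough orders of magnitude.

```python
def required_mbit_per_s(size_gb, duration_s):
    """Sustained throughput in Mbit/s needed to move size_gb in duration_s.

    Assumes decimal gigabytes (10**9 bytes) and no overhead or retries.
    """
    bits = size_gb * 1e9 * 8
    return bits / duration_s / 1e6


if __name__ == "__main__":
    image_gb = 6
    six_hours = required_mbit_per_s(image_gb, 6 * 3600)
    one_minute = required_mbit_per_s(image_gb, 60)
    print(f"6 hours:  {six_hours:.1f} Mbit/s")
    print(f"1 minute: {one_minute:.1f} Mbit/s")
```

Under those assumptions, the six-hour transfer works out to an effective rate of roughly 2.2 Mbit/s, while a one-minute transfer would need about 800 Mbit/s sustained. That gap is the difference between the two operational models.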
API compatibility is not going to solve that operational problem, and it doesn’t get you as much as you would think. But it’s still important. Cloud is complex enough, and any simplification and standardization can mean the difference between a successful project and a failure. Compatibility can also get you a richer ecosystem of tools.