Looking for an orchestration taxonomy
In recent years, orchestration has been widely discussed as a key enabler for different Cloud technologies.
The ETSI NFV Management and Orchestration (MANO) working group is defining the main interfaces for resource orchestration, a fundamental layer in management.
It is important to define standard interfaces, but it is equally important to understand the main capabilities of an orchestration (or choreography) solution. We can gain more insight by revisiting previous work, particularly in the domain of Grid computing.
Personally, I find the work done by Ian Foster and Steven Tuecke on IT as a Service (back in 2005, 9 years ago!) still extremely relevant. It is fascinating to see how applicable this work continues to be, apart perhaps from the replacement of general SOA services by REST services in particular. We should pay special attention to their definition of Grid Infrastructure: “enable the horizontal integration across diverse physical resources”. I see their work as applicable beyond the physical layer, to logical resources and their composition into services. Quoting the paper, the Grid Infrastructure’s capabilities should be:
- Resource modeling: describes available resources, their capabilities, and the relationships between them to facilitate discovery, provisioning, and quality of service management.
- Monitoring and notification: provides visibility into the state of resources to enable discovery and maintain quality of service.
- Allocation: assures quality of service across an entire set of resources for the lifetime of their use by an application.
- Accounting and auditing: tracks the usage of shared resources and provides mechanisms for transferring costs among user communities and for charging for resource use by applications and users.
- Provisioning, life-cycle management and decommissioning: enables an allocated resource to be configured automatically for application use, manages the resource for the duration of the task at hand and restores the resource to its original state for future use.
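To make the five capabilities more concrete, here is a minimal sketch of how they might appear as operations on a single toy infrastructure object. Everything in it (the `Resource` and `ToyInfrastructure` names, the abstract capacity units, the event strings) is hypothetical and chosen for illustration; it is not taken from Foster and Tuecke's paper or from any ETSI interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Resource:
    """Resource modeling: what exists, its capacity, and what it depends on."""
    name: str
    capacity: int                                   # abstract units (assumption)
    depends_on: List[str] = field(default_factory=list)
    allocated: int = 0
    state: str = "idle"


class ToyInfrastructure:
    """Hypothetical in-memory stand-in for a Grid Infrastructure layer."""

    def __init__(self) -> None:
        self.resources: Dict[str, Resource] = {}
        self.listeners: List[Callable[[str, str], None]] = []
        self.usage: Dict[str, int] = {}             # units consumed per user

    # Resource modeling: make a resource discoverable.
    def register(self, resource: Resource) -> None:
        self.resources[resource.name] = resource

    # Monitoring and notification: observers are told about state changes.
    def subscribe(self, listener: Callable[[str, str], None]) -> None:
        self.listeners.append(listener)

    def _notify(self, name: str, event: str) -> None:
        for listener in self.listeners:
            listener(name, event)

    # Allocation: refuse requests that would exceed capacity.
    def allocate(self, name: str, user: str, units: int) -> bool:
        resource = self.resources[name]
        if resource.allocated + units > resource.capacity:
            self._notify(name, "allocation_rejected")
            return False
        resource.allocated += units
        # Accounting and auditing: record who consumed what.
        self.usage[user] = self.usage.get(user, 0) + units
        self._notify(name, "allocated")
        return True

    # Provisioning and life-cycle management: configure for use.
    def provision(self, name: str) -> None:
        self.resources[name].state = "configured"
        self._notify(name, "configured")

    # Decommissioning: restore the resource to its original state.
    def decommission(self, name: str) -> None:
        resource = self.resources[name]
        resource.allocated = 0
        resource.state = "idle"
        self._notify(name, "decommissioned")
```

A real system would spread these concerns across separate services; keeping them in one class only serves to show how closely the capabilities depend on each other (allocation, for example, cannot work without the resource model and feeds directly into accounting).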
Two questions come to mind: (1) how have requirements changed in these 9 years? (2) how (if at all) should we update these definitions to reflect the advances in infrastructure? It is clear that the new Cloud/NFV scenarios require greater scalability, with targets that are still somewhat unclear, and must handle an increasing diversity of resources and services with more complex relationships (virtualization, composition, and interactions with legacy infrastructure). The new infrastructure needs to respond much faster to resource state changes and to fulfill more stringent SLAs in a more scalable and diverse environment, which creates new challenges for assurance applications (network, application, and service assurance).
In recent years the industry has focused extensively on the provisioning capability, pushed by the need for automation and helped by technology advancements in OpenStack, network controllers, and “DevOps” tools such as Chef, Puppet, Cloudify, etc. However, to address the challenges coming from the new use cases, a more balanced focus on all the capabilities mentioned by Foster and Tuecke will be required. In most cases, the key enabler to deliver on all these areas is the use of advanced analytics to match supply and demand for resources, similar to a “just in time” production model that goes beyond resources to services and business processes.
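As a rough illustration of what "matching supply and demand" means at its simplest, the sketch below places resource requests onto available capacity using a plain best-fit heuristic. This is a deliberately naive stand-in, not the advanced analytics discussed above: the function name `match_demand`, the host and request names, and the abstract capacity units are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def match_demand(
    supply: Dict[str, int],
    demand: List[Tuple[str, int]],
) -> Dict[str, Optional[str]]:
    """Place each request on the tightest resource that still fits it.

    supply maps a resource name to its free capacity; demand is a list of
    (request name, units needed). Requests that fit nowhere map to None.
    """
    free = dict(supply)                      # work on a copy of the supply
    placement: Dict[str, Optional[str]] = {}
    # Handle the largest requests first so they are not crowded out.
    for request, units in sorted(demand, key=lambda d: d[1], reverse=True):
        candidates = [name for name, cap in free.items() if cap >= units]
        if not candidates:
            placement[request] = None        # unmet demand
            continue
        # Best fit: least remaining capacity, name as a deterministic tiebreak.
        chosen = min(candidates, key=lambda name: (free[name], name))
        free[chosen] -= units
        placement[request] = chosen
    return placement
```

The unmet requests (those mapped to `None`) are the interesting output: in a "just in time" model they are the signal that more supply must be provisioned, which is where forecasting and analytics would take over from a static heuristic like this one.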
What do you think? What has changed since Foster and Tuecke’s publication?
Special thanks to Gary Berger, Frank Van Lingen and Marco Valente for their reviews of this text.