Managing Production-Grade Kubernetes in Production
Kubernetes is a production-grade container orchestration solution that automates management of cloud-native applications. Features that make Kubernetes production-grade include automated rollout and rollback, auto-scaling, and self-healing of containerized applications (among other things).
It is easy to download and install Kubernetes and get started. But what if you are running Kubernetes in production and supporting a range of application service teams from different business units? What if you have containerized a mix of traditional multi-tier, micro-service, or even Kubernetes-native application architectures?
Kubernetes Drives IT Ops Change
The reality is that deploying and managing production-grade Kubernetes in production changes the way IT operates. IT may currently operate in a way that is mostly optimized for managing VM-based applications and may not yet be container optimized.
With this new technology, an application can be packaged in a Docker container, and everything developers know about running the application can be described in a declarative YAML file called a Pod manifest. That file captures environment variables, storage drivers, security and network configurations, and so forth, and can then be deployed in any Kubernetes environment.
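To make the idea of a declarative Pod manifest concrete, here is a minimal sketch that builds one as a plain Python dictionary and prints it as JSON. The application name, image reference, environment variable and port are all hypothetical, and a real manifest would usually carry more settings (volumes, security context, resource requests) than shown here.

```python
import json


def build_pod_manifest(name, image, env, port):
    """Return a minimal Pod manifest as a plain dict.

    All argument values are supplied by the caller; nothing here talks
    to a cluster. The dict mirrors the fields of a core/v1 Pod object.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Environment variables the developer knows the app needs.
                    "env": [
                        {"name": key, "value": value}
                        for key, value in sorted(env.items())
                    ],
                    "ports": [{"containerPort": port}],
                }
            ]
        },
    }


# Hypothetical application values, for illustration only.
manifest = build_pod_manifest(
    name="orders-api",
    image="registry.example.com/orders-api:1.4.2",
    env={"LOG_LEVEL": "info"},
    port=8080,
)
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output could be saved to a file and applied with kubectl apply -f. The point of the sketch is that the run-time knowledge lives in one reviewable, versionable document.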
This approach codifies the application and libraries needed to deploy. But it also codifies run books, launch meeting notes, and other tacit knowledge that typically gets passed back and forth between dev and ops teams.
Containers and Kubernetes codify both build and run constructs and promise to increase feature velocity. They also help avoid the application lifecycle handoffs that notoriously create friction and miscommunication.
Based on my conversations with Cisco customers who have deployed or are working toward Kubernetes in production, many are still thinking through the changes that get introduced to IT operations as they work with this new technology.
In fact, these customers still have questions about which roles, operating tools, and processes change. They also have a general feeling that there is no “easy button” when running production-grade Kubernetes in production, at scale, across multiple dev or application teams.
Here’s a quick rundown of some things you might consider when planning Kubernetes in production.
Separation of Concerns
The Kubernetes journey may start with a developer who wants to deploy a containerized application service. Many firms are moving toward the Netflix model where developers have more responsibility for supporting their application. So IT may consider separation of concerns to clarify roles and help people focus on their service.
So, IT might assign a namespace for a new project and then allocate resources to that namespace for a particular application service. However, many applications include multiple containerized services. IT may find it more logical to deploy and hand over a whole Kubernetes cluster for multiple related teams and services, but the end-to-end micro-service application may end up spanning multiple clusters.
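As a rough illustration of assigning a namespace and allocating resources to it, the sketch below generates a Namespace object and a matching ResourceQuota. The team name and the quota amounts are made-up values, and the helper function is hypothetical; only the object kinds and the quota keys (requests.cpu, requests.memory, pods) come from the Kubernetes API.

```python
import json


def namespace_with_quota(team, cpu, memory, max_pods):
    """Return a Namespace and a ResourceQuota manifest for one team.

    cpu and memory use Kubernetes quantity strings such as "8" or "16Gi".
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        "spec": {
            "hard": {
                # Caps on the total resources pods in this namespace may request.
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    return [namespace, quota]


# Hypothetical allocation for a single application team.
objects = namespace_with_quota("payments", cpu="8", memory="16Gi", max_pods=40)
for obj in objects:
    print(json.dumps(obj, indent=2))
```

A quota like this draws the boundary for one team inside a shared cluster. It does not answer the harder question raised above of what to do when an application spans several namespaces or clusters.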
Point being, it may be hard to identify the appropriate span of control for a specific developer, or dev team, or project, or application. With multiple dev teams, multiple legacy and new application architectures, multiple clusters in multiple locations – things get complicated.
Cluster Lifecycle Management
Installing Kubernetes is easy. But how will IT operate a changing mix of clusters once deployed? Once a cluster has dozens of applications and a mix of namespaces and resource allocations, managing the Kubernetes lifecycle becomes a challenge.
A new version of upstream Kubernetes is released by the open source Kubernetes project roughly every quarter. Who will do the “Cluster Ops” for all those clusters? Who gets to pick the Kubernetes version of the cluster? Are you going to test your Kubernetes upgrade in a staging environment before upgrading the production cluster?
Will you offer “cluster as a service” and let different user groups spin up clusters on demand? Will you allocate physical resource pools for on-demand and temporary use, separate from resource pools that are reserved for production? Will you offer HA/DR for production clusters? What about backup for persistent storage?
How many clusters will you support? If you have 10 application teams, will you have 10 clusters for development plus one in production? Or separate dev, test, staging and production clusters by business group?
Kubernetes is an orchestrator that relies on abstractions at the cluster, namespace, node and pod levels. It then makes automated changes on the fly based on policies, such as auto-scaling and self-healing the application, and those changes impact intra-cluster and inter-cluster network communications. You will need a policy-driven container network solution to create and manage a fabric for physical, VM, and container network constructs at the cluster level. And you will need a policy-driven network solution (ideally software defined) to manage traffic between clusters.
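To show what a policy-driven change on the fly looks like, here is a small sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up. The function name, the replica bounds and the sample numbers are assumptions for illustration, and the real controller handles more cases (missing metrics, pods that are not ready, stabilization windows) than this does.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified autoscaling decision for one metric.

    Returns the replica count a policy of "keep the metric near
    target_metric" would ask for, clamped to the allowed range.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical readings: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))
```

Every such decision adds or removes pods, and each new pod gets its own IP address. This is why the network layer has to follow policy automatically rather than wait for manual changes.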
Containers are ephemeral. Data is not. So, many container workloads will require persistent storage. Distributed and software-defined storage is a good fit for self-replicating data stores like Cassandra that are popular for containerized workloads. So you may need to consider a high-performance IO services framework for flexible storage and networks.
Developers want to focus on coding features in their application. Containerizing their workloads makes their lives easier. But Kubernetes is yet another tool developers need to learn. So, your consumption model needs to support both Kubernetes gurus who are comfortable at the command line and want to write their own pod manifest files, and those who don’t know Kubernetes and just want to deploy their application through a UI.
It’s also important to consider whether you are going to give developers command line access to deploy and change things in your production cluster. What about traditional cost and compliance controls? You know, the guardrails that go around the powerful automation.
There are great blogs that address the big topic of container security. See Google. But speaking of governance controls, are you going to limit containers in production to only those previously inspected and approved by security? What about in pre-production environments? Are you going to support a range of container repositories? Or enforce the use of one?
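One way to picture a guardrail that enforces the use of approved repositories is a check that rejects any pod whose images come from a registry outside an allowlist. The sketch below is a simplified illustration, not a real admission controller: the registry names and helper functions are hypothetical, and the parsing follows the common Docker convention that an image with no registry host is pulled from Docker Hub.

```python
def image_registry(image):
    """Return the registry host of an image reference.

    Simplified convention: the first path component is treated as a
    registry host only if it contains "." or ":" or is "localhost";
    otherwise the image is assumed to live on Docker Hub.
    """
    first, sep, _rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def unapproved_images(pod_manifest, approved_registries):
    """List the images in a Pod manifest that are not from an approved registry."""
    containers = pod_manifest.get("spec", {}).get("containers", [])
    return [
        c["image"]
        for c in containers
        if image_registry(c["image"]) not in approved_registries
    ]


# Hypothetical policy: production may only pull from one internal registry.
APPROVED = {"registry.example.com"}
pod = {
    "spec": {
        "containers": [
            {"name": "api", "image": "registry.example.com/orders-api:1.4.2"},
            {"name": "sidecar", "image": "nginx:1.25"},
        ]
    }
}
print(unapproved_images(pod, APPROVED))
```

A looser allowlist could be used for pre-production clusters and a strict one for production, which maps onto the questions raised above.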
This may be one of the biggest areas of change and concern I hear from customers as they move to container-optimized operations. IT Ops mostly runs on virtual machines. Management, monitoring, troubleshooting and repair tools and processes, as well as resource allocation methodologies, may be optimized around VM-oriented operations.
What about when managing containers? Your application is slowing down. Can your “Application Ops” see what is happening at the cluster, node, pod, container, and underlying resource (compute, storage, network) levels and respond appropriately? Kubernetes changes the way you build and run applications. But it also changes how you handle monitoring, logging, tracing and other functions that are critical to maintaining service quality.
In summary, Cluster Ops and containerized Application Ops are obvious role changes as IT moves to container and Kubernetes-based operations. But keep in mind that Kubernetes consumers have different skill levels and needs, and probably mostly just want to focus on their application.
KubeCon + CNCF in Copenhagen Booth #D-E03
Cisco has a wide range of solutions that help you run Kubernetes in production. And we contribute to various open source projects like Kubernetes, Contiv, Istio Service Mesh, and FD.io.
Stop by the booth to talk to many of the Cisco engineers who contribute to CNCF projects. And learn how our partnership with Google is helping create a truly Open Hybrid Cloud Solution based on Kubernetes.