
Guest author:

Jacob Loftis

Systems Engineer for Americas Partner Organization at Cisco


Welcome to my first blog post. To be honest, authoring a blog is all new to me, but thankfully the topic isn't: Kubernetes. In my experience, there are two questions I keep hearing in the marketplace: why should I use Kubernetes, and what are the tangible benefits to my organization? Those are the two questions I'd like to focus on.

First, I'd like to give you a little insight into a fantastic Kubernetes use case I'm involved with: genetic research. As an engineer inside our Americas Partner Organization, I work on joint Cisco solutions with public cloud providers. Over the past few months I've had the privilege of working with Dr. Alex Feltus, a professor of genetics and biochemistry at Clemson University. He's at the forefront of using Kubernetes to analyze massive genetic datasets for cancer research, and we've been working with him and his team to develop a proof of concept that addresses the problems inherent in this kind of research.

Don't worry, our partnership with Clemson University isn't another Kubernetes tutorial on making a "hello world" web page. There are too many of those already. Instead we're doing something far more important, cancer research, using the power of Kubernetes to analyze the vast number of genetic interactions in tumors.

Why use Kubernetes?

This is a natural lead-in to our first question: why Kubernetes? To answer that, we need to understand the nature of the research Clemson is doing. It requires more than just computing power and storage. It also means enabling capabilities around:

  • Scalability: The genetic datasets that Alex works with can exceed 5 petabytes each, so it's important that the computing environment can easily scale storage, memory, and computing power. In practice, that means being able to quickly create more, and bigger, containers.
  • Elasticity: There isn't just one analytical model in this case; there are hundreds, scattered among several research organizations. Researchers need to swap in new models by destroying and creating containers, but they must be able to run those models against the same dataset without spinning up a new query or application.
  • Reliability: Imagine trying to manually spin up all these containers. It's not only time consuming but also error prone.
  • Shareability: Other research organizations can benefit from the data, models, and findings that Clemson produces, so the ability to share securely and easily across Kubernetes clusters is vital to making headway on cancer research. In addition, significant existing computing power and storage is already available to researchers, and Clemson wants to draw on a variety of compute sources, both on-premises and in the cloud.

Kubernetes, as a container orchestrator, solves three of the four problems outright: scalability, elasticity, and reliability.
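To make the first three properties concrete, here is a minimal sketch of a Kubernetes Deployment. Everything in it is illustrative, not Clemson's actual workload: the image name, resource sizes, and volume claim are hypothetical placeholders. The point is that scaling, swapping models, and self-healing are all declarative knobs on one object:

```yaml
# Illustrative sketch only; image, sizes, and names are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genome-analysis
spec:
  replicas: 3                 # scalability: raise this to add containers
  selector:
    matchLabels:
      app: genome-analysis
  template:
    metadata:
      labels:
        app: genome-analysis
    spec:
      containers:
      - name: model
        # elasticity: change the image tag to interchange analytical models
        image: example.org/analysis-model:v1
        resources:
          requests:
            memory: "16Gi"
            cpu: "4"
        volumeMounts:
        - name: dataset
          mountPath: /data
      volumes:
      - name: dataset
        persistentVolumeClaim:
          claimName: genetic-dataset   # the shared dataset stays in place
```

Reliability comes for free: if a container crashes, Kubernetes recreates it to match the declared replica count. And because the dataset lives in a persistent volume claim, scaling out (`kubectl scale deployment genome-analysis --replicas=50`) or rolling out a new model image leaves the data itself untouched.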

But what about shareability?

Can the cloud or a massive data center (such as Google Cloud with BigQuery) solve the shareability issue? Before assuming it can, we must remember that:

  • The sheer amount of genetic data is growing exponentially. For example, the Human Genome Project took 15 years to decode one DNA strand, yet scientists were able to process the genome of the novel coronavirus (SARS-CoV-2) in a matter of days. This makes putting data in the cloud challenging and costly.
  • Scientists are unable to predict how many experiments will be needed, and thus how much computing power, to produce breakthroughs in their research. Because of this, budgeting for cloud computing credits becomes nearly impossible without hindering research.

A “single pane of glass” to manage Kubernetes clusters

This means a "manager" is needed for the container orchestrator, one that can easily spin up multiple Kubernetes clusters in any on-premises data center or on a cloud provider like Google Cloud GKE. This is where the value of deploying the Cisco Container Platform (CCP) becomes apparent. It provides the "single pane of glass" that researchers at Clemson University and other research centers need to efficiently create and manage the clusters that enable life-saving cancer research.


Resources

To learn how Dr. Feltus and Clemson University used CCP, check out: Clemson University simplifying data mining with Cisco

Case Study: How Cisco cloud solution powers lab research for the Clemson University Bioinformatics Lab

Cisco’s Portfolio Explorer for Education: High performance computing



Authors

Grimt Habtemariam

Director of Global Routes to Market Acceleration

Global Partner Sales and Routes to Market