
I’ve written in the past about the opportunity for Hadoop-as-a-Service (HaaS) – providing self-service provisioning, elastic scaling, and support for multi-tenancy. But in my discussions with customers over the past year, it’s become clear that the opportunity is even bigger than Hadoop. The next big thing in big data is Big-Data-as-a-Service (BDaaS).

There are three key trends driving the evolution and emergence of this new BDaaS opportunity:

Apache Spark and the evolving big data ecosystem. Hadoop recently celebrated its 10th birthday and continues to gain widespread adoption. But in recent years, other big data frameworks and tools have also grown in popularity. Foremost among these is Apache Spark, the most active open source project in big data. We’re also seeing increased interest in Kafka, Flink, NoSQL technologies such as Cassandra, and much more. And there continues to be rapid innovation in the commercial big data software market, including analytics, ETL, search, log analytics, and BI tools. Hadoop is still at the forefront (and many of these tools complement and extend Hadoop), but BDaaS is much more than Hadoop.

Enterprise adoption of containers and microservices. Container and microservices technology (Docker in particular) has taken hold in the enterprise, and the pace of adoption has accelerated over the past year. Like Spark, Docker has become one of the fastest growing open source technologies ever. Application developers have embraced the simplicity and agility of containers, and microservices are a foundation of the DevOps model. Enterprise IT teams have made containers part of their architecture strategy. And the container revolution is now being extended to big data applications.

The cloud experience for big data, with no compromises. Until recently, big data deployments were almost exclusively on-premises and on bare metal. But now data scientists, analysts, and developers in the line of business want the cloud experience: self-service, on-demand clusters, elasticity, and DevOps agility with all their big data tools. There are several public cloud services for Hadoop and Spark, but important factors prevent many big data workloads from moving to the public cloud, including performance, security, compliance, and data gravity. Data gravity means that data already residing on-premises is likely to stay there because of the cost, risk, and difficulty of moving very large volumes of data. Using container technology and next-generation big data infrastructure, customers can have the BDaaS cloud experience on-premises along with the enterprise-grade performance, security, compliance, and high availability that big data workloads require.

To learn more about Big-Data-as-a-Service, register for our upcoming joint webinar on September 15th with BlueData, a Cisco Solution Partner whose software provides an innovative platform for BDaaS using Docker containers: http://bit.ly/2bH3EJA



Authors

Raghunath Nambiar

No Longer with Cisco