Data Lifecycle Management for Hadoop with Cisco UCS and HDP
Hadoop has emerged as a mainstream data management platform across industry verticals. While there are business demands to collect, store, process, and derive insights from ever-increasing volumes of data, enterprises are challenged with sustaining application performance while effectively managing data growth and retention requirements.
A common trait of data is that its value changes over time. The usage and utility of data can be categorized as hot, warm, or cold. In its initial days, data is typically hot, and it gradually becomes cold as it ages and its utility goes down. It is critical to cost-effectively manage the full life cycle of data and process it as applications demand. This value-driven management of data can be challenging. Hadoop addresses this challenge with the tiered storage capability of Apache HDFS. Cisco and Hortonworks have partnered to offer an industry-leading solution that combines the innovations in Cisco UCS Integrated Infrastructure for Big Data and Hortonworks Data Platform 2.2, using the heterogeneous storage tiers within HDFS.
An example configuration is shown in Figure 1. The configuration consists of 64 Cisco UCS C240 M4 Servers (1.8 PB of raw storage capacity) and 4 Cisco UCS C3160 Servers (1.4 PB of raw storage capacity) interconnected using a pair of Cisco UCS 6296UP Fabric Interconnects. The solution offers data placement policies to manage tiers of hot, warm, and cold data, with the placement of data tied to its temperature as follows:
Hot: All three replicas on C240 M4 Servers
Warm: Two replicas on C240 M4 Servers and one replica on C3160 Servers
Cold: All three replicas on C3160 Servers
The policy assigned to a file can be changed as the file's performance requirements change, and the file's replicas are then migrated transparently.
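The placement rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not part of the solution: the tier labels c240_m4 and c3160 and both helper functions are hypothetical names chosen for this example, and the model assumes the three-replica layout described in the list. It shows how much raw capacity a dataset consumes on each server tier, and how many replicas of each block must move when a file's temperature changes.

```python
# Replicas per server tier for each data temperature, as listed above.
# The tier labels are illustrative names, not HDFS identifiers.
PLACEMENT = {
    "hot":  {"c240_m4": 3, "c3160": 0},
    "warm": {"c240_m4": 2, "c3160": 1},
    "cold": {"c240_m4": 0, "c3160": 3},
}


def raw_footprint_tb(dataset_tb, temperature):
    """Return the raw terabytes a dataset consumes on each server tier."""
    replicas = PLACEMENT[temperature]
    return {tier: dataset_tb * count for tier, count in replicas.items()}


def replicas_to_move(old, new):
    """Return how many replicas per block change tier between two policies."""
    before, after = PLACEMENT[old], PLACEMENT[new]
    # A replica moves whenever a tier must hold more copies than it did before.
    return sum(max(after[tier] - before[tier], 0) for tier in after)


if __name__ == "__main__":
    print(raw_footprint_tb(10, "warm"))
    print(replicas_to_move("hot", "cold"))
```

Under these assumptions, a 10 TB warm dataset occupies 20 TB of raw capacity on the C240 M4 tier and 10 TB on the C3160 tier, and moving a file from hot to cold relocates all three replicas of every block.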
Figure 1: Cisco UCS Integrated Infrastructure for Big Data with Tiered Storage Extension
Together, Cisco UCS Integrated Infrastructure for Big Data and Hortonworks Data Platform offer a comprehensive set of capabilities for data management, data access, data governance and integration, and operations.