At this year’s Hadoop Summit 2013, I presented “The Data Center and Hadoop,” which built upon the past two years of testing the effects of Hadoop on data center infrastructure. What makes Hadoop an important framework to study in the data center is that it is a distributed system combining a distributed file system (HDFS) with an execution framework (MapReduce). It also serves as a foundation for other layers, such as real-time and key/value stores (HBase), among many other possibilities. Each component comes with its own set of infrastructure requirements, some throughput sensitive and some latency sensitive, so understanding how all of these components work together in the data center is key to an optimized deployment.
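To make the execution-framework half of that pairing concrete, here is the classic word-count example as a minimal local sketch (my own illustration, not material from the presentation). The mapper emits (key, value) pairs and the reducer aggregates per key; a simple sort stands in for Hadoop’s shuffle/sort phase:

```python
from itertools import groupby

def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Sum the counts for each word. Pairs must arrive sorted by key,
    which is what Hadoop's shuffle/sort phase guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Local stand-in for Hadoop's shuffle: sort mapper output by key.
lines = ["the data center and hadoop", "hadoop in the data center"]
shuffled = sorted(mapper(lines), key=lambda kv: kv[0])
counts = dict(reducer(shuffled))
print(counts)  # {'and': 1, 'center': 2, 'data': 2, 'hadoop': 2, 'in': 1, 'the': 2}
```

On a real cluster, the input would be split across HDFS blocks, many mapper tasks would run in parallel near their data (a throughput-sensitive stage), and the shuffle would route each key across the network to a reducer, which is exactly why the network and compute layers matter so much to Hadoop performance.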
After studying many of these components and their effects, the very data we were analyzing became a frequent topic of discussion. We combined application performance data, application logs, compute data AND network data to build a complete picture of what is happening in the data center.
With the advent of programmable networks (aka “Software Defined Networking”), it is important not only to make the network more application aware, but also to know where and how to analyze and make the right connections between the application and the network.
Today, Big Data and Hadoop are arguably the hottest (and most mysterious) subjects in computing for most technology workers. Ask any person in IT about Big Data/Hadoop and you’ll probably get a look of utter confusion. Here at Cisco, I’ve recently taken on the role of Product Manager for Cisco Tidal Enterprise Scheduler (TES), and part of my job is to help you face your fears and put your arms around the Big Data boogeyman.
Big Data’s growth in the market has exploded, and it’s clear why: data-driven decision-making produces better business outcomes. With Big Data/Hadoop, analyzing massive datasets has become easier, and the new business insights we glean can be a massive competitive advantage.
I just arrived in NYC for Strata Conference + Hadoop World 2012, where I’m part of the Cisco team here to show off the new 6.1 release of Cisco Tidal Enterprise Scheduler announced yesterday. With 6.1, Cisco TES includes Hadoop integration to help our customers address the Big Data challenge and gain even more value from their infrastructure. The workload automation features provided by TES are an integral part of getting the most out of a Hadoop deployment.
At the Strata event, we’re featuring Cisco UCS servers and Cisco Nexus switches for Big Data, as well as our Cisco TES support for Hadoop. To see Cisco TES and Hadoop in action, check out the online demo here. The demo runs on UCS and schedules a Hadoop MapReduce job every 15 minutes to track tweets at the conference, revealing the biggest Twitter topics and the most active tweeps in Big Data this week.
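The analytical core of a job like that can be surprisingly small. As a hypothetical sketch (the function names and regex below are my own, not the demo’s actual code), the mapper side boils down to extracting hashtags and user mentions from each tweet’s text, with a count-and-rank step to surface the top topics:

```python
import re
from collections import Counter

def extract_tags(tweet_text):
    """Pull #hashtags and @mentions out of one tweet's text."""
    return re.findall(r"[#@]\w+", tweet_text.lower())

def top_topics(tweets, n=3):
    """Count tags across all tweets and return the n most common."""
    counts = Counter(tag for text in tweets for tag in extract_tags(text))
    return counts.most_common(n)

# A few made-up tweets standing in for the conference stream.
tweets = [
    "Loving the #bigdata talks at #strataconf",
    "#bigdata + #hadoop demo by @cisco",
    "#hadoop scheduling with TES #bigdata",
]
print(top_topics(tweets))  # [('#bigdata', 3), ('#hadoop', 2), ('#strataconf', 1)]
```

Run as a scheduled MapReduce job, the extraction would happen in parallel mappers over each 15-minute batch of tweets, with the counting and ranking done in the reduce phase; the scheduler’s role is simply to kick off each batch on time and chain any downstream steps.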
In addition to our support for Hadoop and Big Data, with TES 6.1 we’ve announced a self-service portal, support for Amazon Web Services (AWS) EC2 and S3, and an iPhone app. AWS support adds the advantages of cloud-based Hadoop, coupling Hadoop’s analytical strength with the scalability and agility to expand capacity as needed. Throwing TES 6.1 into the AWS mix provides automated, efficient provisioning of those cloud resources.
Last week we participated in the annual Hadoop Summit, held in San Jose, CA. When we first met with Hortonworks about the Summit many months back, they mentioned that this year’s Hadoop Summit would promote reference architectures from many companies in the Hadoop ecosystem. This was great to hear: we had previously presented results from a large round of testing on network and compute considerations for Hadoop at Hadoop World 2011 last November, and we were looking to do a second round of testing to take our original findings and develop a set of best practices around them, including failure scenarios and connectivity options. This validation also answers a key enterprise question: “Can we use the same architecture and components for Hadoop deployments?” Since much of Hadoop’s value is realized once it is integrated into existing enterprise data models, the goal of the testing was not only to define a reference architecture, but also to define a set of best practices so Hadoop can be integrated into current enterprise architectures.
Below are the results of this new testing effort, presented at Hadoop Summit 2012. Thanks to Hortonworks for their collaboration throughout the testing.