While there is not yet an industry-standard benchmark for measuring the performance of Hadoop systems (yes, there is work in progress: WBDB, BigDataTop100, etc.), workloads like TeraSort have become a popular choice for benchmarking and stress testing Hadoop clusters.
TeraSort is very simple and consists of three MapReduce programs: (i) TeraGen, which generates the dataset; (ii) TeraSort, which samples and sorts the dataset; and (iii) TeraValidate, which validates the output. With multiple vendors now publishing TeraSort results, organizations can make reasonable performance comparisons when evaluating Hadoop clusters.
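To make the three stages concrete, here is a minimal single-process Python sketch of the TeraGen, TeraSort, TeraValidate flow. It assumes 100-byte records whose first 10 bytes are the sort key (the TeraSort record layout). The filler content is made up, and the functions `teragen`, `split_points`, `terasort`, and `teravalidate` are hypothetical stand-ins, not Hadoop APIs. The point it illustrates is the sampling step: choosing range boundaries from a sample of keys lets each reducer sort only its own key range, so the reducer outputs concatenated in order are globally sorted. Hadoop's real implementation is distributed and considerably more elaborate.

```python
import random
from bisect import bisect_right

KEY_LEN = 10      # sort key: first 10 bytes of each record
RECORD_LEN = 100  # fixed-size 100-byte records

def teragen(num_rows, seed=0):
    """Toy stand-in for TeraGen: random 10-byte key plus 90 bytes of made-up filler."""
    rng = random.Random(seed)
    rows = []
    for i in range(num_rows):
        key = bytes(rng.randrange(256) for _ in range(KEY_LEN))
        filler = str(i).zfill(RECORD_LEN - KEY_LEN).encode("ascii")
        rows.append(key + filler)
    return rows

def split_points(rows, num_reducers, sample_size, seed=1):
    """Sample keys, sort the sample, and pick num_reducers - 1 range boundaries."""
    rng = random.Random(seed)
    picked = rng.sample(rows, min(sample_size, len(rows)))
    sample = sorted(r[:KEY_LEN] for r in picked)
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]

def terasort(rows, num_reducers=4, sample_size=1000):
    """Range-partition records by sampled boundaries, then sort each partition."""
    bounds = split_points(rows, num_reducers, sample_size)
    partitions = [[] for _ in range(num_reducers)]
    for r in rows:  # "map" side: route each record to its key range
        partitions[bisect_right(bounds, r[:KEY_LEN])].append(r)
    # "reduce" side: each reducer sorts only its own range
    return [sorted(p, key=lambda r: r[:KEY_LEN]) for p in partitions]

def teravalidate(partitions):
    """Check that keys never decrease, within and across partitions."""
    prev = None
    for part in partitions:
        for r in part:
            key = r[:KEY_LEN]
            if prev is not None and key < prev:
                return False
            prev = key
    return True

rows = teragen(10_000)
parts = terasort(rows, num_reducers=4)
print([len(p) for p in parts], teravalidate(parts))
```

Because partition *i* holds only keys below the *i*-th boundary and partition *i+1* holds only keys at or above it, no cross-reducer merge is needed. A poor sample would skew partition sizes (and so reducer load) but would not break the ordering.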
We conducted a series of TeraSort tests on our popular Cisco UCS Common Platform Architecture (CPA) for Big Data rack, consisting of 16 Cisco UCS C240 M3 Rack Servers, each equipped with two Intel Xeon E5-2665 processors and running the Apache Hadoop distribution (see figure below). The results demonstrate industry-leading performance and scalability over data set sizes ranging from 100GB to 50TB. For example, out of the box, our 10TB result is 40 percent faster than HP’s published result on 18 HP ProLiant DL380 servers, each equipped with two Intel Xeon E5-2667 processors.
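Since TeraGen is driven by a row count rather than a byte size, choosing a data set size means converting it into a number of 100-byte rows. A minimal sketch, assuming decimal units (1GB = 10^9 bytes, 1TB = 10^12 bytes); `teragen_rows` is a hypothetical helper, not part of Hadoop:

```python
RECORD_BYTES = 100  # each TeraGen row is a fixed 100-byte record

# Assumption: decimal units
GB = 10**9
TB = 10**12

def teragen_rows(size_bytes: int) -> int:
    """Hypothetical helper: number of 100-byte rows for a target data set size."""
    if size_bytes <= 0 or size_bytes % RECORD_BYTES:
        raise ValueError("size must be a positive multiple of 100 bytes")
    return size_bytes // RECORD_BYTES

# Sizes spanning the tested range (100GB to 50TB)
for label, size in [("100GB", 100 * GB), ("1TB", TB), ("10TB", 10 * TB), ("50TB", 50 * TB)]:
    print(f"{label}: {teragen_rows(size):,} rows")
```

For example, under these assumptions the 10TB run corresponds to 100 billion rows. The computed count is what you would pass as TeraGen's row-count argument; the examples JAR name and location vary by distribution.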
While Hadoop offers many advantages for organizations, the Cisco story isn’t complete without our collaborations with ecosystem partners, which enable us to offer complete solution stacks. We support leading Hadoop distributions, including Cloudera, Hortonworks, Intel, MapR, and Pivotal, on our Cisco UCS Common Platform Architecture (CPA) for Big Data. We just announced our Big Data Design Zone, which offers Cisco Validated Designs (CVDs): pretested and validated architectures that accelerate time to value for customers while reducing risk and deployment challenges.
Cisco Big Data Design Zone
Cisco UCS Demonstrates Leading TeraSort Benchmark Performance
Cisco UCS Common Platform Architecture (CPA) for Big Data
Tags: Big Data, Big Data Benchmarks, Cisco UCS C240 M3 Rack Server, Cisco UCS CPA, CPA, Hadoop, TeraSort, YCSB
Cisco UCS Common Platform Architecture (CPA) for Big Data offers a comprehensive stack for enterprise Hadoop deployments. Today we announce the availability of the Cisco Validated Design (CVD) for Cloudera (CDH), which describes the architecture and deployment procedures. It was jointly tested and certified by Cisco and Cloudera to accelerate deployments while reducing risk, complexity, and total cost of ownership.
Together, Cisco and Cloudera are well positioned to help organizations exploit the valuable business insights found in all their data, whether it’s structured, semi-structured, or unstructured. The solution offers industry-leading performance, scalability, and advanced management capabilities to address the business needs of our customers.
The rack-level configuration detailed in the document can be extended to multiple racks. Up to 160 servers (10 racks) can be supported in a single UCS domain with no additional switching. Scaling beyond 10 racks can be implemented by interconnecting multiple UCS domains using Nexus 6000/7000 Series switches, scaling to thousands of servers and hundreds of petabytes of storage, all managed from a single pane of glass using UCS Central.
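The scaling limits above reduce to simple arithmetic. The sketch below assumes 16 servers per rack (160 servers across 10 racks) and assumes each UCS domain is filled to its 160-server limit before another is added; `layout` is a hypothetical helper for rough planning only, and real designs may split domains differently.

```python
import math

SERVERS_PER_RACK = 16          # 160 servers / 10 racks, as stated above
MAX_SERVERS_PER_DOMAIN = 160   # single UCS domain with no additional switching

def layout(servers: int) -> tuple[int, int, bool]:
    """Hypothetical planning helper: (racks, UCS domains, needs domain interconnect)."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    racks = math.ceil(servers / SERVERS_PER_RACK)
    # Assumption: each domain is filled to 160 servers before another is added
    domains = math.ceil(servers / MAX_SERVERS_PER_DOMAIN)
    # More than one domain means interconnecting domains (Nexus 6000/7000 per the text)
    return racks, domains, domains > 1

for n in (160, 480, 1000):
    print(n, layout(n))
```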
We would like to invite you to our upcoming Journey to Big Data Roadshow in a city near you, designed to help you identify where you are on your Big Data journey, and how to keep that journey going in a low-risk, productive way.
1. Cisco UCS CPA for Big Data with Cloudera
2. FlexPod Select for Hadoop with Cloudera
3. Cloudera Enterprise with Cisco Unified Computing System (solution brief)
Tags: Cisco UCS CPA, Cloudera, CPA, Hadoop, Journey to Big Data
Cloudera Sessions is coming to a City Near You!
Have you registered for the upcoming Cloudera Sessions roadshow yet? According to IDC analysts, the market for Big Data will reach $16.9 billion by 2015, growing at a staggering 40% compound annual growth rate (CAGR). As the sheer volume of data continues to climb, enterprise customers will need the right software and infrastructure to transform this data into meaningful insights.
Cisco is partnering with Cloudera to offer a comprehensive infrastructure and management solution, based on the Cisco Unified Computing System (UCS), to support our customers’ big data initiatives. As a proud sponsor of this event, I encourage you to join us at one of the following scheduled stops to learn more about our joint solutions for big data:
San Francisco 9/11
Jersey City 9/18
Milwaukee 10/17 (Note: changed from 10/2 to 10/17)
Cloudera has a fantastic agenda scheduled in each of the cities featuring keynote speakers that you won’t want to miss. I hope to see you there.
For the latest information regarding Big Data on Cisco UCS, I’ve added a couple of links below for your reference:
Introducing Cisco UCS Common Platform Architecture (CPA) for Big Data, by Raghu Nambiar
Top Three Reasons Why Cisco UCS is a Better Platform for Big Data, by Scott Ciccone
Stop by and say hello, and let me know if you have any comments or questions, either in person or via Twitter at @CicconeScott.
Tags: Big Data, Blade Servers, Cisco UCS, Cisco Unified Computing System, Hadoop, Rack Servers, UCS
Speed is everything. Continuing our commitment to make data center infrastructure more responsive to enterprise application demands, today we announced FlexPod Select with Hadoop, formerly known as NetApp Open Solution for Hadoop, broadening our FlexPod portfolio. Developed jointly by Cisco and NetApp, it offers an enterprise-class infrastructure that accelerates time to value from your data. The solution is pre-validated for Hadoop deployments built using Cisco 6200 Series Fabric Interconnects (connectivity and management), C220 M3 Servers (compute), NetApp FAS2220 (NameNode metadata storage), and NetApp E5400 series storage arrays (data storage). Following the highly successful FlexPod model of pre-sized rack-level configurations, this solution will be made available through the well-established FlexPod sales engagement and channel.
The FlexPod Select with Hadoop architecture is an extension of our popular Cisco UCS Common Platform Architecture (CPA) for Big Data, designed for applications requiring enterprise-class external storage array features such as RAID protection with data replication, hot-swappable spares, proactive drive health monitoring, faster recovery from disk failures, and automated I/O path failover. The architecture consists of a master rack and up to nine optional expansion racks in a single management domain, creating a complete, self-contained Hadoop cluster. The master rack provides all of the components required to run a 12-node Hadoop cluster with 540TB of storage capacity. Each expansion rack adds 16 Hadoop cluster nodes and 720TB of storage capacity. Unique to this architecture are seamless management and data integration capabilities with existing FlexPod deployments, which can help significantly lower infrastructure and management costs.
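The master-plus-expansion sizing is linear: with k expansion racks, the cluster has 12 + 16k nodes and 540 + 720k TB of storage. A small sketch using only the figures stated above (the text does not say whether the capacity is raw or usable); `cluster_size` is a hypothetical helper:

```python
MASTER_NODES, MASTER_TB = 12, 540        # master rack, per the text
EXPANSION_NODES, EXPANSION_TB = 16, 720  # each expansion rack, per the text
MAX_EXPANSION_RACKS = 9                  # single management domain

def cluster_size(expansion_racks: int) -> tuple[int, int]:
    """Hypothetical helper: (Hadoop nodes, storage TB) for a master rack plus k expansion racks."""
    if not 0 <= expansion_racks <= MAX_EXPANSION_RACKS:
        raise ValueError("expected 0 to 9 expansion racks")
    nodes = MASTER_NODES + EXPANSION_NODES * expansion_racks
    capacity_tb = MASTER_TB + EXPANSION_TB * expansion_racks
    return nodes, capacity_tb

for k in range(MAX_EXPANSION_RACKS + 1):
    nodes, tb = cluster_size(k)
    print(f"{k} expansion racks: {nodes} nodes, {tb}TB")
```

Both rack types work out to 45TB per node, and a fully expanded domain (nine expansion racks) reaches 156 nodes and 7,020TB.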
FlexPod Select has been pretested and jointly validated with leading Hadoop vendors, including Cloudera and Hortonworks.
Tags: Big Data, Cloudera, CPA, FlexPod, FlexPod Select, Hadoop, Hortonworks, netapp
At Hadoop Summit 2013, I presented “The Data Center and Hadoop,” which built upon the past two years of testing the effects of Hadoop on data center infrastructure. What makes Hadoop an important framework to study in the data center is that it is a distributed system combining a distributed file system (HDFS) with an execution framework (MapReduce). Further, it builds upon itself to provide other real-time or key/value stores (HBase), along with many other possibilities. Each of these comes with its own set of infrastructure requirements, including both throughput-sensitive and latency-sensitive components. In the data center, understanding how all these components work together is key to optimized deployments.
After studying many of these components and their effects, the very data we were analyzing became the topic of many of our discussions. We combined application performance data, application logs, compute data AND network data to build a complete picture of what is happening in the data center.
With the advent of programmable networks (aka “Software Defined Networking”), it is important not only to make the network more application-aware, but also to know where and how to analyze and make the right connections between the application and the network.
Tags: Big Data, Cisco Nexus, data center, Hadoop, Hadoop Summit, nexus, SDN, software defined networking