While there is not yet an industry-standard benchmark for measuring the performance of Hadoop systems (though efforts such as WBDB and BigDataTop100 are in progress), workloads like TeraSort have become a popular choice for benchmarking and stress-testing Hadoop clusters.
TeraSort is very simple and consists of three MapReduce programs: (i) TeraGen, which generates the dataset; (ii) TeraSort, which samples and sorts the dataset; and (iii) TeraValidate, which validates the output. With multiple vendors now publishing TeraSort results, organizations can make reasonable performance comparisons when evaluating Hadoop clusters.
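As a rough illustration, the three phases can be run from the examples jar that ships with Apache Hadoop. The jar path and HDFS directories below are assumptions and vary by distribution and version; the row count shown (10 billion 100-byte rows) corresponds to a 1 TB run.

```shell
EXAMPLES_JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar

# TeraGen: generate 10 billion 100-byte rows (~1 TB) into HDFS
hadoop jar $EXAMPLES_JAR teragen 10000000000 /user/hadoop/terasort-input

# TeraSort: sample the input, partition, and sort it
hadoop jar $EXAMPLES_JAR terasort /user/hadoop/terasort-input /user/hadoop/terasort-output

# TeraValidate: verify the output is globally sorted
hadoop jar $EXAMPLES_JAR teravalidate /user/hadoop/terasort-output /user/hadoop/terasort-validate
```

The elapsed time of the TeraSort step is the figure typically reported; TeraGen and TeraValidate are timed separately.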
We conducted a series of TeraSort tests on our popular Cisco UCS Common Platform Architecture (CPA) for Big Data rack with 16 Cisco UCS C240 M3 Rack Servers, each equipped with two Intel Xeon E5-2665 processors, running the Apache Hadoop distribution (see figure below). The results demonstrate industry-leading performance and scalability over a range of dataset sizes from 100 GB to 50 TB. For example, out of the box, our 10 TB result is 40 percent faster than HP’s published result on 18 HP ProLiant DL380 servers equipped with two Intel Xeon E5-2667 processors.
While Hadoop offers many advantages for organizations, the Cisco story isn’t complete without the collaborations with our ecosystem partners that enable us to offer complete solution stacks. We support leading Hadoop distributions, including Cloudera, HortonWorks, Intel, MapR, and Pivotal, on the Cisco UCS Common Platform Architecture (CPA) for Big Data. We just announced our Big Data Design Zone, which offers Cisco Validated Designs (CVDs): pretested and validated architectures that accelerate time to value for customers while reducing risks and deployment challenges.
Cisco Big Data Design Zone
Cisco UCS Demonstrates Leading TeraSort Benchmark Performance
Cisco UCS Common Platform Architecture (CPA) for Big Data
Tags: Big Data, Big Data Benchmarks, Cisco UCS C240 M3 Rack Server, Cisco UCS CPA, CPA, Hadoop, TeraSort, YCSB
The Transaction Processing Performance Council today announced its fifth international Conference on Performance Evaluation and Benchmarking (TPCTC 2013). I have had the great privilege of chairing the TPCTC series since 2009. This year’s conference will be collocated with the 39th International Conference on Very Large Data Bases (VLDB 2013) on August 26, 2013 in Riva del Garda, Italy. With this conference we encourage researchers and industry experts to submit ideas and methodologies in performance evaluation, measurement, and characterization. Additional information on TPCTC 2013 is available online at http://www.tpc.org/tpctc/tpctc2013/.
Tags: Big Data, Big Data Benchmarks, bigdatatop100, Hadoop, TPCTC, WBDB, WBDB 2012, wbdb2012-in
Following the successful workshop “Towards an Industry Standard for Benchmarking Big Data Workloads” (WBDB 2012), held in May 2012 in San Jose, the Second Workshop on Benchmarking Big Data Workloads (WBDB2012.in) will be held in Pune, India from 17 to 18 December at the Hinjewadi Campus of Persistent Systems Ltd, collocated with the 18th International Conference on Management of Data (COMAD 2012).
I have the great pleasure of co-chairing this workshop with my distinguished colleagues Chaitanya Baru, Meikel Poess, Milind Bhandarkar, and Tilmann Rabl, with support from the National Science Foundation (NSF.gov).
The objective of the workshop series is to foster the development of industry standards for providing objective measures of the effectiveness of hardware and software systems dealing with Big Data. Several industry experts and researchers are expected to present and debate their vision on benchmarking big data platforms.
WBDB 2012.in: http://clds.ucsd.edu/wbdb2012.in, CFP: http://clds.ucsd.edu/sites/clds.ucsd.edu/files/WBDB.in_.cfp_.pdf
WBDB 2012: http://blogs.cisco.com/datacenter/towards-an-industry-standard-for-benchmarking-big-data-workloads/, http://clds.ucsd.edu/wbdb2012/
COMAD 2012: http://comad.in/comad2012
WBDB 2012.in Program Committee: http://clds.ucsd.edu/wbdb2012.in/organizers
Tags: Big Data, Big Data Benchmarks, WBDB
Industry standard benchmarks have played, and continue to play, a crucial role in the advancement of the computing industry. Demand for them has existed since buyers were first confronted with the choice of purchasing one system over another. Over the years, industry standard benchmarks have proven critical to both buyers and vendors: buyers use benchmark results when evaluating new systems in terms of performance, price/performance, and energy efficiency, while vendors use benchmarks to demonstrate the competitiveness of their products and to monitor release-to-release progress of products under development. Historically, we have seen that industry standard benchmarks enable healthy competition that results in product improvements and the evolution of brand-new technologies.
Over the past quarter-century, industry standard bodies like the Transaction Processing Performance Council (TPC) and the Standard Performance Evaluation Corporation (SPEC) have developed several industry standards for performance benchmarking, which have been a significant driving force behind the development of faster, less expensive, and/or more energy efficient system configurations.
The world has been in the midst of an extraordinary information explosion over the past decade, punctuated by rapid growth in the use of the Internet and the number of connected devices worldwide. Today, we’re seeing a rate of change faster than at any point throughout history, and both enterprise application data and machine generated data, known as Big Data, continue to grow exponentially, challenging industry experts and researchers to develop new innovative techniques to evaluate and benchmark hardware and software technologies and products.
I am co-chairing a workshop entitled the Workshop on Big Data Benchmarking (WBDB 2012) with my distinguished colleagues Chaitanya Baru, Meikel Poess, Milind Bhandarkar, Tilmann Rabl, and others, supported by the National Science Foundation (NSF.gov). This is a crucial first step toward the development of an industry standard benchmark for providing objective measures of the effectiveness of hardware and software systems dealing with Big Data. Several industry experts and researchers have been invited to present and debate their vision on benchmarking big data platforms.
A report from this workshop will be presented at the just-announced 4th International Conference on Performance Evaluation and Benchmarking (TPCTC 2012), organized by the TPC, which will be collocated with the 38th International Conference on Very Large Data Bases (VLDB 2012), a premier forum for data management and database researchers, vendors, and users. With this conference, we encourage industry experts and researchers to submit ideas and methodologies in performance evaluation, measurement, and characterization in areas including, but not limited to: big data, cloud computing, business intelligence, energy and space efficiency, hardware and software innovations, and lessons learned in practice using TPC and other benchmark workloads.
Cisco has been an active member of the TPC since 2010 and of SPEC since 2009.
R. Nambiar, N. Wakou, P. Thawley, A. Masland, M. Lanken, M. Majdalany, F. Carman: Shaping the Landscape of Industry Standard Benchmarks: Contributions of the Transaction Processing Performance Council. Springer, 2011
Workshop on Big Data Benchmarking: http://clds.ucsd.edu/wbdb2012/
TPC Press Release: http://finance.yahoo.com/news/transaction-processing-performance-council-announces-150000511.html
TPCTC 2012 Call for Papers: http://www.tpc.org/tpctc2012/
Tags: Big Data, Big Data Benchmarks, Industry Standard, TPCTC 2012, WBDB 2012