While there is not yet an industry-standard benchmark for measuring the performance of Hadoop systems (yes, work is in progress: WBDB, BigDataTop100, etc.), workloads like TeraSort have become a popular choice for benchmarking and stress-testing Hadoop clusters.
TeraSort is very simple and consists of three map/reduce programs: (i) TeraGen, which generates the dataset; (ii) TeraSort, which samples and sorts the dataset; and (iii) TeraValidate, which validates the output. With multiple vendors now publishing TeraSort results, organizations can make reasonable performance comparisons while evaluating Hadoop clusters.
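To make the three stages concrete, here is a minimal sketch in Python that builds the command lines for a TeraGen, TeraSort, and TeraValidate run. It is an illustration, not the procedure used for the tests in this post. It relies on TeraGen taking a row count rather than a byte size, with each row being 100 bytes. The jar name, the HDFS directory layout, the helper names, and the use of decimal units (100GB = 100 x 10^9 bytes) are assumptions; the examples jar in particular is named differently across Hadoop versions and distributions.

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def rows_for_size(total_bytes):
    """Number of TeraGen rows needed to produce total_bytes of data."""
    return total_bytes // ROW_BYTES


def terasort_commands(total_bytes, jar, base_dir):
    """Build the three command lines for one benchmark run.

    jar and base_dir are hypothetical placeholders: point them at the
    examples jar and an HDFS working directory on your own cluster.
    """
    gen_dir = base_dir + "/input"        # TeraGen output, TeraSort input
    sort_dir = base_dir + "/output"      # TeraSort output
    report_dir = base_dir + "/validate"  # TeraValidate report
    rows = rows_for_size(total_bytes)
    return [
        ["hadoop", "jar", jar, "teragen", str(rows), gen_dir],
        ["hadoop", "jar", jar, "terasort", gen_dir, sort_dir],
        ["hadoop", "jar", jar, "teravalidate", sort_dir, report_dir],
    ]


if __name__ == "__main__":
    # Assumed jar name and directory; 100GB taken as 100 * 10**9 bytes.
    commands = terasort_commands(
        100 * 10**9,
        "hadoop-mapreduce-examples.jar",
        "/benchmarks/terasort-100GB",
    )
    for cmd in commands:
        print(" ".join(cmd))
```

The script only prints the commands; running them in order on a cluster, and timing the TeraSort step, is what a published TeraSort result typically reports.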
We conducted a series of TeraSort tests on our popular Cisco UCS Common Platform Architecture (CPA) for Big Data rack with 16 Cisco UCS C240 M3 Rack Servers, each equipped with two Intel Xeon E5-2665 processors and running an Apache Hadoop distribution. The results (see figure below) demonstrate industry-leading performance and scalability over a range of data set sizes from 100GB to 50TB. For example, out of the box, our 10TB result is 40 percent faster than HP’s published result on 18 HP ProLiant DL380 Servers, each equipped with two Intel Xeon E5-2667 processors.
While Hadoop offers many advantages for organizations, the Cisco story isn’t complete without the collaborations with our ecosystem partners that enable us to offer complete solution stacks. We support leading Hadoop distributions, including Cloudera, Hortonworks, Intel, MapR, and Pivotal, on our Cisco UCS Common Platform Architecture (CPA) for Big Data. We just announced our Big Data Design Zone, which offers Cisco Validated Designs (CVDs): pretested and validated architectures that accelerate time to value for customers while reducing risks and deployment challenges.