Announced today, TPC-DS V2 is the industry’s first standard for benchmarking SQL-based big data systems.
Over the last two years, the Transaction Processing Performance Council (TPC) has reinvented itself in several ways – with new standard developments in Big Data, Virtualization, and the Internet of Things.
Foreseeing the demand for standards for characterizing big data systems, in August 2014, the TPC announced the TPC Express Benchmark HS (TPCx-HS) – the industry’s first standard for benchmarking big data systems. TPCx-HS was designed to evaluate a broad range of system topologies and implementation methodologies related to big data. The workload is based on a ‘simple’ application that is highly relevant to Big Data, especially for Hadoop-based systems. ‘Simple’ is great – historically, end-user customers have adopted simple workloads and easy-to-understand metrics. (Look at TPC-C! It is one of the most successful industry standards, with over a thousand publications demonstrating the progress of application performance in line with Moore’s law for over a quarter century. Its metric is transactions per minute – can we think of anything simpler than that?) TPCx-HS has done well so far as a standard, providing verifiable performance and TCO. With over a dozen benchmark publications covering products from more than six vendors, it has definitely broken the records for TPC standards since TPC-H in 1999.
That said, there is an important role for ‘complex’ workloads, especially in developer and researcher circles. One such example is TPC-DS, originally developed to evaluate complex decision support systems based on relational database systems. TPC-DS has a long and interesting history: it took over ten years for the TPC to develop this standard. Though there have been several research papers and case studies, there have been no official result submissions since it became a standard in 2011. There are several technical and non-technical reasons. Top among them are (i) the complexity of the workload, with 99 query templates and concurrent data maintenance, and (ii) the uncertainty that comes with complexity: vendors are concerned about “overexposure” of their technologies and products in terms of performance and price-performance. So it is a successful benchmark in terms of serving the academic and research community, but a failure in terms of serving customers (purchase decision makers).
Interestingly, in the last two years, the Hadoop community has adopted the TPC-DS workload for performance characterization. This is mainly due to the richness and broad applicability of the schema, data generation, and some aspects of the workload, and its lack of bias towards relational systems. And, not surprisingly, there have been several claims that are not verifiable or reproducible by end users – and are obviously in violation of the TPC’s fair use policies. To put an end to this in a positive way, the TPC stepped up and created a work stream to extend support to non-relational systems (Hadoop, etc.), resulting in the creation of TPC-DS 2.0. If you go through the specification, you will see well-thought-out changes that make it Hadoop-friendly in ACID compliance, data maintenance, and metric.
I am most excited about its use in comparing SQL-based systems – traditional relational systems vs. non-relational – in terms of performance and TCO, something top of mind for many.
The TPC is not stopping here. We are developing another benchmark, TPC Express Benchmark BB (TPCx-BB), which shares several aspects of TPC-DS and will be offered as an easy-to-run kit. TPCx-BB is currently available for public review. The TPC is encouraging interested parties to provide their reviews of TPCx-BB by January 4, 2016. And, if benchmarking IoT is of interest to you, please join the IoT working group.
Significant contributors to the development of TPC-DS include Susanne Englert, Mary Meredith, Sreenivas Gukal, Doug Johnson, Lubor Kollar, Murali Krishna, Bob Lane, Larry Lutz, Juergen Mueller, Bob Murphy, Doug Nelson, Ernie Ostic, Raghunath Nambiar, Meikel Poess (chairman), Haider Rizvi, Bryan Smith, Eric Speed, Cadambi Sriram, Jack Stephens, John Susag, Tricia Thomas, Dave Walrath, Shirley Wang, Guogen Zhang, Torsten Grabs, Charles Levine, Mike Nikolaiev, Alain Crolotte, Francois Raab, Yeye He, Margaret McCarthy, Indira Patel, Daniel Pol, John Galloway, Jerry Lohr, Jerry Buggert, Michael Brey, Nicholas Wakou, Vince Carbone, Wayne Smith, Dave Steinhoff, Dave Rorke, Dileep Kumar, Yanpei Chen, John Poelman, and Seetha Lakshmi.