Cisco Blogs



TPC-DS V2 – A new benchmark standard for SQL based Big Data systems

Announced today, TPC-DS V2 is the industry’s first standard for benchmarking SQL-based big data systems.

Over the last two years, the Transaction Processing Performance Council (TPC) has reinvented itself in several ways, with new standards development in Big Data, Virtualization, and the Internet of Things.

Foreseeing the demand for standards to characterize big data systems, the TPC announced the TPC Express Benchmark HS (TPCx-HS) in August 2014. It was the industry’s first standard for benchmarking big data systems. TPCx-HS was designed to evaluate a broad range of system topologies and implementation methodologies related to big data. The workload is based on a ‘simple’ application that is highly relevant to Big Data, especially for Hadoop-based systems. ‘Simple’ is great: historically, end-user customers have adopted simple workloads and easy-to-understand metrics. (Look at TPC-C! It is one of the most successful industry standards, with over a thousand publications demonstrating the progress of application performance in line with Moore’s law for over a quarter century. The metric is transactions per minute. Can we think of anything simpler than that?) TPCx-HS has done well so far as a standard, providing verifiable performance and TCO. It has over a dozen benchmark publications featuring products from more than six vendors, a record for any TPC standard since TPC-H in 1999.

That said, there is an important role for ‘complex’ workloads, especially in developer and researcher circles. One such example is TPC-DS, originally developed to evaluate complex decision support systems built on relational database systems. TPC-DS has a long and interesting history: it took the TPC over ten years to develop this standard. Although there have been several research papers and case studies, there have been no official result submissions since it became a standard in 2011. There are several technical and non-technical reasons. Chief among them are (i) the complexity of the workload, with 99 query templates and concurrent data maintenance, and (ii) the uncertainty that complexity brings, since vendors are concerned about “over-exposure” of their technologies and products in terms of performance and price-performance. So it is a successful benchmark in terms of serving the academic and research community, but a failure in terms of serving customers (purchase decision makers).

Interestingly, over the last two years, the Hadoop community has adopted the TPC-DS workload for performance characterization. This is mainly due to the richness and broad applicability of its schema, its data generation, and some aspects of its workload, as well as its lack of bias toward relational systems. Not surprisingly, there have also been several claims that end users cannot verify or reproduce, which obviously violate the TPC’s fair use policies. To put an end to this in a positive way, the TPC stepped up and created a work stream to extend support to non-relational (Hadoop etc.) systems, resulting in the creation of TPC-DS 2.0. If you go through the specification, you will see well-thought-out changes that make it Hadoop friendly in ACID compliance, data maintenance, and the metric.

I am most excited about its use in comparing SQL-based systems, traditional relational systems versus non-relational ones, in terms of performance and TCO. That comparison is top of mind for many.

The TPC is not stopping here. We are developing another benchmark, the TPC Express Benchmark BB (TPCx-BB), which shares several aspects with TPC-DS and will be offered as an easy-to-run kit. TPCx-BB is currently available for public review, and the TPC encourages interested parties to submit their reviews of TPCx-BB by January 4, 2016. And if benchmarking IoT is of interest to you, please join the IoT working group.

Significant contributors to the development of TPC-DS include Susanne Englert, Mary Meredith, Sreenivas Gukal, Doug Johnson, Lubor Kollar, Murali Krishna, Bob Lane, Larry Lutz, Juergen Mueller, Bob Murphy, Doug Nelson, Ernie Ostic, Raghunath Nambiar, Meikel Poess (chairman), Haider Rizvi, Bryan Smith, Eric Speed, Cadambi Sriram, Jack Stephens, John Susag, Tricia Thomas, Dave Walrath, Shirley Wang, Guogen Zhang, Torsten Grabs, Charles Levine, Mike Nikolaiev, Alain Crolotte, Francois Raab, Yeye He, Margaret McCarthy, Indira Patel, Daniel Pol, John Galloway, Jerry Lohr, Jerry Buggert, Michael Brey, Nicholas Wakou, Vince Carbone, Wayne Smith, Dave Steinhoff, Dave Rorke, Dileep Kumar, Yanpei Chen, John Poelman, and Seetha Lakshmi.

References

TPC-DS V2 Specification
TPC Press Release
Vendor-Neutral Benchmarks Drive Tech Innovation
The making of TPC-DS
Transaction performance vs. Moore’s law: a trend analysis


A Year in Review. Big Data and Analytics 2015. 3rd Generation Platforms, World Record Performance, and Expanding Partnerships

My goodness… Were we ever busy in 2015! Our Cisco Big Data & Analytics teams executed and delivered a tremendous body of work, with several key accomplishments over the past 12 months. All of our activities, across all of our teams, were focused on delivering leading innovation to you, with industry-leading performance and scalability, and on offering flexibility through a variety of Big Data choices. Of course, all of it is based on Cisco UCS, Nexus, and ACI. Let’s take a look at some of the highlights:

Platform

Throughout 2015, we introduced various versions of our 3rd generation Big Data architecture. The solution, Cisco UCS Integrated Infrastructure for Big Data, integrates our industry-leading computing, networking, and management capabilities into a unified fabric-based architecture. Packaged as Cisco Validated Designs (CVDs), our architecture supports the leading Hadoop distributions: Cloudera, Hortonworks, IBM, and MapR. Our Big Data CVDs give you peace of mind because they are tested, validated, and supported. Take a peek at our Big Data CVDs and see how they can expedite your Hadoop projects and drive operational efficiency.

Performance


Composable Infrastructure Part 6: Understanding Infrastructure Options

The IT industry is in a significant period of transition, and the infrastructure landscape has changed a great deal. There are many options today, and the number of options will grow over the next two years. Having more options can also lead to complexity and potential limitations. As you assess your options, you need more information and context so you can make the right choices and avoid problems down the road.

Software-defined infrastructure (SDI) has made it possible to create these new categories of products. In addition to traditional rack and blade servers and SAN storage, there is converged infrastructure, hyperconverged infrastructure, and now composable infrastructure. As you evaluate these new infrastructure options, one of the most important considerations is choosing the right management software to support them. You don’t want to add complexity by creating islands of infrastructure that need to be managed separately.



Microsegmentation with Cisco ACI

Modern data centers are under unrelenting attack, and east-west traffic security breaches happen every day. According to Cisco, 75 percent of all attacks take only minutes to begin stealing data but take longer to detect. Once an attack is discovered, several weeks may pass before full containment and remediation are achieved. Today’s data centers require a variety of “tools” to deal with sophisticated attack vectors, and network segmentation is a proven tool deployed in data centers.

While the broad constructs of segmentation are relevant, today’s application and security requirements mandate increasingly granular methods that are more secure and operationally simpler. This has led to the evolution of “microsegmentation” to address the following:

  • Programmatically define segments at an increasingly granular level, using attributes for greater flexibility
  • Automatically program segment and policy management across the entire application lifecycle (deployment to decommissioning)
  • Quarantine compromised endpoints and limit lateral propagation of threats
  • Enhance security and scale by enabling a Zero-Trust approach for physical, virtual, and container workloads

Cisco’s Application Centric Infrastructure (ACI) takes a very elegant approach to microsegmentation, with policy definitions that separate segments from the broadcast domain.

Figure 1: Microsegmentation (uSeg)



Part 2: Ten Learnings and Observations from the 2015 London Gartner Data Center Conference

Last week I attended the 2015 London Gartner Data Center conference.

Shadow IT - Addressing the Challenges with the Cisco Cloud Consumption Services

In my first blog post on this event (part 1), I covered some of my main learnings and observations, #1 to #5:

  • Bi-modal IT
  • Anti-fragility
  • Shadow IT (and how Cisco Cloud Consumption Services can help you here)
  • SDN
  • Software asset management

Let’s now go on to discuss #6 to #10, on topics ranging from buzzwords to SDx, and on to Scotch Whisky!
