Big Data is becoming an integral part of the enterprise IT ecosystem across major industry verticals, and Apache Hadoop is becoming almost synonymous with it as the foundation of the next-generation data management platform. Sometimes referred to as a Data Lake, this platform serves as the primary landing zone for data from a wide variety of data sources. Traditional and several new application software vendors have been building the plumbing -- in software terms, data connectors and data movers -- to extract data from it for further processing. New to Apache Hadoop is YARN, which is effectively an operating system for Big Data, enabling multiple workloads -- batch, interactive, streaming, and real-time -- to coexist on a cluster.
The Hortonworks Data Platform combines the most useful and stable versions of Apache Hadoop and its related projects into a single tested and certified package. Cisco has been partnering with Hortonworks to provide an industry-leading platform for enterprise Hadoop deployments. The Cisco UCS solution for Hortonworks Data Platform is based on the Cisco UCS Common Platform Architecture Version 2 for Big Data, a popular platform for Data Lakes that is widely adopted across major industry verticals. It features single connect, unified management, advanced monitoring capabilities, seamless management integration, and data integration (plumbing) capabilities with other enterprise application systems based on Oracle, Microsoft, SAS, SAP and others.
We are excited to see several joint wins with Hortonworks in the service provider, insurance, retail, healthcare and other sectors. The joint solution is available in three reference architectures: Performance-Capacity Balanced, Capacity Optimized, and Capacity Optimized with Flash. All three support up to 10 racks at 16 servers each without additional switches. Scaling beyond 10 racks (160 servers) can be implemented by interconnecting domains using Cisco Nexus 6000/7000/9000 series switches, scalable to thousands of servers and to hundreds of petabytes of storage, and managed from a single pane using Cisco UCS Central.
New to this partnership is Hortonworks Data Platform 2.1, which includes Apache Hive 13, which is significantly faster than the previous-generation Hive 12. We have jointly conducted extensive performance benchmarking using 20 queries derived from the TPC-DS Benchmark, an industry-standard benchmark for Decision Support Systems from the Transaction Processing Performance Council (TPC). The tests were conducted on a 16-node Cisco UCS CPA v2 Performance-Capacity Balanced cluster using a 30TB dataset. We have observed about 300% performance acceleration for some queries with Hive 13 compared to Hive 12. See Figure 1.
Additional performance improvements are expected with the GA release. What does this mean? (i) First of all, Hive brings SQL-like abilities to petabyte-scale datasets in an economical manner, SQL being the most common and expressive language for analytics. (ii) Hadoop becomes friendlier for SQL developers and SQL-based business analytics platforms. (iii) Such performance improvements (from Hive 12 to 13) make migrations from proprietary systems to Hadoop even more compelling. More coming. Stay tuned!
Figure 1: Hive 13 vs. Hive 12
Disclaimer: The queries listed here are derived from the TPC-DS Benchmark. These results cannot be compared with TPC-DS Benchmark results. For more information visit www.tpc.org.
Tags: Big Data, Cisco UCS CPA, Hadoop
Built upon our vision of shared infrastructure and unified management, the Cisco UCS Common Platform Architecture (CPA) for Big Data has become a leading platform for Big Data deployments. Today we are announcing support for Cloudera Enterprise 5, an industry-leading data management platform that combines Apache Hadoop with a number of other open source projects, all integrated into a single enterprise-ready platform. The joint solution is tested and certified by Cisco and Cloudera to accelerate enterprise Hadoop deployments while significantly reducing the risks, complexity, and total cost of ownership.
With Hadoop at its core, Cloudera Enterprise enables an enterprise data hub by making it economically viable and technically feasible for enterprises to keep all their data in a single, centralized platform, where they can store, process and analyze data in full fidelity for a variety of enterprise workloads. Cloudera Enterprise 5 delivers tight integration with existing enterprise data management systems, along with the robust security, governance, data protection, and management that enterprises require.
The Cisco and Cloudera joint solution is available in two reference architectures, Performance-Capacity Balanced and Capacity Optimized. Both support up to 10 racks at 16 servers each without additional switches. The Performance-Capacity Balanced configuration provides an excellent balance of computing power and storage capacity, supporting 32GBps of I/O bandwidth and 384TB of storage per rack. The Capacity Optimized configuration provides high storage density for storage-intensive deployments, supporting 16GBps of I/O bandwidth and 768TB of storage per rack, for a total of 7.68PB when scaled to a 10-rack configuration.
Scaling beyond 10 racks (160 servers) can be implemented by interconnecting multiple UCS domains using Nexus 7000/9000 Series switches, scalable to thousands of servers and to hundreds of petabytes of storage, and managed from a single pane using UCS Central in a data center or distributed globally.
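The rack arithmetic above can be checked with a small Python sketch. It uses only the figures quoted in this post (16 servers per rack, up to 10 racks per UCS domain, 384TB or 768TB per rack) and assumes decimal units (1PB = 1000TB), which is what the 7.68PB total implies. The names `RackProfile`, `domain_totals`, and `domains_needed` are hypothetical helpers for illustration, not part of any Cisco tool.

```python
import math
from dataclasses import dataclass

SERVERS_PER_RACK = 16        # from the post
MAX_RACKS_PER_DOMAIN = 10    # racks supported without additional switches


@dataclass(frozen=True)
class RackProfile:
    """Per-rack figures quoted in the post (hypothetical helper type)."""
    name: str
    storage_tb: int


BALANCED = RackProfile("Performance-Capacity Balanced", 384)
CAPACITY = RackProfile("Capacity Optimized", 768)


def domain_totals(profile: RackProfile, racks: int = MAX_RACKS_PER_DOMAIN) -> dict:
    """Server count and raw storage for `racks` racks in one UCS domain."""
    if not 1 <= racks <= MAX_RACKS_PER_DOMAIN:
        raise ValueError("a single domain holds 1 to 10 racks")
    return {
        "servers": racks * SERVERS_PER_RACK,
        # Assumption: decimal units, 1 PB = 1000 TB.
        "storage_pb": profile.storage_tb * racks / 1000,
    }


def domains_needed(servers: int) -> int:
    """Minimum number of UCS domains to interconnect for `servers` servers."""
    per_domain = SERVERS_PER_RACK * MAX_RACKS_PER_DOMAIN  # 160
    return math.ceil(servers / per_domain)


if __name__ == "__main__":
    print(domain_totals(CAPACITY))   # full 10-rack Capacity Optimized domain
    print(domains_needed(1000))      # domains required for 1000 servers
```

A full Capacity Optimized domain works out to 160 servers and 7.68PB, matching the totals quoted above; anything past 160 servers needs at least a second domain.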
The base rack configuration is available through the Cisco UCS Solution Accelerator Paks for Big Data program, which is designed for ease of ordering and rapid deployment, tested and validated for performance, and optimized for cost of ownership. The Performance and Capacity Balanced rack SKU is UCS-SL-CPA2-PC, and the Capacity Optimized rack SKU is UCS-SL-CPA2-C.
Big Data Design Zone
Cisco Validated Design: Cisco UCS CPA for Big Data with Cloudera
Tags: Big Data, Cisco UCS CPA, Cisco Validated Design, Cloudera, Cloudera Enterprise 5
Industry’s first reference architecture for Hadoop with advanced access control and encryption with IDH, first flash-enhanced reference architecture for Hadoop demonstrated using YCSB with MapR, industry’s first validated and certified solution for real-time Big Data analytics with SAP HANA, and Unleashing IT big data special edition
Built upon our vision of shared infrastructure and unified management for enterprise applications, the Cisco UCS Common Platform Architecture (CPA) for Big Data has become a popular choice for enterprise Big Data deployments. It has been widely adopted in finance, healthcare, service provider, entertainment, insurance, and public sectors. The new Cisco UCS CPA v2 improves both performance and capacity, featuring the Intel Xeon E5-2600 v2 family of processors, industry-leading storage density, and the industry’s first transparent cache acceleration for Big Data.
The Cisco UCS CPA v2 offers a choice of infrastructure options, including “Performance Optimized”, “Balanced”, “Capacity Optimized”, and “Capacity Optimized with Flash” to support a range of workload needs.
Up to 160 servers (3200 cores, 7.6PB storage) are supported in a single switching/UCS domain. Scaling beyond 160 servers can be implemented by interconnecting multiple UCS domains using Nexus 6000/7000 Series switches, scalable to thousands of servers and to hundreds of petabytes of storage, and managed from a single pane using UCS Central in a data center or distributed globally.
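As a rough sanity check on the domain totals above, the following Python sketch divides them back down to per-server figures. The totals (160 servers, 3200 cores, 7.6PB) come from this post; the two-socket assumption and the decimal conversion (7.6PB taken as 7600TB) are assumptions made here, and `per_server` is a hypothetical helper name.

```python
# Domain-level figures quoted in the post.
DOMAIN_SERVERS = 160
DOMAIN_CORES = 3200
DOMAIN_STORAGE_TB = 7600   # assumption: 7.6 PB expressed as 7600 TB (decimal)

# Assumption: two processor sockets per server.
SOCKETS_PER_SERVER = 2


def per_server(servers: int, cores: int, storage_tb: int) -> dict:
    """Break domain totals down into per-server averages."""
    cores_each = cores / servers
    return {
        "cores": cores_each,
        "cores_per_socket": cores_each / SOCKETS_PER_SERVER,
        "storage_tb": storage_tb / servers,
    }


if __name__ == "__main__":
    print(per_server(DOMAIN_SERVERS, DOMAIN_CORES, DOMAIN_STORAGE_TB))
```

Under these assumptions the totals imply 20 cores and roughly 47.5TB of raw storage per server. The storage figure is an average derived from a rounded total, so it should be read as approximate.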
The Cisco UCS CPA v2 solutions are available through the Cisco UCS Solution Accelerator Paks program, which is designed for rapid deployment, tested and validated for performance, and optimized for cost of ownership. The Performance Optimized half-rack (UCS-SL-CPA2-P) is ideal for MPP databases and scale-out data analytics; the Performance and Capacity Balanced rack (UCS-SL-CPA2-PC) is ideal for high-performance Hadoop and NoSQL deployments; the Capacity Optimized rack (UCS-SL-CPA2-C) is for when capacity matters; and the Capacity Optimized with Flash rack (UCS-SL-CPA2-CF) offers the industry’s first transparent caching option for Hadoop and NoSQL. Start with any configuration and scale as your workload demands.
Cisco supports leading Hadoop and NoSQL distributions, including Cloudera, Hortonworks, Intel, MapR, Oracle, Pivotal and others. For more information, visit the Cisco Big Data Portal and the Big Data Design Zone, which offers Cisco Validated Designs (CVD) -- pretested and validated architectures that accelerate the time to value for customers while reducing risks and deployment challenges.
Cisco UCS Common Platform Architecture Version 2 for Big Data
Cisco Launches the First Flash-Enhanced Solution for Hadoop
Simplifying the Deployment of Real-time Big Data Analytics — UCS + SAP HANA
Also see Maximizing Big Data Benefits with MapR and Informatica on Cisco UCS
Tags: Cisco UCS CPA, Cisco UCS Solution Accelerator Paks, Cloudera, Hortonworks, Intel Hadoop, MapR, Pivotal HD, SAP HANA
Extending our vision of shared infrastructure and unified management for enterprise applications, we are announcing the industry’s first validated and certified solution for real-time Big Data analytics with SAP HANA. Based on our joint work with Intel and SAP at the SAP Co-innovation Lab (COIL), the solution integrates SAP HANA with the Intel Distribution for Apache Hadoop running on the Cisco UCS Common Platform Architecture (CPA) for Big Data, enabling real-time analysis of Big Data while radically simplifying the infrastructure and management.
Solution Brief: Simplifying the Deployment of Real-time Big Data Analytics -- UCS + SAP HANA
Blog: Building on Success: Cisco and Intel Expand Partnership to Big Data
White Paper: Cisco UCS with the Intel Distribution for Apache Hadoop Software
Tags: Cisco UCS CPA, HANA, intel-distribution-for-apache-hadoop
While there is not yet an industry-standard benchmark for measuring the performance of Hadoop systems (yes, there is work in progress -- WBDB, BigDataTop100, etc.), workloads like TeraSort have become a popular choice to benchmark and stress test Hadoop clusters.
TeraSort is very simple. It consists of three map/reduce programs: (i) TeraGen, which generates the dataset; (ii) TeraSort, which samples and sorts the dataset; and (iii) TeraValidate, which validates the output. With multiple vendors now publishing TeraSort results, organizations can make reasonable performance comparisons while evaluating Hadoop clusters.
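The three stages can be sketched as a small Python helper that assembles the usual `hadoop jar` command lines. This is an illustrative sketch, not the procedure used for the results in this post: the jar name and HDFS directories are assumptions (the examples jar's name and location vary by distribution and version), and `rows_for` and `terasort_commands` are hypothetical names. It relies on the fact that TeraGen takes a row count, with each row being 100 bytes.

```python
import shlex

ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows

# Assumption: name/path of the MapReduce examples jar; adjust for your install.
EXAMPLES_JAR = "hadoop-mapreduce-examples.jar"


def rows_for(size_bytes: int) -> int:
    """Number of 100-byte rows TeraGen must generate for a dataset size."""
    return size_bytes // ROW_BYTES


def terasort_commands(size_bytes: int, base_dir: str = "/benchmarks/terasort"):
    """Build the TeraGen, TeraSort, and TeraValidate command lines.

    `base_dir` is a hypothetical HDFS directory chosen for this example.
    """
    gen_dir = f"{base_dir}/input"
    sort_dir = f"{base_dir}/output"
    report_dir = f"{base_dir}/report"
    prefix = ["hadoop", "jar", EXAMPLES_JAR]
    return [
        prefix + ["teragen", str(rows_for(size_bytes)), gen_dir],  # generate
        prefix + ["terasort", gen_dir, sort_dir],                  # sample and sort
        prefix + ["teravalidate", sort_dir, report_dir],           # validate output
    ]


if __name__ == "__main__":
    ten_tb = 10 * 10**12  # 10 TB in decimal bytes
    for cmd in terasort_commands(ten_tb):
        print(shlex.join(cmd))
```

The helper only prints the commands, so it can be reviewed before anything is run against a cluster. Each stage reads the previous stage's output directory, which is why the three directories are chained.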
We conducted a series of TeraSort tests on our popular Cisco UCS Common Platform Architecture (CPA) for Big Data rack with 16 Cisco UCS C240 M3 Rack Servers, each equipped with two Intel Xeon E5-2665 processors and running the Apache Hadoop distribution (see figure below). The tests demonstrated industry-leading performance and scalability over a range of dataset sizes from 100GB to 50TB. For example, out of the box, our 10TB result is 40 percent faster than HP’s published result on 18 HP ProLiant DL380 Servers, each equipped with two Intel Xeon E5-2667 processors.
While Hadoop offers many advantages for organizations, the Cisco story isn’t complete without the collaborations with our ecosystem partners that enable us to offer complete solution stacks. We support leading Hadoop distributions including Cloudera, Hortonworks, Intel, MapR, and Pivotal on our Cisco UCS Common Platform Architecture (CPA) for Big Data. We just announced our Big Data Design Zone, which offers Cisco Validated Designs (CVD) -- pretested and validated architectures that accelerate the time to value for customers while reducing risks and deployment challenges.
Cisco Big Data Design Zone
Cisco UCS Demonstrates Leading TeraSort Benchmark Performance
Cisco UCS Common Platform Architecture (CPA) for Big Data
Tags: Big Data, Big Data Benchmarks, Cisco UCS C240 M3 Rack Server, Cisco UCS CPA, CPA, Hadoop, TeraSort, YCSB