Big Data is one of today's most talked-about topics across industry, government, and research. It is becoming the center of investments, innovations, and improvisations (the 3I's), and it is no exaggeration to say that Big Data is transforming the world. Given its potential, the IEEE Computer Society is conducting the IEEE International Conference on Big Data 2013, a premier forum to disseminate and exchange the latest advances in Big Data. The main theme of the conference is the 5V's: Volume, Velocity, Variety, Value, and Veracity. The conference will take place in Santa Clara, CA, from October 6th to 9th.

I have the great privilege of co-chairing the Industry and Government Program with my distinguished colleagues Rayid Ghani (Obama Campaign), Wei Han (Noah's Ark Lab), and Ronny Lempel (Yahoo! Labs), along with Xiaohua Tony Hu (Drexel University), who chairs the Steering Committee. The 4-day program includes about 50 presentations selected from over 300 paper submissions by more than 1000 authors from 40 countries, four keynotes (Amr Awadallah, Mike Franklin, Hector Garcia-Molina, and Roger Schell), 12 workshops, and two tutorials.

I have the great pleasure of delivering the opening and welcoming speech on behalf of the Industry and Government committee. I am also chairing Amr Awadallah's keynote session on Key Usage Patterns for Apache Hadoop in the Enterprise, and co-presenting a paper titled "A Look at Challenges and Opportunities of Big Data Analytics in Healthcare" at the workshop on Big Data in Bioinformatics and Healthcare Informatics. This workshop promises to be very interesting, with sessions such as "Big Data Solutions for Predicting Risk-of-Readmission for Congestive Heart Failure Patients" and "Colon Cancer Survival Prediction Using Ensemble Data Mining on SEER Data."
Cisco is a proud sponsor of the conference. Additional Information:
Tags: 3Is, 5Vs, Big Data, bigdatatop100, Hadoop, IEEE, TPCTC, WBDB
While there is not yet an industry-standard benchmark for measuring the performance of Hadoop systems (though efforts such as WBDB and BigDataTop100 are in progress), workloads like TeraSort have become a popular choice for benchmarking and stress-testing Hadoop clusters.
TeraSort is very simple: it consists of three MapReduce programs: (i) TeraGen, which generates the dataset; (ii) TeraSort, which samples and sorts the dataset; and (iii) TeraValidate, which validates the output. With multiple vendors now publishing TeraSort results, organizations can make reasonable performance comparisons when evaluating Hadoop clusters.
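As a sketch, the three steps run back to back from the command line. The examples jar path and HDFS directories below are assumptions that vary by Hadoop distribution; TeraGen rows are 100 bytes each, which is how a target dataset size maps to a row count:

```shell
#!/bin/sh
# Sketch of a TeraSort run. Jar and HDFS paths are assumptions;
# adjust them for your distribution and cluster.
SIZE_GB=100                       # target dataset size (decimal GB)
ROWS=$(( SIZE_GB * 10000000 ))    # 10^7 rows per GB at 100 bytes/row

JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar  # distribution-specific

# Guarded so the script is a no-op on machines without Hadoop installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$JAR" teragen      "$ROWS"         /terasort/input
  hadoop jar "$JAR" terasort     /terasort/input  /terasort/output
  hadoop jar "$JAR" teravalidate /terasort/output /terasort/report
fi
```

For a 10TB run, the same arithmetic gives 100 billion rows; TeraValidate writes its pass/fail report to the last directory given.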
We conducted a series of TeraSort tests on our popular Cisco UCS Common Platform Architecture (CPA) for Big Data rack with 16 Cisco UCS C240 M3 Rack Servers, each equipped with two Intel Xeon E5-2665 processors, running the Apache Hadoop distribution (see figure below). The results demonstrate industry-leading performance and scalability over a range of dataset sizes from 100GB to 50TB. For example, out of the box, our 10TB result is 40 percent faster than HP's published result on 18 HP ProLiant DL380 Servers equipped with two Intel Xeon E5-2667 processors.
While Hadoop offers many advantages for organizations, the Cisco story isn't complete without the collaborations with our ecosystem partners that enable us to offer complete solution stacks. We support leading Hadoop distributions, including Cloudera, Hortonworks, Intel, MapR, and Pivotal, on our Cisco UCS Common Platform Architecture (CPA) for Big Data. We just announced our Big Data Design Zone, which offers Cisco Validated Designs (CVDs) -- pretested and validated architectures that accelerate time to value for customers while reducing risks and deployment challenges.
Cisco Big Data Design Zone
Cisco UCS Demonstrates Leading TeraSort Benchmark Performance
Cisco UCS Common Platform Architecture (CPA) for Big Data
Tags: Big Data, Big Data Benchmarks, Cisco UCS C240 M3 Rack Server, Cisco UCS CPA, CPA, Hadoop, TeraSort, YCSB
Cisco UCS Common Platform Architecture (CPA) for Big Data offers a comprehensive stack for enterprise Hadoop deployments. Today we announce the availability of Cisco Validated Design (CVD) for Cloudera (CDH) that describes the architecture and deployment procedures, jointly tested and certified by Cisco and Cloudera to accelerate deployments while reducing the risks, complexity, and total cost of ownership.
Together, Cisco and Cloudera are well positioned to help organizations exploit the valuable business insights found in all their data, whether structured, semi-structured, or unstructured. The solution offers industry-leading performance, scalability, and advanced management capabilities to address the business needs of our customers.
The rack-level configuration detailed in the document can be extended to multiple racks. Up to 160 servers (10 racks) can be supported with no additional switching in a single UCS domain. Scaling beyond 10 racks can be achieved by interconnecting multiple UCS domains using Nexus 6000/7000 Series switches, scaling to thousands of servers and hundreds of petabytes of storage, all managed from a single pane of glass using UCS Central.
We would like to invite you to our upcoming Journey to Big Data Roadshow in a city near you, designed to help you identify where you are on your Big Data journey, and how to keep that journey going in a low-risk, productive way.
1. Cisco UCS CPA for Big Data with Cloudera
2. FlexPod Select for Hadoop with Cloudera
3. Cloudera Enterprise with Cisco Unified Computing System (solution brief)
Tags: Cisco UCS CPA, Cloudera, CPA, Hadoop, Journey to Big Data
Cloudera Sessions is coming to a City Near You!
Have you registered for the upcoming Cloudera Sessions roadshow yet? According to IDC analysts, the market for Big Data will reach $16.9 billion by 2015, growing at a remarkable 40% CAGR. As the sheer volume of data continues to climb, enterprise customers will need the right software and infrastructure to transform that data into meaningful insights.
Cisco is partnering with Cloudera to offer a comprehensive infrastructure and management solution, based on the Cisco Unified Computing System (UCS), to support our customers' big data initiatives. As a proud sponsor of this event, I encourage you to join us at one of the following scheduled stops to learn more about our joint solutions for big data:
San Francisco 9/11
Jersey City 9/18
Milwaukee 10/17 (Note: changed from 10/2 to 10/17)
Cloudera has a fantastic agenda scheduled in each of the cities featuring keynote speakers that you won’t want to miss. I hope to see you there.
For the latest information regarding Big Data on Cisco UCS, I’ve added a couple links below for your reference:
Introducing Cisco UCS Common Platform Architecture (CPA) for Big Data, By Raghu Nambiar
Top Three Reasons Why Cisco UCS is a Better Platform for Big Data, by Scott Ciccone
Stop by and say hello, and let me know if you have any comments or questions, or reach out on Twitter at @CicconeScott.
Tags: Big Data, Blade Servers, Cisco UCS, Cisco Unified Computing System, Hadoop, Rack Servers, UCS
Speed is everything. Continuing our commitment to make data center infrastructures more responsive to the demands of enterprise applications, today we announced FlexPod Select with Hadoop, formerly known as the NetApp Open Solution for Hadoop, broadening our FlexPod portfolio. Developed in collaboration between Cisco and NetApp, the solution offers an enterprise-class infrastructure that accelerates time to value from your data. It is pre-validated for Hadoop deployments built using Cisco UCS 6200 Series Fabric Interconnects (connectivity and management), Cisco UCS C220 M3 Servers (compute), NetApp FAS2220 (NameNode metadata storage), and NetApp E5400 Series storage arrays (data storage). Following the highly successful FlexPod model of pre-sized, rack-level configurations, this solution will be made available through the well-established FlexPod sales engagement and channel.
The FlexPod Select with Hadoop architecture is an extension of our popular Cisco UCS Common Platform Architecture (CPA) for Big Data, designed for applications requiring enterprise-class external storage array features such as RAID protection with data replication, hot-swappable spares, proactive drive-health monitoring, faster recovery from disk failures, and automated I/O path failover. The architecture consists of a master rack and, optionally, up to nine expansion racks in a single management domain, forming a complete, self-contained Hadoop cluster. The master rack provides all of the components required to run a 12-node Hadoop cluster with 540TB of storage capacity. Each additional expansion rack provides 16 more Hadoop cluster nodes and 720TB of storage capacity. Unique to this architecture are seamless management-integration and data-integration capabilities with existing FlexPod deployments, which can significantly lower infrastructure and management costs.
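Using the rack sizes quoted above (a 12-node, 540TB master rack plus up to nine 16-node, 720TB expansion racks), the maximum single-domain cluster works out as follows. This is just arithmetic on the figures in the text, not an additional specification:

```shell
#!/bin/sh
# Figures from the text: master rack = 12 nodes / 540 TB;
# up to 9 expansion racks, each = 16 nodes / 720 TB.
MASTER_NODES=12; MASTER_TB=540
EXP_RACKS=9;     EXP_NODES=16; EXP_TB=720

TOTAL_NODES=$(( MASTER_NODES + EXP_RACKS * EXP_NODES ))
TOTAL_TB=$((    MASTER_TB    + EXP_RACKS * EXP_TB    ))

echo "max cluster: $TOTAL_NODES nodes, $TOTAL_TB TB raw capacity"
# -> max cluster: 156 nodes, 7020 TB raw capacity
```

In other words, a fully populated domain tops out at roughly 7PB of raw storage across 156 Hadoop nodes.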
FlexPod Select has been pretested and jointly validated with leading Hadoop vendors, including Cloudera and Hortonworks.
Tags: Big Data, Cloudera, CPA, FlexPod, FlexPod Select, Hadoop, Hortonworks, netapp