Last week we participated in the annual Hadoop Summit held in San Jose, CA. When we first met with Hortonworks about the Summit many months back, they mentioned that this year’s Hadoop Summit would promote reference architectures from many companies in the Hadoop ecosystem. This was great to hear: we had previously presented results from a large round of testing on network and compute considerations for Hadoop at Hadoop World 2011 last November, and we were looking to do a second round of testing to build on those findings and develop a set of best practices around them, including failure and connectivity options. This validation also helps answer a key enterprise question: “Can we use the same architecture and components for Hadoop deployments?” Since much of Hadoop’s value is realized once it is integrated into existing enterprise data models, the goal of the testing was not only to define a reference architecture but also to define a set of best practices so Hadoop can be integrated into current enterprise architectures.
Below are the results of this new testing effort, presented at Hadoop Summit 2012. Thanks to Hortonworks for their collaboration throughout the testing.
Expanding its Big Data portfolio, Cisco announced a fully integrated, end-to-end hardware and software infrastructure for enterprise Hadoop deployments in partnership with Greenplum, a division of EMC, that delivers industry-leading performance, scalability, advanced management capabilities, and enterprise-class service and support. The solution consists of Cisco UCS 6200 Series Fabric Interconnects, Cisco UCS C-Series rack-mount servers, and Greenplum MR. Greenplum MR is based on the MapR M5 distribution, a completely re-engineered implementation of the Apache Hadoop stack with 100 percent compatibility. Cisco UCS is the exclusive integrated platform for Greenplum MR and can significantly reduce the time-to-value and operating expenses associated with Hadoop implementations.
Hadoop implementations can present a number of challenges to enterprise environments; many of these arise from the dichotomy between the introduction of innovative new technology and the enterprise-class performance, reliability, and support demanded by mission-critical systems. The collaboration between Cisco and Greenplum is specifically designed to address these challenges. The joint solution delivers radically simplified deployment and management, high availability, excellent performance, exceptional scalability, and world-class service and support from long-time collaborators Cisco and EMC.
This solution can also connect, across the same management plane, to other Cisco UCS deployments running enterprise applications, thereby radically simplifying data center management and connectivity.
The configuration starts in a single rack with the ability to extend into multiple racks.
For more information or deal inquiries, please email us at: firstname.lastname@example.org. A joint white paper is available at http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns944/wp_greenplum.pdf.
Next week is Cloud Connect in Santa Clara and Cisco’s Cloud Software group will have a big presence.
While we have plenty to talk about on how Cisco is helping customers build their clouds, we also want to listen to our customers’ plans and needs. We are bringing some of our engineers and architects so you can engage directly with them. There are three things you can see next week.
CITEIS -- Cisco’s in-production private cloud.
See how it was built and the resulting gains in agility and cost, and best of all, see a demo. Not a fake demo but the real thing.
Of course, we will also be showcasing our award-winning cloud automation software, Cisco Intelligent Automation for Cloud (CIAC) (formerly newScale and Tidal), which provides the self-service catalog and orchestration for our private cloud.
Big Data’s move into the enterprise has generated a lot of buzz around three questions: why big data, what are the components, and how do we integrate it? The “why” was covered in a two-part blog (Part 1 | Part 2) by Sean McKeown last week. To help answer the remaining questions, I presented Hadoop Network and Architecture Considerations last week at the sold-out Hadoop World event in New York. The goal was to examine what it takes to integrate Hadoop into enterprise architectures by demystifying what happens on the network and identifying the key network characteristics that affect Hadoop clusters.
The presentation includes results from an in-depth testing effort to examine what Hadoop means to the network. We went through many rounds of testing spanning several months (special thanks to Cloudera for their guidance).
As discussed in my previous post, application developers and data analysts are demanding fast access to ever-larger data sets so they can not only reduce or even eliminate sampling errors in their queries (query the entire raw data set!) but also begin to ask new questions that were either not conceivable or not practical using traditional software and infrastructure. Hadoop emerged in this data arms race as a favored alternative to the RDBMS and SAN/NAS storage model. In this second half of the post, I’ll discuss how Hadoop was specifically designed to address these limitations.
Hadoop’s origins derive from two seminal Google white papers from 2003 and 2004: the first describing the Google File System (GFS) for persistent, massively scalable, reliable storage, and the second describing the MapReduce framework for distributed data processing, both of which Google used to ingest and crunch the vast amounts of web data needed to provide timely and relevant search results. These papers laid the groundwork for Apache Hadoop’s implementation of MapReduce running on top of the Hadoop Distributed File System (HDFS). Hadoop gained an early, dedicated following from companies like Yahoo!, Facebook, and Twitter, and has since found its way into enterprises of all types due to its unconventional approach to data and distributed computing. Hadoop tackles the problems discussed in Part 1 in the following ways:
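To make the MapReduce model concrete, here is a minimal single-process sketch of the canonical word-count job in Python. This is illustrative only: a real Hadoop job runs many map and reduce tasks in parallel across a cluster, with HDFS supplying the input splits and a shuffle phase grouping keys between the two stages. The function names here are hypothetical, not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Shuffle + reduce: group intermediate pairs by key and sum the counts.
    # In Hadoop, grouping happens in the shuffle; each reducer then sums
    # the values for the keys assigned to it.
    counts: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
print(reduce_phase(map_phase(lines)))  # 'the' is counted twice
```

The key idea the papers introduced is that both phases operate on key-value pairs and have no side effects, which lets the framework partition the work, move computation to the data, and simply re-run failed tasks.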