
Big Data’s move into the enterprise has generated a lot of buzz: why big data, what are the components, and how do you integrate it? The “why” was covered in a two-part blog (Part 1 | Part 2) by Sean McKeown last week. To help answer the remaining questions, I presented Hadoop Network and Architecture Considerations last week at the sold-out Hadoop World event in New York. The goal was to examine what it takes to integrate Hadoop into enterprise architectures by demystifying what happens on the network and identifying the key network characteristics that affect Hadoop clusters.

[Embedded presentation: Hadoop World, from Cisco Data Center]

The presentation includes results from an in-depth testing effort to examine what Hadoop means to the network. We went through many rounds of testing that spanned several months (special thanks to Cloudera for their guidance).

We started out with our 128-node UCS C-Series Rack-Mount Server cluster, which we had been running some HPC benchmarks on (an interesting note, given that NextGen Hadoop is going to incorporate MPI workloads…but that’s a completely different conversation…), and beefed up the disk on each server for a total of one petabyte of raw disk across the cluster. When looking at which benchmarks to run, we started with likely the most network-intensive one, Yahoo TeraSort, which ends with the same amount of data it starts with; it is simply sorting. This means that if you start with 10TB of input, the whole 10TB will be shuffled across the network in the phase between the maps and the reducers, with another 20TB at the end if output replication is enabled. While on the more intensive side, it does simulate a common task for Hadoop clusters: Extract, Transform and Load (ETL). We further tested workloads that simulated Business Intelligence (BI) queries, as well as multidimensional tests with several different workloads running at the same time while data was being imported into the cluster.
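To put those numbers in perspective, here is a rough back-of-the-envelope sketch of how much data a TeraSort run pushes onto the network. This is my own illustration, not part of the test methodology; the function name and the simplifying assumptions (every shuffled byte crosses the network, and each output block is copied to the other replicas over the network) are hypothetical.

```python
def terasort_network_tb(input_tb, hdfs_replication=3, output_replicated=True):
    """Rough upper-bound estimate of TeraSort data crossing the network, in TB.

    Simplifying assumptions: every shuffled byte travels between nodes
    (ignoring map output consumed by a reducer on the same node), and
    each output block is written to (replication - 1) remote nodes when
    output replication is enabled.
    """
    shuffle_tb = input_tb  # map -> reduce shuffle moves the full data set once
    replica_tb = input_tb * (hdfs_replication - 1) if output_replicated else 0
    return shuffle_tb + replica_tb

# The 10TB example from the text: ~10TB of shuffle traffic plus
# ~20TB of replica writes with default 3x HDFS replication.
print(terasort_network_tb(10))  # 30
```

The point is simply that a sort which “only” touches 10TB of data can put roughly three times that amount on the wire.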

Throughout the tests we monitored traffic usage and patterns and buffer utilization (buffer monitoring is a recent feature of the Nexus 2248TP-E and the Nexus 3000 Series), and we compared multiple topologies, 1GE vs. 10GE, and various server configurations. The full results can be found in a white paper published on www.cisco.com/go/bigdata.

Some key takeaways are covered in the presentation above and in the white paper.

Through this effort we also completed Cloudera’s Certified Technology program with the UCS C200 M2 and C210 M2 Rack-Mount Servers interconnected through the Nexus 5000 Series and the new Nexus 2248TP-E Fabric Extender, which is optimized for specialized data center workloads such as Hadoop. The testing also included the new Nexus 3048 as another option for those deploying Big Data infrastructures.


1 Comment


  1. Jacob,
    Thought your readers would find your video session in theCUBE from Hadoop World interesting too – replay is available here: http://siliconangle.tv/video/cisco-hadoop-world-2011
    Cheers,
    Stu Miniman
    Wikibon.org
    Twitter: @stu
