

Data Center and Cloud

Part 1

If you have been a regular reader of just about any technology blog or publication over the last year you’d be hard-pressed not to have heard about big data and especially the excitement (some might argue hype) surrounding Hadoop.  Big data is becoming big business, and the buzz around it is building commensurately.  What began as a specialized solution to a unique problem faced by the largest of Web 2.0 search engines and social media outlets – namely the need to ingest, store and analyze vast amounts of semi- or unstructured data in a fast, efficient, cost-effective and reliable manner that challenges traditional relational database management and storage approaches – has expanded in scope across nearly every industry vertical and trickled out into a wide variety of IT shops, from small technology startups to large enterprises.  Big business has taken note, and major industry players such as IBM, Oracle, EMC, and Cisco have all begun investing directly in this space.  But why has Hadoop itself proved so popular, and how has it solved some of the limitations of traditional structured relational database management systems (RDBMS) and associated SAN/NAS storage designs?

In Part 1 of this blog I’ll start by taking a closer look at some of those problems, and tomorrow in Part 2 I’ll show how Hadoop addresses them.

Businesses of all shapes and sizes are asking complex questions of their data to gain a competitive advantage: retail companies want to be able to track changes in brand sentiment from online sources like Facebook and Twitter and react to them rapidly; financial services firms want to scour large swaths of transaction data to detect fraud patterns; power companies ingest terabytes of data from millions of smart meters generating data every hour in hopes of uncovering new efficiencies in billing and delivery.  As a result, developers and data analysts are demanding fast access to as large and “pure” a data set as possible, taxing the limits of traditional software and infrastructure and exposing the following technology challenges:

Acknowledging these difficulties with the traditional RDBMS+SAN/NAS model, a new breed of applications and underlying data management frameworks has emerged over the last decade, intended to handle the needs of big data sets in a cost-effective and timely manner.  Hadoop has become one of the most popular choices for big data problems, as it was purpose-built to address these shortcomings.  In Part 2 of this post, I’ll take a closer look at how Hadoop works in this context.

Update: Cisco’s own Jacob Rapp will be speaking at HadoopWorld in NYC next week.  See you there!  http://www.hadoopworld.com/session/hadoop-networkcompute-architecture-considerations/


9 Comments.


  1. At what limit does an RDBMS become inefficient? Does Hadoop have applications in the telecom field, especially 3G and 4G, from the operator’s point of view rather than the end user’s?

    Best regards,
    Jihad Daouk


    • Hello Jihad-

      There’s no specific limit at which an RDBMS becomes inefficient; as with most technology decisions, the answer is “it depends”. Hadoop and MapReduce are another tool in the data toolkit alongside the venerable RDBMS, each with its own strengths. Companies should be aware of the new options Hadoop provides and select the right tool for the job. I’m sure there are plenty of telecom use cases for Hadoop, but I’m not tied into that industry well enough to provide specifics.

      -Sean


  2. Sean
    Excellent post, and I agree on all counts. Do you think traditional RDBMS and SAN/NAS models will eventually integrate with big data systems by default, and perhaps we will see converged data-set systems?
    Thanks


    • Thanks Kapil. There are already examples of vendors providing MapReduce-like capabilities on RDBMSs – I remember seeing at least one or two sessions at Oracle OpenWorld on this subject (MapReduce in PL/SQL, etc.). Whether that makes sense to do really depends on the data and the application. MapReduce itself isn’t a radically new idea in computer science, but its application in Hadoop on top of HDFS, with distributed computing designed in from the ground up, is innovative. So yes, I expect we’ll see some level of convergence, and even some converged systems and appliances trying to provide the best of both worlds. They’ll probably make sense for some customers, in the same way it may make sense to run Hadoop virtualized or even with non-local storage for a given deployment, even if that’s not the optimal design from a performance standpoint on paper.
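      To make the programming model concrete, here is a toy, single-process sketch of MapReduce-style word counting. This is purely illustrative and assumes nothing about Hadoop’s actual Java API – the function names are my own, and real Hadoop runs the map, shuffle, and reduce phases in parallel across HDFS blocks on many nodes.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the counts."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

# Stand-in for lines read from a distributed file system.
lines = ["big data is big", "data moves fast"]
counts = reduce_phase(map_phase(lines))
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

      The point is that the developer only writes the two pure functions; the framework’s job – and Hadoop’s innovation – is running them reliably over data too large for any single machine.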


  3. Given the emergence of distributed storage technologies such as distributed databases and filesystems, and distributed computing technologies such as Hadoop implementations, do you think Cisco data center and storage products need to support them at the infrastructure (NX-OS/IOS) level, or is that support already there?


    • Hi Sameer-

      Great question. I think there’s a lot already there – in fact some of the largest Hadoop clusters in the world run on Cisco networks, and I think we have a strong suite of products for building a complete Hadoop infrastructure stack. By the same token I think there’s a lot of opportunity out there in what is still a very young market, and I’m really excited to be in a position to help drive Cisco toward new innovations and solutions (whether in silicon or software or both) that can bring additional value to customers in this space.

      -Sean


  4. I am not sure Hadoop in its current form will play a big role in big data analytics. Signs of that are the way it is evolving to support NAS/SAN, to run outside the JVM (i.e., no code mobility or scale-out), and to support structured data.

    What is really catching on in industry, IMHO, is the value of analytics at all levels: Invocation (API, messaging), Datastore (structured and unstructured), and Execution (mostly moving to VMs).


    • Thanks Vikas, it’s definitely a fluid space. Though I’m not quite ready to agree that Hadoop in its current DAS-oriented form won’t be playing a big role in big data analytics – I think it already is and will continue to do so. Although there are efforts to optimize it for more classic SAN/NAS environments, and possibly even to virtualize it, at some point the value proposition of Hadoop starts to get lost by trying to shoehorn it into previous-generation IT models. That’s not to say there’s no value in these efforts, as many customers will want the power of MapReduce running in an operational model they’re already comfortable with, but all else equal they’ll probably end up paying more at scale for an equivalent amount of analytic horsepower.

      Regards,
      Sean

