As discussed in my previous post, application developers and data analysts are demanding fast access to ever larger data sets, not only to reduce or even eliminate sampling errors in their queries (query the entire raw data set!), but also to ask new questions that were either not conceivable or not practical using traditional software and infrastructure. Hadoop emerged in this data arms race as a favored alternative to the RDBMS and SAN/NAS storage model. In this second half of the post, I’ll discuss how Hadoop was specifically designed to address these limitations.
Hadoop’s origins derive from two seminal Google white papers from 2003 and 2004: the first describing the Google File System (GFS), a persistent, massively scalable, reliable storage layer, and the second the MapReduce framework for distributed data processing. Google used both to ingest and crunch the vast amounts of web data needed to provide timely and relevant search results. These papers laid the groundwork for Apache Hadoop’s implementation of MapReduce running on top of the Hadoop Distributed File System (HDFS). Hadoop gained an early, dedicated following at companies like Yahoo!, Facebook, and Twitter, and has since found its way into enterprises of all types due to its unconventional approach to data and distributed computing. Hadoop tackles the problems discussed in Part 1 in the following ways:
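The MapReduce programming model itself is simple enough to sketch in a few lines of Python. This is a toy, single-machine illustration of the concept only, not Hadoop’s actual (Java) API: a map function emits key-value pairs, a shuffle step groups the pairs by key, and a reduce function aggregates each group.

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Reduce phase: aggregate all counts emitted for a given word.
    return word, sum(counts)

def mapreduce(lines):
    # Shuffle: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

# Classic word-count example: "the" appears in both lines.
counts = mapreduce(["the quick brown fox", "the lazy dog"])
```

What makes this model attractive at scale is that the map and reduce phases are embarrassingly parallel: the framework can split the input across thousands of machines, run maps locally where the data lives, and shuffle only the intermediate pairs over the network.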
Tags: Big Data, Cisco, data center, Hadoop, NoSQL
If you have been a regular reader of just about any technology blog or publication over the last year you’d be hard-pressed to have not heard about big data and especially the excitement (some might argue hype) surrounding Hadoop. Big data is becoming big business, and the buzz around it is building commensurately. What began as a specialized solution to a unique problem faced by the largest of Web 2.0 search engines and social media outlets – namely the need to ingest, store and analyze vast amounts of semi- or unstructured data in a fast, efficient, cost-effective and reliable manner that challenges traditional relational database management and storage approaches – has expanded in scope across nearly every industry vertical and trickled out into a wide variety of IT shops, from small technology startups to large enterprises. Big business has taken note, and major industry players such as IBM, Oracle, EMC, and Cisco have all begun investing directly in this space. But why has Hadoop itself proved so popular, and how has it solved some of the limitations of traditional structured relational database management systems (RDBMS) and associated SAN/NAS storage designs?
In Part 1 of this blog I’ll start by taking a closer look at some of those problems, and tomorrow in Part 2 I’ll show how Hadoop addresses them.
Businesses of all shapes and sizes are asking complex questions of their data to gain a competitive advantage: retail companies want to be able to track changes in brand sentiment from online sources like Facebook and Twitter and react to them rapidly; financial services firms want to scour large swaths of transaction data to detect fraud patterns; power companies ingest terabytes of data from millions of smart meters generating data every hour in hopes of uncovering new efficiencies in billing and delivery. As a result, developers and data analysts are demanding fast access to as large and “pure” a data set as possible, taxing the limits of traditional software and infrastructure and exposing the following technology challenges:
Tags: Big Data, Hadoop, NoSQL
What provisioning the Cloud infrastructure and cooking have in common…
I like to cook. Sometimes, I’ll grab whatever ingredients I have on hand, put them in a Dutch oven, throw in a few spices, and make a delicious casserole that can never be repeated. At other times, I’ll follow a recipe to the letter, measure and weigh everything that goes in, and produce a great meal that I can repeat consistently each time.
When provisioning servers and blades for a Cloud infrastructure, the same two choices exist: follow your instinct and build a working (but not repeatable) system, or follow a recipe that ensures systems are built in an exacting fashion, every time. Without a doubt, the latter is the only way to proceed.
Enter the Cisco Tidal Server Provisioner (TSP, an OEM from www.linmin.com), an integral component of Cisco Intelligent Automation for Cloud and Cisco Intelligent Automation for Compute. TSP lets you create “recipes” that can be deployed onto physical systems and virtual machines with repeatability and quality, every time. These recipes can range from the simple, e.g., installing a hypervisor or an operating system, to the very complex: install an operating system, then install applications, run startup scripts, configure the system, access remote data, register services, and so on.
Once you have a recipe (we call it a Provisioning Template), you can apply it to any number of physical systems or virtual machines without having to change the recipe. Some data centers use virtualization for sandbox development and prototyping, and use physical servers and blades for production. Some do the opposite: prototype on physical systems, then run production in a virtualized environment. And of course, some shops are “all physical” or “all virtual”. Being able to deploy a recipe-based payload consistently on both physical and virtual systems provides the ultimate flexibility. Yes, once you’ve created a virtual machine, you’ll likely use VMware vSphere services to deploy, clone and move VMs, but as long as you’re using TSP to create that “first VM”, you have the assurance of a known-good, repeatable way of generating the golden image. When the time comes to update the golden image, don’t touch the VM: instead, change the recipe, provision a new VM, and proceed from there.
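The idea is easy to sketch generically. The names and steps below are purely hypothetical illustrations (this is not TSP’s actual template format or API): a template is a declarative list of steps, and provisioning means applying that same list, unchanged, to any number of physical or virtual targets.

```python
# Hypothetical sketch of recipe-based provisioning -- the template name,
# step names, and hosts are illustrative only, not a real TSP template.

TEMPLATE = {
    "name": "web-server-golden",
    "steps": [
        ("install_os", "esxi-5.0"),
        ("install_app", "httpd"),
        ("run_script", "configure.sh"),
    ],
}

def provision(template, targets):
    # Apply the same template to every target, physical or virtual,
    # so each system is built identically and repeatably.
    results = {}
    for host in targets:
        results[host] = [f"{step}:{arg}" for step, arg in template["steps"]]
    return results

# A blade and a VM get byte-for-byte identical build plans.
built = provision(TEMPLATE, ["blade-01", "vm-42"])
assert built["blade-01"] == built["vm-42"]
```

The design point this illustrates: updating the golden image means editing the template (the data), never hand-modifying a built system, so every rebuild stays reproducible.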
Tags: Cloud Computing, data center provisioning, disk imaging, intelligent automation, job scheduling, linmin, orchestration, self-service, server provisioning
Captain’s log, October 27, 2011:
The USS Cisco took off for the Gestalt IT Networking Tech Field Day 2 with Captain Omar Sultan (see picture below, courtesy of techfieldday.com), Data Center Solutions Sr. Marketing Manager, at the helm. Tech Field Day networking industry experts gathered on the bridge, cleverly disguised as the Cisco Cloud Innovation Center (CICC) Lab, for an informal, no-holds-barred conversation on recent Nexus portfolio announcements, the continued march towards automated provisioning of cloud services and ever-evolving VM networking technologies.
Captain Omar at Cisco Networking Tech Field Day 2
For those who weren’t at the event or haven’t seen the video recording yet, please excuse my unabashed geekiness, but you’ll have to watch the first minute of the video to get the above reference. As a new member of the Data Center Solutions Marketing team, I’m also making my first foray into the Cisco blogosphere, so I hope to share some fresh viewpoints on the day’s events.
Several things were made very apparent during the Tech Field Day session:
Tags: automated provisioning, brighttalk, CIAC, cicc, Cisco Intelligent Automation for Cloud, cloud, han yang, networking tech field day, Nexus 1000v, Nexus 3000, Nexus 5000, Nexus 7000, omar sultan, orchestration, tech field day, Tina Feng, virtual machine networking, virtual services, virtualization insights, vm networking, VXLAN
One size does not fit all. Our customers’ needs differ greatly, and they need solutions that address their unique business models. I experienced a good example of this recently: while I was at the coffee shop the other day, the owner came up and asked me about the network he uses to provide his customers with free Wi-Fi and keep his cash register and inventory systems online. Talking with him about the needs of his particular local coffee shop, I was again struck by how different his needs were from those of a large national chain or other businesses. In I.T., we tend to lump customers into big buckets: coffee shops, general retail chains, health care, manufacturing. The thought is that each bucket has enough similarities to allow a generic solution for each vertical market. It’s a reasonable thought, but it tends to break down when we get into the specific business needs of each company.
Each company has its varying operating models. The difference in their operating models allows them to create competitive differentiation. That differentiation can be anything from lower prices to more generous service offerings to a greater variety of products. At Cisco, we recognize that the way our customers do business is a prime enabler to their success. This is more than the operational end of the business; it goes all the way into the data center and the data center network.
Right now, there is a plethora of new ideas for the data center network, each with its own merits and demerits. One is the flat network, which eliminates the layers of networking found in a traditional data center environment. Additional layers add cost, cabling, and overall complexity to the network. Traditional multi-layer networks are also not particularly well suited to a virtualized server environment, where east-west traffic and workload moves are prevalent. Multi-tier networks also tend to be less adaptable for converged networking, where storage and general data traffic flow together over Ethernet using protocols like iSCSI and FCoE. Flat networking may solve many of these problems while lowering costs.