By now it is clear that big data analytics opens the door to unprecedented analytic opportunities for business innovation, customer retention and profit growth. However, a shortage of data scientists is creating a bottleneck as organizations move from early big data experiments into larger-scale adoption. This constraint limits big data analytics and the positive business outcomes that could be achieved.
Click on the photo to hear Jason Hull, Data Integration Specialist at Comcast, explain how his team uses data virtualization to get what they need done, faster.
It’s All About the Data
As every data scientist will tell you, the key to analytics is data. The more data the better, including big data as well as the myriad other data sources both in the enterprise and across the cloud. But accessing and massaging this data, in advance of data modeling and statistical analysis, typically consumes 50% or more of any new analytic development effort.
• What would happen if we could simplify the data aspect of the work?
• Would that free up data scientists to spend more time on analysis?
• Would it open the door for non-data scientists to contribute to analytic projects?
SQL is the key. Because of its ease and power, it has been the predominant method for accessing and massaging data for the past 30 years. Nearly all non-data scientists in IT can use SQL to access and massage data, but very few know MapReduce, the programming model traditionally used to access data from Hadoop sources.
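To make the contrast concrete, here is a minimal sketch in Python, not taken from any Cisco product. It totals purchases per customer twice: once with a single SQL statement (using Python's built-in sqlite3 module as a stand-in for any SQL engine) and once with hand-written map, shuffle and reduce steps in the style of a MapReduce job. The sample data and the function names are assumptions made purely for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, purchase amount).
purchases = [("alice", 30), ("bob", 20), ("alice", 50), ("bob", 5), ("carol", 10)]


def total_with_sql(rows):
    """Total the amounts per customer with one declarative SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM purchases "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return dict(result)


def total_with_mapreduce(rows):
    """Compute the same totals with explicit map, shuffle and reduce steps."""
    # Map: emit one (key, value) pair per input record.
    mapped = [(customer, amount) for customer, amount in rows]

    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce: collapse each group of values into a single total.
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(total_with_sql(purchases))
    print(total_with_mapreduce(purchases))
```

Both functions return the same totals. The SQL version states what is wanted in one statement, while the MapReduce-style version has to spell out how to compute it, which is the skills gap described above.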
How Data Virtualization Helps
“We have a multitude of users…from BI to operational reporting, they are constantly coming to us requesting access to one server or another…we now have that one central place to say ‘you already have access to it’ and they immediately have access rather than having to grant access outside of the tool” -Jason Hull, Comcast
Data virtualization offerings, like Cisco’s, can help organizations bridge this gap and accelerate their big data analytics efforts. Cisco was the first data virtualization vendor to support Hadoop integration with its June 2011 release. This standardized SQL approach augments specialized MapReduce coding of Hadoop queries. By simplifying access to Hadoop data, organizations could for the first time use SQL to include big data sources, as well as enterprise, cloud and other data sources, in their analytics.
In February 2012, Cisco became the first data virtualization vendor to enable MapReduce programs to query virtualized data sources on demand and with high performance. This allowed enterprises to extend MapReduce analyses beyond Hadoop stores to include diverse enterprise data previously integrated by the Cisco Information Server.
In 2013, Cisco maintained its big data integration leadership with updates to its support for Hive access to the leading Hadoop distributions, including Apache Hadoop, Cloudera Distribution (CDH) and Hortonworks Data Platform (HDP). In addition, Cisco now also supports access to Hadoop through HiveServer2 and to Cloudera CDH through Impala.
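As a rough illustration of the underlying idea, a single SQL query that spans a big data source and an enterprise source, the sketch below attaches two separate in-memory SQLite databases to one connection and joins across them. SQLite is only a stand-in here: this is not the Cisco Information Server API, and the schema names bigdata and enterprise, the tables, and the sample rows are all hypothetical.

```python
import sqlite3


def build_virtual_layer():
    """Open one connection that exposes two independent databases.

    Each ATTACH of ':memory:' creates a separate in-memory database,
    standing in for a distinct physical source.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS bigdata")
    conn.execute("ATTACH DATABASE ':memory:' AS enterprise")

    # Hypothetical clickstream data, as might live in a Hadoop store.
    conn.execute("CREATE TABLE bigdata.clicks (customer_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO bigdata.clicks VALUES (?, ?)",
        [(1, "home"), (1, "pricing"), (2, "home")],
    )

    # Hypothetical customer master data, as might live in an enterprise system.
    conn.execute("CREATE TABLE enterprise.customers (customer_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO enterprise.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def clicks_per_customer(conn):
    """Join across both sources with one ordinary SQL query."""
    return conn.execute(
        "SELECT c.name, COUNT(*) "
        "FROM enterprise.customers AS c "
        "JOIN bigdata.clicks AS k ON k.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()


if __name__ == "__main__":
    connection = build_virtual_layer()
    print(clicks_per_customer(connection))
    connection.close()
```

The person writing the query only needs SQL and the logical table names; where each table physically lives is hidden behind the connection.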
Recently I had an opportunity to sit down with the talented data scientists from Cisco’s Threat Research, Analysis, and Communications (TRAC) team to discuss Big Data security challenges, tools and methodologies. The following is part one of five in this series where Jisheng Wang, John Conley, and Preetham Raghunanda share how TRAC is tackling Big Data.
Given the hype surrounding “Big Data,” what does that term actually mean?
John: First of all, because of overuse, the “Big Data” term has become almost meaningless. For us and for SIO (Security Intelligence and Operations) it means a combination of infrastructure, tools, and data sources all coming together to make it possible to have unified repositories of data that can address problems that we never thought we could solve before. It really means taking advantage of new technologies, tools, and new ways of thinking about problems.
Cisco is proud to be a Platinum sponsor and exhibitor at PASS Summit this year. If you aren’t familiar with PASS Summit, it “is the world’s largest, most-focused, and most-intensive conference for Microsoft SQL Server and BI professionals.”
Gary Serda has done an excellent job of detailing what the Cisco UCS team will be sharing with attendees in his blog post, Guide to Cisco at the PASS Summit, so I want to call out the 3D, interactive vRack of our Unified Computing System, which is always a highlight at trade shows and will be on display at PASS Summit.
Both the Nexus 1000V and FlexPod won Best of TechEd 2013 awards. This was the third year in a row for a Cisco product to be so honored.
We’re looking forward to seeing you at WPC. Join the conversation on social media using the hashtag #CiscoWPC. If you won’t be able to join us and would like to learn more about how Cisco is changing the economics of the datacenter, I would encourage you to review this presentation on SlideShare or my previous series of blog posts, Yes, Cisco UCS servers are that good. Or visit the Microsoft Cisco UCS portal.
Source: IDC Worldwide Quarterly Server Tracker, Q1 2013 Revenue Share, May 2013
I have lost count of the number of trade shows I’ve worked over my career. But working trade shows for Cisco over the past 14 months has been a uniquely positive experience. Microsoft TechEd North America 2013 is my fifth show evangelizing Cisco UCS and our solutions.
I have been able to have long (sometimes up to 45 minutes) conversations with potential customers who have heard about UCS and want to learn more. How Cisco does things differently from others in the industry is an eye-opener for them, whether it is the technology or the economics of the solution. They all walk away saying they are going to have to dig deeper into our solutions and contact their account team or partner.
The amount of praise our current customers heap on us when they come by the booth has become almost embarrassing. Embarrassing because I’m just a very small part of what makes UCS successful; Cisco has a very strong team behind UCS, and I wish they all could hear the great things customers are saying about their experiences.
There are still two days left to stop by the Cisco booth and learn about:
• UCS Solutions: FlexPod, VSPEX, Exchange, SQL Server