By now it is clear that big data analytics opens the door to unprecedented analytic opportunities for business innovation, customer retention and profit growth. However, a shortage of data scientists is creating a bottleneck as organizations move from early big data experiments to larger-scale adoption. This constraint limits big data analytics and the positive business outcomes it could deliver.
Comcast’s Jason Hull, Data Integration Specialist, describes how his team uses data virtualization to get what it needs done, faster.
It’s All About the Data
As every data scientist will tell you, the key to analytics is data. The more data, the better: big data as well as the myriad other data sources both in the enterprise and across the cloud. But accessing and massaging this data, ahead of data modeling and statistical analysis, typically consumes 50% or more of any new analytic development effort.
• What would happen if we could simplify the data aspect of the work?
• Would that free up data scientists to spend more time on analysis?
• Would it open the door for non-data scientists to contribute to analytic projects?
SQL is the key. Because of its ease of use and power, it has been the predominant method for accessing and massaging data for the past 30 years. Nearly all non-data scientists in IT can use SQL to access and massage data, but very few know MapReduce, the programming model traditionally used to process data stored in Hadoop.
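To make the gap concrete, here is a minimal sketch, using made-up sales rows and an in-memory SQLite database as a stand-in for any SQL engine, of the same aggregation written two ways: as one declarative SQL statement, and as hand-coded map, shuffle and reduce phases. It only illustrates the MapReduce model in plain Python; it is not Hadoop code.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, revenue) pairs.
ROWS = [("east", 120), ("west", 80), ("east", 30), ("north", 50), ("west", 20)]


def total_by_region_sql(rows):
    """Aggregate with one declarative SQL statement (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Each result row is a (region, total) tuple, so dict() can consume it.
    result = dict(
        conn.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region")
    )
    conn.close()
    return result


def total_by_region_mapreduce(rows):
    """The same aggregation spelled out as map, shuffle and reduce phases."""
    # Map: emit a (key, value) pair for every input record.
    mapped = [(region, revenue) for region, revenue in rows]
    # Shuffle: group together all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: collapse each group to a single value.
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(total_by_region_sql(ROWS))
    print(total_by_region_mapreduce(ROWS))
```

Both functions return the same totals; the SQL version states *what* to compute, while the MapReduce version forces the developer to spell out *how* records are emitted, grouped and combined.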
How Data Virtualization Helps
“We have a multitude of users…from BI to operational reporting, they are constantly coming to us requesting access to one server or another…we now have that one central place to say ‘you already have access to it’ and they immediately have access rather than having to grant access outside of the tool” -Jason Hull, Comcast
Data virtualization offerings, like Cisco’s, can help organizations bridge this gap and accelerate their big data analytics efforts. Cisco was the first data virtualization vendor to support Hadoop integration with its June 2011 release. This standardized SQL approach augments specialized MapReduce coding of Hadoop queries. By simplifying access to Hadoop data, organizations could for the first time use SQL to include big data sources, as well as enterprise, cloud and other data sources, in their analytics.
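The core idea of data virtualization, presenting several separate sources as one SQL-queryable layer without copying the data, can be sketched with SQLite’s `ATTACH DATABASE`. This is only an analogy under stated assumptions: the two database files stand in for independent systems (a hypothetical CRM and a hypothetical clickstream store), and a temporary view plays the role of a virtual table. It does not reflect how Cisco Information Server is implemented.

```python
import os
import sqlite3
import tempfile


def build_sources(directory):
    """Create two independent databases standing in for separate systems."""
    crm_path = os.path.join(directory, "crm.db")
    clicks_path = os.path.join(directory, "clicks.db")

    crm = sqlite3.connect(crm_path)
    crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Acme"), (2, "Globex")])
    crm.commit()
    crm.close()

    clicks = sqlite3.connect(clicks_path)
    clicks.execute("CREATE TABLE events (customer_id INTEGER, page TEXT)")
    clicks.executemany("INSERT INTO events VALUES (?, ?)",
                       [(1, "/home"), (1, "/pricing"), (2, "/home")])
    clicks.commit()
    clicks.close()
    return crm_path, clicks_path


def federated_page_views(crm_path, clicks_path):
    """Expose both sources through one connection and query them with SQL."""
    conn = sqlite3.connect(crm_path)
    # ATTACH makes the second database addressable as clicks.<table>.
    conn.execute("ATTACH DATABASE ? AS clicks", (clicks_path,))
    # A TEMP view may reference attached databases; it acts as the
    # "virtual" cross-source table that consumers query.
    conn.execute(
        "CREATE TEMP VIEW customer_views AS "
        "SELECT c.name AS name, COUNT(*) AS views "
        "FROM customers AS c JOIN clicks.events AS e ON e.customer_id = c.id "
        "GROUP BY c.name"
    )
    rows = conn.execute(
        "SELECT name, views FROM customer_views ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(federated_page_views(*build_sources(tmp)))
```

The consumer writes one ordinary SQL query against `customer_views` and never needs to know that the rows come from two different stores, which is the property that lets SQL-literate staff work across sources.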
In February 2012, Cisco became the first data virtualization vendor to enable MapReduce programs to easily query virtualized data sources, on demand and with high performance. This allowed enterprises to extend MapReduce analyses beyond Hadoop stores to include diverse enterprise data previously integrated by the Cisco Information Server.
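The reverse direction, a MapReduce-style job whose input comes from a virtualized view rather than from files, can be sketched as follows. Everything here is hypothetical: `all_orders` is an in-memory SQLite view that unions two tables standing in for a Hadoop-resident source and an enterprise source, and the mapper and reducer run in plain Python rather than on a Hadoop cluster.

```python
import sqlite3
from collections import defaultdict


def virtual_source():
    """In-memory stand-in for a virtualized view over two systems."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables: one "Hadoop-side", one "enterprise-side".
    conn.execute("CREATE TABLE hadoop_orders (product TEXT, qty INTEGER)")
    conn.execute("CREATE TABLE erp_orders (product TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO hadoop_orders VALUES (?, ?)",
                     [("disk", 3), ("cpu", 1)])
    conn.executemany("INSERT INTO erp_orders VALUES (?, ?)",
                     [("disk", 2), ("ram", 4)])
    conn.execute(
        "CREATE VIEW all_orders AS "
        "SELECT product, qty FROM hadoop_orders "
        "UNION ALL SELECT product, qty FROM erp_orders"
    )
    return conn


def mapper(row):
    """Emit (product, qty) for each input record."""
    product, qty = row
    yield product, qty


def reducer(key, values):
    """Sum all quantities seen for one product."""
    return key, sum(values)


def run_job(conn):
    groups = defaultdict(list)
    # The job's input split is a SQL query result instead of HDFS files.
    for row in conn.execute("SELECT product, qty FROM all_orders"):
        for key, value in mapper(row):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_job(virtual_source()))
```

The mapper and reducer stay unchanged whatever sits behind the view, so adding another enterprise source only changes the view definition, not the job.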
In 2013, Cisco maintained its big data integration leadership by updating its support for Hive access to the leading Hadoop distributions, including Apache Hadoop, Cloudera Distribution (CDH) and Hortonworks (HDP). In addition, Cisco now supports access to Hadoop through HiveServer2 and to Cloudera CDH through Impala.
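For readers who have not connected to these endpoints, a small sketch of the JDBC connection URLs a SQL client typically uses may help. The host names are hypothetical; 10000 is HiveServer2’s default port, and 21050 is the port on which an Impala daemon accepts HiveServer2-protocol clients. The `auth=noSasl` setting assumes an unsecured (non-Kerberos) cluster; secured clusters need different parameters.

```python
def hiveserver2_url(host, port=10000, database="default"):
    """JDBC URL for a HiveServer2 endpoint (default port 10000)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def impala_url(host, port=21050):
    """JDBC URL for Impala via the Hive JDBC driver.

    Assumes an unsecured cluster, hence auth=noSasl.
    """
    return f"jdbc:hive2://{host}:{port}/;auth=noSasl"


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(hiveserver2_url("hive.example.com"))
    print(impala_url("impala.example.com"))
```

Because Impala speaks the same HiveServer2 wire protocol, one driver family can reach both engines; only the port and authentication settings differ.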
In this week’s episode of Engineers Unplugged, Floris Grandvarlet (Cisco) and Richard Pilling (Intel) take on Big Data across the proverbial pond, at Cisco Live Milan. Where are we now, and how are we going to approach the ever-increasing amount of data (an ocean of it) to fish for information? This is a great overview of the challenges and the evolution of approaches.
Let’s watch and see what they propose to address the challenges:
It’s our very first seahorse, outsmarted once more.
**The next Engineers Unplugged shoot is at EMC World, Las Vegas, May 2014! Contact me now to become internet famous.**
Now that OpenDaylight has arrived, it’s time to explain why I made the Open Source choices eventually embraced by its Founders and the community at large. One doesn’t often see leaders such as Cisco, IBM, Intel, HP, Juniper, Red Hat, VMware, NEC, Microsoft and others agree, share and collaborate on such key technologies, let alone the latter engaging in a Linux Foundation based community (some thought hell would freeze over before that ever happened, though it got pretty cold at times last Spring).
For those of you not familiar with OpenDaylight (see “Meet Me On The Equinox”, not an homage to Death Cab for Cutie or my Transylvanian homeland), IBM and Cisco actually started it with an amazing set of partners near that ephemeral Equinox this year (~11am, March 20th), though we couldn’t quite brag about it until all our partners saw the daylight, which by now we hope everyone does. It was hard not to talk about all this as we read those half-baked, speculative stories before the Equinox. It was amazing how information flew, distorted as it was; I wish source code were that “rapid”, we’d all be so much better for it…
The Open Source model for OpenDaylight is simple. It has only two parts: the community is hosted in the Linux Foundation and the license is Eclipse. The details are neatly captured in a white paper we wrote and published through the Linux Foundation. Dan Frye, my friend and counterpart at IBM, and I came up with the main points after two short meetings. It would have been one, but when you work for such giants as our parent companies and soon-to-be OpenDaylight partners, you have to spend a little more time getting everyone to see the daylight. It boils down to two things, which I am convinced are the quintessential elements of any successful open source project.
1) Community. Why? Because it trumps everything: code, money and everything else. A poor community with great code equals failure (plenty of examples of that). A great community with poor (or any) code equals success (plenty of examples of that too). Why? Because open source equals collaboration of the highest kind: I share with you, and you with me, whatever I have; I contribute my time, my energy, my intellectual property, my reputation, etc. And ultimately it becomes “ours”. And the next generation’s. Open Source is not a technology; it’s a development model. With more than 10 million open source developers worldwide, it happens to be based on collaboration on a scale and diversity that humanity has never experienced before. Just think about what made this possible and the role some of the OpenDaylight partners have already played in it since the dawn of the Internet. Dan Frye and I agreed that the Linux Kernel community is the best in the world, so we picked the closest thing to it to model and support ours: the Linux Foundation.
2) Fragmentation, or anti-fragmentation, actually. Why? The biggest challenge of any open source project is avoiding fragmentation (the opposite of collaboration). Just ask Andy Rubin and the Android guys what they fear the most. Just ask any open source project’s contributors, copyright holders, or high priests how much they appreciate an open source parasite that won’t give back. Though we would have liked to go deeper, we settled on Eclipse, largely because of the actual language and technology we dealt with in the OpenDaylight Controller: most, if not all, of the initial code is Java. Some are worried about that, but I’m sure Jim Gosling is proud (btw, I’m not sure the Controller has to stay that way, and I actually agree with Amin Vahdat); we had to start somewhere. Plus, having a friendlier language NB (northbound, as in the applications that run on top of the Controller) is such a cool thing, and we think the #1 open source IDE (Eclipse) and the #1 commercial IDE (Microsoft) are going to be very good to it, so why not? There were more reasons that pointed in the Eclipse direction, and other reasons for such wonderful alternatives as APL or MPL (perhaps the subject of another post, some day). But when it comes to understanding the virtues of them all, no one understands them better than the amazing founders of these license models, most of them from IBM, of course (I wish they had done that when I was there).
What happened between the Equinox and the Solstice is a fascinating saga within the OpenDaylight community, which I think played out in the spirit of total and complete openness, inclusion, diversity, respect for the individual and the community, and most of all, the belief that code rules: we believe in running code and community consensus. I tip my hat to all my fellow colleagues who learned these two things along the way, to the enormous talent at the Eclipse and Linux Foundations that helped us launch, and even to the analysts who tried (and did incredibly well at times) to speculate about the secret reasons why these partners came up with the model we did. There is no secret at all, my friends: we’re simply creating a community that is truly open, diverse, inclusive, and never fragmented. Just like a big, happy family. Welcome to OpenDaylight, we hope you’ll stay!
Welcome to the Cisco Sizzle! Each month, we’re rounding up the best of the best from across our social media channels for your reading pleasure. From the most-read blog posts to the most engaging content on Facebook or LinkedIn, catch up on things you might have missed, or on the articles you just want to see again, all in one place.
Let’s take a look back at the top content from April…
Are you prepared for the IoE Economy?
In this blog post, Cisco’s Chief Futurist Dave Evans and Joseph Bradley of Cisco’s Internet Business Solutions Group share two use cases for IoE – connected marketing and connected healthcare – with both a near-term and futuristic lens.
John Chambers Receives Honorary Doctorate
Cisco Chairman and CEO John Chambers received an honorary doctorate from San Jose State University at the honors convocation ceremony in April. His main message to the grads? Never stop learning.
Tomorrow Starts Here
What if the next big thing isn’t big at all? It’s lots of things, all waking up. Explore how IoE will change the way we work, live, play and learn.
Innovation May Spark Economic Renewal
If we’ve learned anything from the last two decades, it’s that every time we think the Internet has exhausted its transformative potential, something highly disruptive comes along. Cisco CTO Padmasree Warrior talks about IoE innovation and the $14.4 trillion in value at stake that will spur research, new investments and new jobs.
A Typical Day
Explore how the Internet of Everything is sparking innovation and instigating meaningful actions to happen faster.
Is Your Site Safe From Attack?
Ars Technica editor Dan Goodin compiled a list of Apache website compromises that have been affecting thousands of legitimate sites by giving remote attackers a way in. Until his report, no one had realized the magnitude of the situation or how widespread the attacks were. Check out the full insights, including potential solutions, in this blog post.
Three Networking Truths
There’s a clear consensus that one size does not fit all when it comes to deploying Software Defined Networking (SDN) solutions to different organizations. Time to dispel common networking misconceptions with three truths about the future of networking as Cisco sees it.
Last Friday (April 26), ESET and Sucuri simultaneously blogged about the discovery of Linux/CDorked, a backdoor impacting Apache servers running cPanel. Since that announcement, there has been some confusion surrounding the exact nature of these attacks. Rather than reinvent the analysis that has already been done, this blog post is intended to clear up some of the confusion.
When did Linux/CDorked first appear?
According to Cisco TRAC analysis, the first encounter was on March 4, 2013.
How is Linux/CDorked related to DarkLeech?
The appearance of Linux/CDorked coincided with a drop in the number of DarkLeech infections, an indication the attacker(s) may be one and the same.
Unlike DarkLeech, the Linux/CDorked infections appear to target only Apache servers with cPanel installed. DarkLeech, by contrast, was found on servers running a variety of control panels, or none at all.