By now it is clear that big data analytics opens the door to unprecedented analytic opportunities for business innovation, customer retention and profit growth. However, a shortage of data scientists is creating a bottleneck as organizations move from early big data experiments into larger scale adoption. This constraint limits big data analytics and the positive business outcomes that could be achieved.
Click on the photo to hear Comcast’s Jason Hull, Data Integration Specialist, describe how his team uses data virtualization to get what they need done, faster.
It’s All About the Data
As every data scientist will tell you, the key to analytics is data. The more data the better, including big data as well as the myriad other data sources both in the enterprise and across the cloud. But accessing and massaging this data, in advance of data modeling and statistical analysis, typically consumes 50% or more of any new analytic development effort.
• What would happen if we could simplify the data aspect of the work?
• Would that free up data scientists to spend more time on analysis?
• Would it open the door for non-data scientists to contribute to analytic projects?
SQL is the key. Thanks to its ease and power, it has been the predominant method for accessing and massaging data for the past 30 years. Nearly all non-data scientists in IT can use SQL to access and massage data, but very few know MapReduce, the programming model traditionally used to retrieve data from Hadoop sources.
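To make the contrast concrete, here is a small illustrative sketch (not Hive or Hadoop itself – it uses SQLite and plain Python as stand-ins, and the sample data is invented). The same per-user aggregation is expressed once as a single declarative SQL statement and once as explicit map, shuffle, and reduce phases – the style of work a MapReduce job requires:

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Invented sample records: (user, bytes) -- stand-ins for rows in a big data store.
events = [("alice", 120), ("bob", 300), ("alice", 80), ("bob", 100), ("carol", 50)]

# --- The SQL way: one declarative statement (here against in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, bytes INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", events)
sql_totals = dict(conn.execute(
    "SELECT user, SUM(bytes) FROM events GROUP BY user"))

# --- The MapReduce way: hand-written map, shuffle (group by key), and reduce.
mapped = [(user, b) for user, b in events]            # map: emit (key, value) pairs
mapped.sort(key=itemgetter(0))                        # shuffle: order by key
mr_totals = {user: sum(v for _, v in group)           # reduce: sum per key
             for user, group in groupby(mapped, key=itemgetter(0))}

assert sql_totals == mr_totals
print(sql_totals)
```

Both routes produce the same totals; the difference is that the SQL version is one line most IT staff can already write, while the MapReduce version demands procedural code for each phase.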
How Data Virtualization Helps
“We have a multitude of users…from BI to operational reporting, they are constantly coming to us requesting access to one server or another…we now have that one central place to say ‘you already have access to it’ and they immediately have access rather than having to grant access outside of the tool” -Jason Hull, Comcast
Data virtualization offerings, like Cisco’s, can help organizations bridge this gap and accelerate their big data analytics efforts. Cisco was the first data virtualization vendor to support Hadoop integration with its June 2011 release. This standardized SQL approach augments specialized MapReduce coding of Hadoop queries. With simplified access to Hadoop data, organizations could for the first time use SQL to include big data sources – alongside enterprise, cloud and other data sources – in their analytics.
In February 2012, Cisco became the first data virtualization vendor to enable MapReduce programs to easily query virtualized data sources, on-demand with high performance. This allowed enterprises to extend MapReduce analyses beyond Hadoop stores to include diverse enterprise data previously integrated by the Cisco Information Server.
In 2013, Cisco maintained its big data integration leadership by updating its support for Hive access to the leading Hadoop distributions, including Apache Hadoop, Cloudera’s distribution (CDH) and the Hortonworks Data Platform (HDP). In addition, Cisco now also supports access to Hadoop through HiveServer2, and to Cloudera CDH through Impala.
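For readers unfamiliar with these two access paths, the sketch below builds the standard JDBC connection URLs a SQL client would use to reach each one. The helper function and hostnames are hypothetical; the URL schemes (`jdbc:hive2://` for HiveServer2, `jdbc:impala://` for Cloudera’s Impala JDBC driver) and the default ports (10000 and 21050) are the conventional ones:

```python
# Hypothetical helper illustrating the two SQL-over-Hadoop access paths
# mentioned above. Ports are the conventional defaults: HiveServer2 listens
# on 10000, Impala's HiveServer2-compatible JDBC port is 21050.
def jdbc_url(engine: str, host: str, database: str = "default") -> str:
    default_ports = {"hive2": 10000, "impala": 21050}
    port = default_ports[engine]
    return f"jdbc:{engine}://{host}:{port}/{database}"

# Example hostnames are placeholders, not real endpoints.
print(jdbc_url("hive2", "hadoop-master.example.com"))
print(jdbc_url("impala", "cdh-node.example.com", "sales"))
```

Either URL can then be handed to any JDBC-capable SQL tool, which is precisely the point: the same SQL skills reach both engines.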
Others, beyond Cisco, recognize this beneficial trend. In fact, Rick van der Lans, noted data virtualization expert and author, recently blogged on future developments in this area in “Convergence of Data Virtualization and SQL-on-Hadoop Engines.”
So if your organization’s big data efforts are slowed by a shortage of data scientists, consider data virtualization as a way to break the bottleneck.
Tags: apache, Big Data, Cisco Data Center, Cisco Data virtualization, Cloudera, Composite Software, data integration, data virtualization, Hadoop, HiveServer2, Hortonworks, mapreduce, query, SQL, video
This week we’re announcing new systems at the upper end of the UCS server product line: some heavy-duty iron for heavy-duty times. These are important new tools for our UCS customers: the digital age is accelerating, IT needs more horsepower to keep up, and there is a lot at stake.
Consider this: less than 10 years ago, some of the largest mainframes scaled up to half a terabyte (TB) of main memory. What if I were to tell you that these latest generation UCS blade servers will scale to 3TB? Sound like a lot? It is. And that’s just the two-processor version. Connect two UCS B260 M4 blades with an expansion connector and they become a UCS B460 M4, a four socket server that will scale to 6TB. Putting that into perspective: a spiffy new laptop might ship today with 8GB of memory. Multiply that by 750 and you have 6TB.
Not too long ago, all of Wikipedia’s content would fit in this type of footprint (in 2010 it was just under 6TB including media). Here is a fun illustration of what this scale of data would look like on paper (just the ~10GB of text, not the images). Now remember, we’re not talking about fitting all that data on the local disks of the server – we’re talking about fitting it in main memory. This is becoming crucially important in the field of data analytics, where “in-memory” is the key to speed and competitiveness. Applications like SAP HANA are at the forefront of this trend. Today, at Intel’s launch event in San Francisco, Dan Morales (Vice President of Enabling Functions at eBay) joined us to talk about how eBay is betting on this type of analytic technology to help make the eBay Marketplace work better for buyers and sellers (and eBay shareholders). I’ll post a video clip of that soon; his description of the challenges and opportunities, at eBay scale, is worth a watch.
We’ve talked about memory scaling, and Bruno Messina has a nice post with more on the scalability of these systems and UCS at large. But raw performance is the name of the game at this end of the server spectrum, and Intel has not disappointed with this round of new technology. The next generation of the Intel Xeon E7 family packs up to 15 cores per processor and delivers an average 2x performance increase over previous-generation products. Performance will be even higher on specific workloads – for example, up to 3x on database and even higher for virtualization. Cisco’s implementation of this technology has once again set the standard for system performance. In today’s launch, Intel cited Cisco with 6 industry-leading results on key workloads. As of this posting, the closest competitor was Dell with 4; HP ProLiant posted 1. So hats off, once again, to the engineering team in Cisco’s Computing Systems Product Group. Girish Kulkarni has a great summary of the performance news here.
Our collaboration with Intel is one of the best technology combinations in the industry today. Consider what we both bring to the party. Intel: innovation in processor technology that drives Moore’s Law. Cisco: innovation in connecting things across the data center and around the world. UCS is an outcome of two blue-chip tech powerhouses investing in real innovation and the results have changed the industry.
In 1991, Stewart Alsop famously wrote: “I predict that the last mainframe will be unplugged on 15 March 1996.” He just as famously had to eat his words. He munched on those twelve years ago, and while mainframes and RISC-based systems remain, there is an inexorable trend as the heaviest analytic workloads continue to shift to the type of scale-up x86-based systems we’re talking about today. It only makes sense. So while this will garner me plenty of comments from the architectural purists out there, I say “go ahead and plug a mainframe back in.” It will fit right in your UCS B-Series blade chassis…
Tags: Big Data, Blade Servers, Cisco Data Center, Cisco Data Center strategy, Cisco Servers, Cisco UCS, Cisco Unified Computing System, SAP HANA, unified computing
Cisco Live Europe 2014 (Milan) just ended and I think I will need a few days to reorganize my ideas. It was my first opportunity to attend #CLEUR, and moreover I did it as a CCIE.
CLEUR is not just a Cisco event (or should I say THE Cisco event) but also a unique opportunity to experience technology, expertise, vision, and inspiration. And, of course, to meet friends.
Tags: #ciscochampion, #CLEUR, Cisco Data Center, Cisco onePK, Cisco vWAAS, ciscolive, Fog computing, IoE, IPv6, Puppet, VIRL
Last week I was in London for the Gartner Data Center Conference. As always, a wide range of interesting topics was on the agenda. Working in Cisco Data Center Services, I follow many data center topics, but this year I was especially keen to hear perspectives on SDN: how the market is evolving, and how the attendees – including many senior IT practitioners – are approaching SDN adoption.
London’s Big Ben at Night
From a Cisco perspective, we were showcasing the recently launched Application Centric Infrastructure (ACI), which generated a lot of interest. There is growing awareness among our customers that ACI could do for networks and applications what the Cisco UCS has done for the server market (with UCS server profiles in the latter proving a good analogy to help customers understand the potential of ACI).
So what were some of my key takeaways from the SDN discussion I heard here? And what were the questions that in my view are still not being discussed sufficiently across the industry?
Tags: ACI, application centric infrastructure, Cisco Data Center, Cisco SDN Controller, Cisco Services, SDN, software defined networking
Nobody thought the ‘plumbers’ could succeed in compute …
The numbers are in – across the board Cisco is posting strong results and tracking unprecedented momentum in the server market. With Cisco’s Q3 financial earnings announcement reporting 77% Y/Y growth in Data Center and now the latest IDC Server Tracker results [view UCS Advantage], Cisco is proving to be a formidable force in the compute space. In less than four years after entering a market with very well-established competitors, Cisco has captured the #2 worldwide share position in x86 blade servers*.
The industry has seen businesses shift over 19% of the global x86 blade market to Cisco UCS, and over 28% in the US. In its recent earnings announcement, Cisco reported more than 23,000 unique UCS customers worldwide, representing 89% Y/Y customer growth.
This is not luck …
This is about the value that Cisco provides our customers. Although we develop products using the same industry-standard hardware and software as our competitors, Cisco continues to grow market share. This is attributable to Cisco’s unique and innovative approach: an open, standards-based data center network architecture and ecosystem that preserves customer choice. We are increasing business value while substantially decreasing total cost of ownership (TCO). With the Cisco Unified Computing System, we are truly changing the way customers approach the data center – consolidating resources, accelerating server deployment, and simplifying management – flexible and scalable for any workload. It’s that simple.
You hear a lot of buzz words around the industry. But when it comes down to the numbers, Cisco is driving real results for real customers [click to enlarge]:
Here is just some of what we are hearing from our customers:
Tags: blade server, blade server TCO, Cisco, Cisco Data Center, Cisco Data Center Fabric, data center, data center architecture, fabric, Frank Palumbo, market share, server, SVP Global Data Center Sales, Tomorrow Starts Here, UCS, unified computing, unified computing system, Unified Data Center, Unified Fabric, unified management, virtualization, x86 blade servers