By now it is clear that big data analytics opens the door to unprecedented analytic opportunities for business innovation, customer retention and profit growth. However, a shortage of data scientists is creating a bottleneck as organizations move from early big data experiments into larger scale adoption. This constraint limits big data analytics and the positive business outcomes that could be achieved.
Click on the photo to hear from Comcast’s Jason Hull, Data Integration Specialist about how his team uses data virtualization to get what they need done, faster
It’s All About the Data
As every data scientist will tell you, the key to analytics is data. The more data the better, including big data as well as the myriad other data sources both in the enterprise and across the cloud. But accessing and massaging this data, in advance of data modeling and statistical analysis, typically consumes 50% or more of any new analytic development effort.
• What would happen if we could simplify the data aspect of the work?
• Would that free up data scientists to spend more time on analysis?
• Would it open the door for non-data scientists to contribute to analytic projects?
SQL is the key. Because of its ease and power, it has been the predominant method for accessing and massaging data for the past 30 years. Nearly all non-data scientists in IT can use SQL to access and massage data, but very few know MapReduce, the traditional language used to access data from Hadoop sources.
How Data Virtualization Helps
“We have a multitude of users…from BI to operational reporting, they are constantly coming to us requesting access to one server or another…we now have that one central place to say ‘you already have access to it’ and they immediately have access rather than having to grant access outside of the tool” -Jason Hull, Comcast
Data virtualization offerings, like Cisco’s, can help organizations bridge this gap and accelerate their big data analytics efforts. Cisco was the first data virtualization vendor to support Hadoop integration with its June 2011 release. This standardized SQL approach augments specialized MapReduce coding of Hadoop queries. By simplifying access to Hadoop data, organizations could for the first time use SQL to include big data sources, as well as enterprise, cloud and other data sources, in their analytics.
In February 2012, Cisco became the first data virtualization vendor to enable MapReduce programs to easily query virtualized data sources, on-demand with high performance. This allowed enterprises to extend MapReduce analyses beyond Hadoop stores to include diverse enterprise data previously integrated by the Cisco Information Server.
In 2013, Cisco maintained its big data integration leadership with updates of its support for Hive access to the leading Hadoop distributions including Apache Hadoop, Cloudera Distribution (CDH) and Hortonworks (HDP). In addition, Cisco now also supports access to Hadoop through HiveServer2 and Cloudera CDH through Impala.
Others, beyond Cisco, recognize this beneficial trend. In fact, Rick van der Lans, noted Data Virtualization expert and author, recently blogged on future developments in this area in Convergence of Data Virtualization and SQL-on-Hadoop Engines.
So if your organization’s big data efforts are slowed by a shortage of data scientists, consider data virtualization as a way to break the bottleneck.
Tags: apache, Big Data, Cisco Data Center, Cisco Data virtualization, Cloudera, Composite Software, data integration, data virtualization, Hadoop, HiveServer2, Hortonworks, mapreduce, query, SQL, video
The Cloudera Sessions Roadshow helps companies to navigate the Big Data journey. As Hadoop takes the data management market by storm, organizations are evolving the role it plays in the modern data center. This disruptive technology is quickly transforming an industry, the value it adds to the modern data center, and how you can leverage it today. When combined with Cisco Unified Computing System™ (Cisco UCS®), the joint solution helps you exploit the valuable insights contained in your data to drive meaningful change in your business.
The Cloudera Sessions roadshow is designed to help organizations to identify where they are on their Big Data journey and to navigate how to stay the course in a low-risk, productive way. The Cloudera Sessions’ attendees will benefit from hearing about Cloudera and its partners’ experiences with real-world deployments, as well as those of Hadoop users who plan and manage them.
Cisco is partnering with Cloudera to offer a comprehensive infrastructure and management solution, based on the Cisco Unified Computing System (UCS), to support our customers big data initiatives. As a proud sponsor for this event, I would encourage you to join us at one of the following scheduled stops to learn more about our joint solutions for big data:
San Francisco on June 4, 2014 (Registration Link Available Soon)
New York on June 18, 2014 (Registration Link Available Soon)
More Cities to be added
Tags: Big Data, Blade Servers, Cisco UCS, Cisco Unified Computing System, Cloudera, Cloudera Sessions, Hadoop, Rack Servers, UCS
Security concerns around cloud adoption can keep many IT and business leaders up at night. This blog series examines how organizations can take control of their cloud strategies. The first blog of this series discussing the role of data security in the cloud can be found here. The second blog of this series highlighting drivers for managed security and what to look for in a cloud provider can be found here.
In today’s workplace, employees are encouraged to find the most agile ways to accomplish business: this extends beyond using their own devices to work on from anywhere, anytime and at any place to now choosing which cloud services to use.
Why Bring Your Own Service Needs to be on Infosec’s Radar
In many instances, most of this happens with little IT engagement. In fact, according to a 2013 Fortinet Survey, Generation Y users are increasingly willing to skirt such policies to use their own devices and cloud services. Couple this user behavior with estimates from Cisco’s Global Cloud Index that by the year 2017, over two thirds of all data center traffic will be based in the cloud proves that cloud computing is undeniable and unstoppable.
With this information in mind, how should IT and InfoSec teams manage their company’s data when hundreds of instances of new cloud deployments happen each month without their knowledge?
Additionally, what provisions need to be in place to limit risks from data being stored, processed and managed by third parties?
Here are a few considerations for IT and InfoSec teams as they try to secure our world of many clouds:
Read More »
Tags: 2014 annual security report, CIO, Cisco Security, CiscoCloud, cloud, cloud security, data security, Fortinet, Hadoop, infosec, ITaaS, OLAP, security, Service Provider, wired
Huge amounts of information are flooding companies every second, which has led to an increased focus on big data and the ability to capture and analyze this sea of information. Enterprises are turning to big data and Apache Hadoop in order to improve business performance and provide a competitive advantage. But to unlock business value from data quickly, easily and cost-effectively, organizations need to find and deploy a truly reliable Hadoop infrastructure that can perform, scale, and be used safely for mission-critical applications.
As more and more Hadoop projects are being deployed to provide actionable results in real-time or near real-time, low latency has become a key factor that influences a company’s Hadoop distribution choice. Thus, performance and scalability should be evaluated closely before choosing a particular Hadoop solution.
The raw performance of a Hadoop platform is critical; it refers to how quickly the platform can ingest, process and analyze information. The MapR Distribution for Hadoop in particular provides world-record performance for MapReduce operations on Hadoop. Its advanced architecture harnesses distributed metadata with an optimized shuffle process, delivering consistent high performance.
The graph below compares the MapR M7 Edition with another Hadoop distribution, and it vividly illustrates the vast difference in latency and performance between these Hadoop distributions.
One particular solution that is optimized for performance is Cisco UCS with MapR. MapR on the Cisco Unified Computing System™ (Cisco UCS®) is a powerful, production-ready Hadoop solution that increases business and IT agility, supports mission-critical workloads, reduces total cost of ownership (TCO), and delivers exceptional return on investment (ROI) at scale.
Read More »
Tags: Big Data, blade server, Blade Servers, Cisco UCS, Cisco UCS C240 M3 Rack Server, Cisco Unified Computing System, Cisco Unified Data Center, Cisco Unified Fabric, Hadoop, MapR, rack server, UCS Central, UCS service profiles
Information security is one of the largest business problems facing organisations. Log data generated from networks and computer systems can be aggregated, stored, and analysed to identify where misuse occurs. The enormous amount of data involved in these analyses is beyond the capability of traditional systems and requires a new, big data approach. Given the right tools, skills and people, security teams can take advantage of big data analysis to quickly identify malicious activity and remediate attacks. Together, the big data platforms, the administration tools, analysis tools, skilled analysts, and pressing problems form an evolving ecosystem driving innovation. It would be a mistake to believe that this ecosystem is not without its challenges.
Read More »
Tags: Big Data, Hadoop, security, TRAC