Cisco Blogs


How Data Virtualization Helps Data Scientists

By now it is clear that big data analytics opens the door to unprecedented analytic opportunities for business innovation, customer retention and profit growth. However, a shortage of data scientists is creating a bottleneck as organizations move from early big data experiments into larger scale adoption. This constraint limits big data analytics and the positive business outcomes that could be achieved.

Click on the photo to hear from Comcast’s Jason Hull, Data Integration Specialist, about how his team uses data virtualization to get what they need done, faster.

It’s All About the Data

As every data scientist will tell you, the key to analytics is data. The more data the better, including big data as well as the myriad other data sources both in the enterprise and across the cloud. But accessing and massaging this data, in advance of data modeling and statistical analysis, typically consumes 50% or more of any new analytic development effort.

• What would happen if we could simplify the data aspect of the work?
• Would that free up data scientists to spend more time on analysis?
• Would it open the door for non-data scientists to contribute to analytic projects?

SQL is the key. Because of its ease and power, it has been the predominant method for accessing and massaging data for the past 30 years. Nearly all non-data scientists in IT can use SQL to access and massage data, but very few know MapReduce, the programming model traditionally used to access data in Hadoop sources.
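To make the contrast concrete, here is a minimal sketch in Python of the same aggregation written both ways. The sample rows, table name and function names are hypothetical, and the map/shuffle/reduce functions only imitate the MapReduce model in memory; they are not Hadoop code.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, purchase amount).
ROWS = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5), ("bob", 2.5)]


def totals_with_sql(rows):
    """Total spend per customer, expressed as a single SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    result = dict(
        conn.execute("SELECT customer, SUM(amount) FROM purchases GROUP BY customer")
    )
    conn.close()
    return result


def map_phase(rows):
    """Map step: emit one (key, value) pair per input record."""
    for customer, amount in rows:
        yield customer, amount


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def totals_with_mapreduce(rows):
    """The same aggregation, hand-written as map, shuffle and reduce steps."""
    return reduce_phase(shuffle(map_phase(rows)))
```

Both functions return the same totals for `ROWS`. The SQL version states what is wanted in one line, while the MapReduce-style version spells out how to compute it, which is the skills gap described above.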

How Data Virtualization Helps

“We have a multitude of users…from BI to operational reporting, they are constantly coming to us requesting access to one server or another…we now have that one central place to say ‘you already have access to it’ and they immediately have access rather than having to grant access outside of the tool” -Jason Hull, Comcast

Data virtualization offerings, like Cisco’s, can help organizations bridge this gap and accelerate their big data analytics efforts. Cisco was the first data virtualization vendor to support Hadoop integration with its June 2011 release. This standardized SQL approach augments specialized MapReduce coding of Hadoop queries. By simplifying access to Hadoop data, organizations could for the first time use SQL to include big data sources, as well as enterprise, cloud and other data sources, in their analytics.
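As a rough illustration of the idea, the sketch below uses two separate SQLite databases as stand-ins for an enterprise source and a big data source, then answers one question across both with a single SQL statement. Everything here is an assumption made for illustration: the schema names, tables and sample rows are hypothetical, and this is not the Cisco Information Server API or a real Hadoop connection.

```python
import sqlite3


def build_virtual_layer():
    """Open one connection that can see two independent data stores."""
    conn = sqlite3.connect(":memory:")  # stand-in for an enterprise database
    # Second, separate in-memory database: stand-in for a big data store.
    conn.execute("ATTACH DATABASE ':memory:' AS bigdata")
    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE bigdata.clicks (customer_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    conn.executemany(
        "INSERT INTO bigdata.clicks VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )
    return conn


def clicks_per_customer(conn):
    """Join across both sources as if they were one database."""
    query = """
        SELECT c.name, COUNT(*) AS clicks
        FROM main.customers AS c
        JOIN bigdata.clicks AS k ON k.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```

Calling `clicks_per_customer(build_virtual_layer())` returns one row per customer with a click count. The analyst writes ordinary SQL and never needs to know that the two tables live in different stores.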

In February 2012, Cisco became the first data virtualization vendor to enable MapReduce programs to easily query virtualized data sources on demand and with high performance. This allowed enterprises to extend MapReduce analyses beyond Hadoop stores to include diverse enterprise data previously integrated by the Cisco Information Server.

In 2013, Cisco maintained its big data integration leadership with updates of its support for Hive access to the leading Hadoop distributions including Apache Hadoop, Cloudera Distribution (CDH) and Hortonworks (HDP). In addition, Cisco now also supports access to Hadoop through HiveServer2 and Cloudera CDH through Impala.

Others, beyond Cisco, recognize this beneficial trend. In fact, Rick van der Lans, noted data virtualization expert and author, recently blogged on future developments in this area in “Convergence of Data Virtualization and SQL-on-Hadoop Engines.”

So if your organization’s big data efforts are slowed by a shortage of data scientists, consider data virtualization as a way to break the bottleneck.


Cisco, Cloudera and The Future of Data Management – on Display at Cloudera Sessions

The Cloudera Sessions Roadshow helps companies navigate the Big Data journey. As Hadoop takes the data management market by storm, organizations are evolving the role it plays in the modern data center. The roadshow covers how this disruptive technology is quickly transforming an industry, the value it adds to the modern data center, and how you can leverage it today. When combined with Cisco Unified Computing System™ (Cisco UCS®), the joint solution helps you exploit the valuable insights contained in your data to drive meaningful change in your business.

The Cloudera Sessions roadshow is designed to help organizations identify where they are on their Big Data journey and stay the course in a low-risk, productive way. Attendees will benefit from hearing about the experiences of Cloudera and its partners with real-world deployments, as well as those of the Hadoop users who plan and manage them.

Cisco is partnering with Cloudera to offer a comprehensive infrastructure and management solution, based on the Cisco Unified Computing System (UCS), to support our customers’ big data initiatives. Cisco is a proud sponsor of this event, and I encourage you to join us at one of the following scheduled stops to learn more about our joint solutions for big data:

• Los Angeles on April 24, 2014
• Denver on May 22, 2014
• San Francisco on June 4, 2014 (registration link available soon)
• New York on June 18, 2014 (registration link available soon)

More cities to be added.

By the way, Cisco will be showcasing our Unified Data Center portfolio at EMC World in Las Vegas from May 5th to May 8th and at Cisco Live San Francisco from May 18th to May 22nd. Stop by and say hello, and let me know if you have any comments or questions, either there or via Twitter at @CicconeScott.

 

 

 


Bring Your Own Service: Why It Needs to be on InfoSec’s Radar

Security concerns around cloud adoption can keep many IT and business leaders up at night. This blog series examines how organizations can take control of their cloud strategies. The first blog in the series discussed the role of data security in the cloud, and the second highlighted drivers for managed security and what to look for in a cloud provider.

In today’s workplace, employees are encouraged to find the most agile ways to accomplish business: this extends beyond using their own devices to work anywhere, at any time, to choosing which cloud services to use.

In many instances, this happens with little IT engagement. In fact, according to a 2013 Fortinet Survey, Generation Y users are increasingly willing to skirt company policies to use their own devices and cloud services. Coupling this user behavior with estimates from Cisco’s Global Cloud Index that, by 2017, over two-thirds of all data center traffic will be based in the cloud shows that the move to cloud computing is undeniable and unstoppable.

With this information in mind, how should IT and InfoSec teams manage their company’s data when hundreds of instances of new cloud deployments happen each month without their knowledge?

Additionally, what provisions need to be in place to limit risks from data being stored, processed and managed by third parties?

Here are a few considerations for IT and InfoSec teams as they try to secure our world of many clouds:



Maximizing Big Data Performance and Scalability with MapR and Cisco UCS

Huge amounts of information are flooding companies every second, which has led to an increased focus on big data and the ability to capture and analyze this sea of information. Enterprises are turning to big data and Apache Hadoop in order to improve business performance and provide a competitive advantage. But to unlock business value from data quickly, easily and cost-effectively, organizations need to find and deploy a truly reliable Hadoop infrastructure that can perform, scale, and be used safely for mission-critical applications.

As more and more Hadoop projects are being deployed to provide actionable results in real-time or near real-time, low latency has become a key factor that influences a company’s Hadoop distribution choice. Thus, performance and scalability should be evaluated closely before choosing a particular Hadoop solution.

Performance

The raw performance of a Hadoop platform is critical; it refers to how quickly the platform can ingest, process and analyze information. The MapR Distribution for Hadoop in particular provides world-record performance for MapReduce operations on Hadoop. Its advanced architecture harnesses distributed metadata with an optimized shuffle process, delivering consistent high performance.

The graph below compares the MapR M7 Edition with another Hadoop distribution, and it vividly illustrates the vast difference in latency and performance between these Hadoop distributions.

Figure: High Performance with Low Latency

 

One particular solution that is optimized for performance is Cisco UCS with MapR. MapR on the Cisco Unified Computing System™ (Cisco UCS®) is a powerful, production-ready Hadoop solution that increases business and IT agility, supports mission-critical workloads, reduces total cost of ownership (TCO), and delivers exceptional return on investment (ROI) at scale.



Big Data Ecosystem Challenges

Information security is one of the largest business problems facing organisations. Log data generated from networks and computer systems can be aggregated, stored, and analysed to identify where misuse occurs. The enormous amount of data involved in these analyses is beyond the capability of traditional systems and requires a new, big data approach. Given the right tools, skills and people, security teams can take advantage of big data analysis to quickly identify malicious activity and remediate attacks. Together, the big data platforms, the administration tools, analysis tools, skilled analysts, and pressing problems form an evolving ecosystem driving innovation. It would be a mistake to believe that this ecosystem is without its challenges.
