Big Data remains one of the hottest topics in the industry because of the real dollar value businesses derive from making sense of vast amounts of structured and unstructured data. Virtually every field is adopting a data-driven strategy as people, process, data, and things become increasingly connected (the Internet of Everything). New tools and techniques are being developed that can mine vast stores of data to inform decision making in ways that were previously unimaginable. The ability to derive more knowledge by joining related information and recognizing correlations can inform and enrich numerous aspects of everyday life. There's a good reason why Big Data is so hot!
This year at Hadoop Summit, Cisco invites you to learn how to unlock the value of Big Data. Unprecedented data creation opens the door to responsive applications and emerging analytics techniques, and businesses need a better way to analyze their data. Cisco will be showcasing infrastructure innovations from both the Cisco Unified Computing System (UCS) and Cisco Application Centric Infrastructure (ACI). Cisco's solution for deploying big data applications can help customers make informed decisions, act quickly, and achieve better business outcomes.
Cisco is partnering with leading software providers to offer a comprehensive infrastructure and management solution, based on Cisco UCS, to support our customers' big data initiatives. By taking advantage of Cisco UCS's fabric-based infrastructure, Cisco can bring significant advantages to big data workloads.
I've been in this industry for more than three decades, so I've experienced every data center technology breakthrough and market transformation in that time. We drove a market disruption ourselves with the introduction of the Cisco Unified Computing System (UCS) in 2009, and after just five years, we have results proven by more than 33,000 customers.
Now, we’re doing it again, but this time it’s different.
We are in the midst of the next major inflection point, driven by a new wave of applications. With the swipe of a finger, users can download an endless array of useful apps to their smartphones, tablets, and even wearable gadgets. We bring our personal devices to work, expecting the IT department to deliver the same access and ease of use on the business side.
This consumerization of IT puts end users in the driver's seat. Scrambling to meet growing consumer and employee expectations, organizations in both the public and private sectors have demands of their own when it comes to next-generation data center capabilities and improved outcomes. Applications need holistic compute solutions, not just plain old servers. The explosive growth of mobility, social media, collaboration, the Internet of Everything (IoE), and big data means applications need to scale both up and out.
Applications must now be served by compute solutions that can meet performance needs, handle large data sets, and scale as needed, all while reducing operational complexity and OpEx budgets. The requirements of these complex business applications are defining the infrastructure, not the other way around, because now more than ever, application performance translates into business results. This calls for fresh innovation in designing an integrated infrastructure that is highly responsive to business and IT needs while keeping data center budgets from spinning out of control.
At Cisco Live, I'll show you how we're driving a market disruption once again, this time with breakthroughs in compute solutions that we didn't think possible just a few years ago. Technology leaders agree that Cisco UCS and Cisco Application Centric Infrastructure (ACI) put IT managers back in the driver's seat, able to meet user demands with applications no longer constrained by the data center infrastructure.
It has been a great year for Data Virtualization at Cisco Live! Milan, Melbourne, and Toronto were fantastic opportunities to introduce Data Virtualization to Cisco customer and partner audiences. And we have saved the best for last with multiple activities at Cisco Live! San Francisco.
We kick things off on Monday, May 18, with a by-invitation program for Cisco Data Virtualization customers and prospects. We start the day at 3:00 with a special pass to John Chambers' keynote address, followed by a reception, a data virtualization demo, and a tour of the World of Solutions hall. We close the evening with dinner at one of San Francisco's finest restaurants. Participants in this program return on Wednesday night for a special performance by Lenny Kravitz. If you would like to join us, please contact Paul Torrento at email@example.com.
For those of you attending the full event, Data Virtualization is also featured in two sessions, both entitled Driving Business Outcomes for Big Data Environment. I will lead a quick summary session on Thursday at 11:15 am, with Jim Green providing a deeper-dive technical session from 11:30 to 12:30 that day. In these sessions we will address one of the major issues organizations face as a consequence of exponential data growth: the huge expense of upgrading capacity in their enterprise data warehouses. To avoid this spend, customers are looking for lower-cost alternatives, such as offloading infrequently used data to Hadoop. You will also learn about Cisco's complete solution, combining Unified Computing System hardware, Data Virtualization software, and a Services methodology.
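To give a flavor of the offload pattern we will discuss, here is a minimal HiveQL sketch of landing cold warehouse history in Hadoop; the table, column, and path names are hypothetical, and the data virtualization layer would then present the warehouse and the Hive archive as a single logical view:

    -- Hypothetical sketch; table, column, and path names are illustrative,
    -- not from an actual deployment.

    -- An external table over the files exported from the warehouse to HDFS.
    CREATE EXTERNAL TABLE sales_history_staging (
      order_id    BIGINT,
      customer_id BIGINT,
      order_date  DATE,
      amount      DECIMAL(12,2)
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/landing/sales_history';

    -- The long-term archive, stored in compressed columnar ORC format.
    CREATE TABLE sales_history_archive STORED AS ORC AS
    SELECT * FROM sales_history_staging;

    -- Analysts can still reach the archived years directly in Hadoop.
    SELECT customer_id, SUM(amount) AS lifetime_spend
    FROM sales_history_archive
    WHERE order_date < CAST('2012-01-01' AS DATE)
    GROUP BY customer_id;

The warehouse keeps only the hot, recent data, while queries that span both tiers are resolved through the virtualization layer rather than by copying data back.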
Please also stop by the Data Virtualization booth in the Cisco Services pavilion where we can chat about your business outcome objectives and how data virtualization can help.
In a constantly changing world, getting the right talent focused on the most pressing challenges is essential — not just for companies, but for service providers, cities, and countries.
Today, the key driver of that rapid change is technology, particularly the explosion in connectivity known as the Internet of Everything (IoE). Cisco predicts that IoE will have connected 50 billion “things” by 2020, compared to 10 billion today. But for all the talk of things, IoE is not just about embedding sensors in shoes, jet engines, refrigerators, and shopping carts. The true opportunity arises when people, process, data, and things are connected in startling new ways.
In such an environment, collaboration is critical. Indeed, IoE-related innovations have the potential to improve and transform our world in profound ways. But no one company can solve these challenges. They will require partnerships and the open sharing of ideas and talent.
Technology companies, in particular, will need to change the ways in which they utilize their talent. For many decades, there was one way to access talent — by hiring it. Today, workforces are flexible and may be spread across time zones and continents. Knowledge workers still contribute as employees on company payrolls, of course. But increasingly, they are just as likely to collaborate on a specific project as partners or as subject-matter experts sharing knowledge within cross-functional or cross-industry groups.
That is why I feel so strongly about a recent out-of-court settlement in Silicon Valley regarding the free flow of talent from one organization to another. Apple, Google, Intel, and Adobe agreed to pay more than $300 million to 64,000 engineers who claimed that the companies’ hiring policies were hindering their career paths and access to higher salaries.
Big Data is undoubtedly becoming an integral part of the enterprise IT ecosystem across major industry verticals, and Apache Hadoop is emerging as almost synonymous with it as the foundation of the next-generation data management platform. Sometimes referred to as a Data Lake, this platform serves as the primary landing zone for data from a wide variety of sources. Traditional and several newer application software vendors have been building the plumbing -- in software terms, data connectors and data movers -- to extract data from it for further processing. New to Apache Hadoop is YARN, which acts much like an operating system for Big Data, enabling multiple workloads -- batch, interactive, streaming, and real-time -- to coexist on a single cluster.
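As a minimal sketch of what that coexistence can look like in practice, a Hive session can route different workloads to different YARN queues; the queue and table names below are hypothetical and would be defined by a cluster administrator in the YARN scheduler configuration:

    -- Illustrative HiveQL only: the "interactive" and "batch" queues and the
    -- web_logs / daily_rollup tables are hypothetical.

    -- Route an ad hoc query to a low-latency queue on the Tez engine.
    SET hive.execution.engine=tez;
    SET tez.queue.name=interactive;
    SELECT COUNT(*) FROM web_logs WHERE log_date = '2014-06-01';

    -- Route a nightly rollup to a batch queue on the MapReduce engine.
    SET hive.execution.engine=mr;
    SET mapreduce.job.queuename=batch;
    INSERT OVERWRITE TABLE daily_rollup
    SELECT log_date, COUNT(*) FROM web_logs GROUP BY log_date;

YARN's scheduler then enforces the resource shares assigned to each queue, so the interactive work is not starved by the batch job running on the same cluster.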
The Hortonworks Data Platform combines the most useful and stable versions of Apache Hadoop and its related projects into a single tested and certified package. Cisco has been partnering with Hortonworks to provide an industry-leading platform for enterprise Hadoop deployments. The Cisco UCS solution for the Hortonworks Data Platform is based on the Cisco UCS Common Platform Architecture (CPA) Version 2 for Big Data, a popular platform for Data Lakes widely adopted across major industry verticals. It features single-connect, unified management, advanced monitoring capabilities, seamless management integration, and data integration (plumbing) capabilities with enterprise application systems based on Oracle, Microsoft, SAS, SAP, and others.
We are excited to see several joint wins with Hortonworks in the service provider, insurance, retail, healthcare, and other sectors. The joint solution is available in three reference architectures: Performance-Capacity Balanced, Capacity Optimized, and Capacity Optimized with Flash. All three support up to 10 racks of 16 servers each without additional switches. Scaling beyond 10 racks (160 servers) can be achieved by interconnecting domains using Cisco Nexus 6000/7000/9000 Series switches, scaling to thousands of servers and hundreds of petabytes of storage, all managed from a single pane of glass using Cisco UCS Central.
New to this partnership is Hortonworks Data Platform 2.1, which includes Apache Hive 13, significantly faster than the previous-generation Hive 12. We have jointly conducted extensive performance benchmarking using 20 queries derived from the TPC-DS Benchmark, an industry-standard benchmark for decision support systems from the Transaction Processing Performance Council (TPC). The tests were conducted on a 16-node Cisco UCS CPA v2 Performance-Capacity Balanced cluster using a 30 TB dataset. We observed about 300% performance acceleration for some queries with Hive 13 compared to Hive 12. See Figure 1.
Additional performance improvements are expected with the GA release. What does this mean? First, Hive brings SQL-like capabilities -- SQL being the most common and expressive language for analytics -- to petabyte-scale datasets in an economical manner. Second, Hadoop becomes friendlier to SQL developers and SQL-based business analytics platforms. Third, performance improvements of this magnitude (from Hive 12 to Hive 13) make migrations from proprietary systems to Hadoop even more compelling. More is coming. Stay tuned!
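To make the first point concrete, here is a minimal, illustrative HiveQL sketch, not one of the benchmark queries themselves. The session settings enable two Hive 13 features widely credited with much of the speedup over Hive 12 (Tez execution and vectorized query processing over ORC data), and the tables follow the familiar TPC-DS star schema:

    -- Illustrative only; the query shape is TPC-DS-like, not a benchmark query.
    -- Run on Tez instead of MapReduce.
    SET hive.execution.engine=tez;
    -- Process rows in batches when reading ORC data.
    SET hive.vectorized.execution.enabled=true;

    SELECT d.d_year,
           i.i_category,
           SUM(ss.ss_net_paid) AS total_paid
    FROM store_sales ss
    JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
    JOIN item i     ON ss.ss_item_sk      = i.i_item_sk
    WHERE d.d_year BETWEEN 2000 AND 2002
    GROUP BY d.d_year, i.i_category
    ORDER BY d.d_year, total_paid DESC;

The same SQL runs unchanged on Hive 12 and Hive 13, which is exactly why SQL developers and existing BI tools can pick up the new engine's performance without rewriting their queries.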
Figure 1: Hive 13 vs. Hive 12
Disclaimer: The queries used here are derived from the TPC-DS Benchmark. These results cannot be compared with published TPC-DS Benchmark results. For more information, visit www.tpc.org.