At the June Hadoop Summit in San Jose, Hadoop was reaffirmed as the data center “killer app,” riding an avalanche of enterprise data that is projected to grow 50x by 2020. According to IDC, the Big Data market itself is growing six times faster than the rest of IT. Every major tech company, old and new, is now driving Hadoop innovation, including Google, Yahoo, Facebook, Microsoft, IBM, Intel and EMC, building value-added solutions on open source contributions by Hortonworks, Cloudera and MapR. Cisco’s surprisingly broad portfolio will be showcased at Strataconf in New York on Oct. 15 and at our October 21st executive webcast. In this third blog of the series, we preview the power of Application Centric Infrastructure for the emerging Hadoop eco-system.
Why Big Data?
Organizations of all sizes are gaining insight and finding creative use cases that leverage their own business data.
The use cases grow quickly as businesses realize their “ability to integrate all of the different sources of data and shape it in a way that allows business leaders to make informed decisions.” Hadoop enables customers to gain insight from both structured and unstructured data. Data types and sources can include 1) business applications (OLTP, ERP and CRM systems), 2) documents and emails, 3) web logs, 4) social networks, 5) machine/sensor-generated data and 6) geolocation data.
IT operational challenges
Even modest-sized jobs require clusters of 100 server nodes or more for seasonal business needs. While Hadoop is designed to scale out on commodity hardware, most IT organizations face extreme demand variations in bare-metal (non-virtualizable) workloads. Furthermore, these workloads are requested by multiple Lines of Business (LOB) with increasing urgency and frequency. Ultimately, 80% of the costs of managing Big Data workloads will be OpEx. How do IT organizations finish jobs quickly and redeploy resources? How do they improve utilization? How do they maintain security and isolation of data in a shared production infrastructure?
A mixture of different workload types on the same infrastructure
A variety of analytics processes
In Hadoop 1.x, compute performance was paramount. But in Hadoop 2.x, network capabilities will be the focus, due to larger clusters, more data types, more processes and mixed workloads. (see Fig. 1)
ACI powers Hadoop 2.x
Cisco’s Application Centric Infrastructure is a new operational model enabling Fast IT. ACI provides a common policy-based programming approach across the entire ACI-ready infrastructure, beginning with the network and extending to all its connected end points. This drastically reduces cost and complexity for Hadoop 2.0. ACI uses Application Policy to:
- Dynamically optimize cluster performance in the network
- Redeploy resources automatically for new workloads for improved utilization
- Ensure isolation of users and data as resource deployments change
Let’s review each of these in order:
Cluster Network Performance: It’s crucial to improve traffic latency and throughput across the network, not just within each server.
- Hadoop copies and distributes data across servers to maximize reliability on commodity hardware.
- The large collection of processes in Hadoop 2.0 is usually spread across different racks.
- Mixed workloads in Hadoop 2.0 support interactive and real-time jobs, resulting in the use of more on-board memory and different payload sizes.
As a result, server I/O bandwidth is increasing, which will place heavier loads on 10 Gigabit networks. ACI policy works with deep telemetry embedded in each Nexus 9000 leaf switch to monitor and adapt to network conditions.
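To make the rack-level traffic concrete, here is a minimal Python sketch of a simplified version of the default HDFS block placement for a replication factor of 3 (one replica on the writer's node, two on different nodes of a single remote rack), assuming a hypothetical two-rack topology. It picks nodes deterministically where real HDFS chooses at random, and it counts how much of each block crosses a rack boundary in the write pipeline. It illustrates why replication loads the fabric; it is not a model of ACI or Nexus 9000 behavior.

```python
# Hypothetical topology: two racks of three DataNodes each.
TOPOLOGY = {
    "rack-a": ["a1", "a2", "a3"],
    "rack-b": ["b1", "b2", "b3"],
}


def place_replicas(topology, writer):
    """Deterministic simplification of default HDFS placement for
    replication factor 3: the writer's own node, then two different
    nodes on one remote rack (real HDFS picks those nodes at random)."""
    local_rack = next(r for r, nodes in topology.items() if writer in nodes)
    remote_rack = next(r for r in sorted(topology) if r != local_rack)
    second, third = topology[remote_rack][:2]
    return [writer, second, third]


def cross_rack_mb(topology, pipeline, block_mb):
    """Megabytes that cross a rack boundary when a block is forwarded
    node to node along the write pipeline."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    hops = zip(pipeline, pipeline[1:])
    return sum(block_mb for src, dst in hops if rack_of[src] != rack_of[dst])


pipeline = place_replicas(TOPOLOGY, "a1")
print(pipeline)                                # ['a1', 'b1', 'b2']
print(cross_rack_mb(TOPOLOGY, pipeline, 128))  # 128
```

With 128 MB blocks (the Hadoop 2.x default block size), every block written this way sends one full copy across the rack boundary, so ingesting 1 TB puts roughly 1 TB of replication traffic on inter-rack links before any job even starts.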
Using policy, ACI can dynamically 1) load-balance Big Data flows across racks on alternate paths and 2) prioritize small data flows ahead of large flows (which use the network much less frequently but consume bandwidth and buffer space). Both of these can dramatically reduce network congestion. In lab tests, we are seeing flows complete nearly an order of magnitude faster (for some mixed workloads) than without these policies enabled. ACI can also estimate and prioritize job completion, which will be important as Big Data workloads become pervasive across the enterprise. For a complete discussion of ACI’s performance impact, please see the detailed presentation by Samuel Kommu, chief engineer at Cisco for optimizing Big Data workloads.
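The two policies in this paragraph can be illustrated with a toy model. The Python sketch below assumes a single 10 Gb/s link that serves flows one at a time, plus hypothetical flow names, sizes and spine path names. It compares first-come-first-served ordering with a small-flows-first priority, and greedily spreads flows across two paths by load. It is a textbook-style scheduling illustration, not Cisco's implementation of either policy, and its numbers are unrelated to the lab results quoted above.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    size_mb: float  # flow size in megabytes


def completion_times(flows, link_gbps=10.0):
    """Send flows one after another over a single link, in list order,
    and return {flow name: completion time in seconds}."""
    clock = 0.0
    done = {}
    for flow in flows:
        clock += flow.size_mb * 8 / (link_gbps * 1000)  # MB -> Mb over Mb/s
        done[flow.name] = clock
    return done


def small_first(flows):
    """Priority policy: shortest flows are sent first."""
    return sorted(flows, key=lambda f: f.size_mb)


def assign_paths(flows, paths):
    """Greedy balancing: largest flows first, each onto the path that
    currently carries the fewest megabytes."""
    load = {path: 0.0 for path in paths}
    placement = {}
    for flow in sorted(flows, key=lambda f: f.size_mb, reverse=True):
        best = min(paths, key=lambda p: load[p])
        load[best] += flow.size_mb
        placement[flow.name] = best
    return placement, load


flows = [Flow("shuffle-1", 1000), Flow("rpc-1", 1), Flow("rpc-2", 1),
         Flow("shuffle-2", 1000)]

fifo = completion_times(flows)
prio = completion_times(small_first(flows))
print(round(fifo["rpc-1"], 4), round(prio["rpc-1"], 4))  # 0.8008 0.0008

placement, _ = assign_paths(flows, ["spine-1", "spine-2"])
print(placement["shuffle-1"], placement["shuffle-2"])    # spine-1 spine-2
```

Total drain time is identical under both orderings; only the order changes. In this toy run the small RPC flows finish about a thousand times sooner under small-first priority, while the large shuffle flows finish at nearly the same time, which is the intuition behind prioritizing small flows. Spreading the two large flows onto different paths keeps either path from carrying both.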
Resource Utilization: In general, the bigger the cluster, the faster the completion time. But since Big Data jobs are initially infrequent, CIOs must balance responsiveness against utilization. It is simply impractical for many mid-sized companies to dedicate large clusters to the occasional surge in Big Data demand. ACI enables organizations to quickly redeploy cluster resources from Hadoop to other sporadic workloads (such as CRM, e-commerce, ERP and inventory) and back. For example, the same resources could run Hadoop jobs nightly or weekly when other demands are lighter. Resources can be bare-metal or virtual, depending on workload needs. (see Figure 2)
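As a minimal sketch of the nightly hand-off described above, the Python code below assumes a fixed, hypothetical schedule in which business apps (a `crm-erp` profile) own a shared pool during the day and Hadoop owns it overnight, and it lists the hours at which the pool would need to be re-provisioned. The hours and profile names are illustrative only; the actual re-provisioning, which ACI drives through application policy, is not modeled here.

```python
from datetime import datetime

# Hypothetical sharing policy for one server pool: interactive business
# apps own it during the day, Hadoop batch jobs own it overnight.
SCHEDULE = [
    (range(8, 20), "crm-erp"),
    (range(20, 24), "hadoop"),
    (range(0, 8), "hadoop"),
]


def owner_at(ts: datetime) -> str:
    """Return which workload profile owns the pool at this timestamp."""
    for hours, profile in SCHEDULE:
        if ts.hour in hours:
            return profile
    raise ValueError(f"no profile covers hour {ts.hour}")


def switch_points(day: datetime):
    """List (hour, old_profile, new_profile) for each re-provisioning
    step in a repeating 24-hour cycle."""
    changes = []
    for hour in range(24):
        before = owner_at(day.replace(hour=(hour - 1) % 24))
        now = owner_at(day.replace(hour=hour))
        if before != now:
            changes.append((hour, before, now))
    return changes


print(switch_points(datetime(2014, 10, 21)))
# [(8, 'hadoop', 'crm-erp'), (20, 'crm-erp', 'hadoop')]
```

Keeping the schedule as data rather than code makes it easy to add a weekly window or a third workload without touching the switching logic.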
How does this work? ACI uses application policy profiles to programmatically re-provision the infrastructure. IT can use a separate profile to describe each application’s needs, including those of the Hadoop eco-system. The profile contains the application’s network policies, which the Application Policy Infrastructure Controller (APIC) translates into a complete network topology. The same profile contains compute and storage policies used by other tools, such as Cisco UCS Director, to provision compute and storage.
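The profile described above can be pictured as one record with three parts. In the Python sketch below, every class, field and key name is hypothetical (this is not the APIC or UCS Director API); it only shows the idea of a single profile whose network section goes to the fabric controller and whose compute and storage sections go to an orchestration tool.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkPolicy:
    # (source tier, destination tier) pairs allowed to communicate
    allowed: list = field(default_factory=list)
    qos_class: str = "best-effort"


@dataclass
class ComputePolicy:
    servers: int
    bare_metal: bool = True


@dataclass
class StoragePolicy:
    disks_per_server: int


@dataclass
class AppProfile:
    name: str
    network: NetworkPolicy
    compute: ComputePolicy
    storage: StoragePolicy


def split_profile(profile: AppProfile) -> dict:
    """Hand each part of one profile to the tool that consumes it:
    network policy to the fabric controller, compute and storage
    policy to the orchestration tool."""
    return {
        "fabric_controller": {
            "app": profile.name,
            "rules": list(profile.network.allowed),
            "qos": profile.network.qos_class,
        },
        "orchestrator": {
            "app": profile.name,
            "servers": profile.compute.servers,
            "bare_metal": profile.compute.bare_metal,
            "disks": profile.compute.servers * profile.storage.disks_per_server,
        },
    }


hadoop = AppProfile(
    name="hadoop-nightly",
    network=NetworkPolicy(allowed=[("client", "namenode"),
                                   ("datanode", "datanode")],
                          qos_class="bulk"),
    compute=ComputePolicy(servers=100),
    storage=StoragePolicy(disks_per_server=12),
)
plan = split_profile(hadoop)
print(plan["orchestrator"]["disks"])  # 1200
```

Swapping the pool to another application then amounts to applying a different `AppProfile`; both consumers receive their sections from the same source, so network and compute settings cannot drift apart.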
Data Isolation and Security: In a mature Big Data environment, Hadoop processing can occur between many data sources and clients. Data is most vulnerable during job transitions or redeployment to other applications. Multiple corporate databases and users need to be correctly isolated to ensure compliance. A patchwork of security software, such as perimeter security, is error-prone and static, and it consumes administrative resources.
In contrast, ACI can automatically isolate the entire data path through a programmable fabric according to pre-defined policies. Access policies for data vaults can be preserved throughout the network when the data is in motion. This can be accomplished even in a shared production infrastructure across physical and virtual end points.
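The isolation model above can be sketched as a default-deny check keyed on group membership rather than on where an endpoint runs. In this minimal Python example, the group names, the whitelist and the rule that endpoints in the same group may talk freely are all assumptions made for illustration, not ACI's actual policy model.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    name: str
    group: str  # hypothetical policy group label
    kind: str   # "bare-metal" or "vm"; the policy deliberately ignores it


# Hypothetical whitelist of (source group, destination group) pairs.
ALLOWED = {
    ("hr-analysts", "hadoop-hr"),
    ("hadoop-hr", "hr-vault"),
}


def permitted(src: Endpoint, dst: Endpoint, allowed=ALLOWED) -> bool:
    """Default deny: traffic passes only if both ends share a group or
    the (src group, dst group) pair is explicitly whitelisted."""
    return src.group == dst.group or (src.group, dst.group) in allowed


vault = Endpoint("hr-db-1", "hr-vault", "bare-metal")
job_vm = Endpoint("vm-17", "hadoop-hr", "vm")
job_blade = Endpoint("blade-4", "hadoop-hr", "bare-metal")
sales_job = Endpoint("vm-22", "hadoop-sales", "vm")

print(permitted(job_vm, vault), permitted(job_blade, vault))  # True True
print(permitted(sales_job, vault))                            # False
```

Because the check looks only at group labels, the Hadoop job gets the same answer whether it runs on a VM or a bare-metal blade, and a job from another line of business is denied by default. Treating direction as significant (the vault cannot initiate traffic to the job) is a simplification; a real fabric must also handle return traffic.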
As organizations of all sizes discover ways to use Big Data for business insights, their infrastructure must become far more performant, adaptable and secure. Investments in fabric, compute and storage must be leveraged across multiple Big Data processes and other business applications with agility and operational simplicity.
Leading the growth of Big Data, the Hadoop 2.x eco-system will place particular stresses on data center fabrics. New mixed workloads are already using 10 Gigabit capacity in larger clusters and will soon demand 40 Gigabit fabrics. Network traffic needs continuous optimization to improve completion times. End-to-end data paths must use consistent security policies between multiple data sources and clients. And the sharp surges in bare-metal workloads will demand much more agile ways to swap workloads and improve utilization.
Cisco’s Application Centric Infrastructure leverages a new operational and consumption model for Big Data resources. It dynamically translates existing policies for applications, data and clients into fully provisioned networks, compute and storage. Working with Nexus 9000 telemetry, ACI can continuously optimize traffic paths and enforce policies consistently as workloads change. The solution provides a seamless transition to the new demands of Big Data.
To hear about Cisco’s broader solution portfolio, be sure to register for the October 21st executive webcast ‘Unlock Your Competitive Edge with Cisco Big Data Solutions.’ And stay tuned for the next blog in the series, from Andrew Blaisdell, which showcases the ability to predictably deliver intelligence-driven insights and actions.
Today’s Cisco announcements illustrate the momentum we’ve seen thus far for Cisco’s approach to cloud. Partners across the globe are very eager to engage and understand what roles they can play to help them meet their customers’ needs for choice, flexibility, security, and compliance. As more clouds connect, opportunities for our partners to extend value and become more relevant with their customers increase.
Cisco’s approach has always been partner-centric. And with Intercloud, partners have even more opportunities to be cloud builders, providers, resellers, consultants, Independent Software Vendors (ISV) and more. The foundation for Cisco’s Intercloud is built on:
A hybrid cloud model to place and move Virtual Machine (VM) workloads on any cloud
Application-centric policy control for Chief Information Officers across all IT components and clouds at scale
Data virtualization at scale, to address the Internet of Everything (IoE) explosion, and
A network-centric partner ecosystem that provides more choice and data sovereignty than a vendor-direct model
While partner business models vary, I always emphasize that there are many opportunities and multiple roles for Intercloud partners. Intercloud partners will bring important value to the ecosystem with professional services, new offers, Service Level Agreements (SLA), consumption pricing, consolidated billing and much more. They can also enhance their roles and monetize in other ways. For example, Intercloud partners can build and operate private clouds with professional services, which opens up opportunities to create a profitable hybrid IT business model.
Big Data is not just about gathering tons of data: the digital exhaust from the internet, social media, and customer records. The real value is in being able to analyze the data to gain a desired business outcome.
Those of us who follow the Big Data market closely never lack for something new to talk about. There is always a story about how a business is using Big Data in a different way or about some new breakthrough that has been achieved in the expansive big data ecosystem. The good news for all of us is, we have clearly only scratched the surface of the Big Data opportunity!
With the increasing momentum of the Internet of Everything (IoE) market transition, there will be 50 billion devices connected to the Internet by 2020—just five years from now. As billions of new people, processes, and things become connected, each connection will become a source of potentially powerful data to businesses and the public sector. Organizations who can unlock the intelligence in this data can create new sources of competitive advantage, not just from more data but from better access to better data.
What we haven’t heard about yet are examples of enterprises that are applying the power of this data pervasively in their organizations, giving them a competitive edge in marketing, supply chain, manufacturing, human resources, customer support, and many more departments. The enterprise that can apply the power of Big Data throughout its organization can create multiple, simultaneous sources of ongoing innovation, each one a constantly renewable or perpetual competitive edge. Looking forward, the companies that can accomplish this will be the ones setting the pace for the competition to follow.
Cisco has been working on making this vision of pervasive use of Big Data within enterprises a reality. We’d like to share this vision with you in an upcoming blog series and executive Webcast entitled, ‘Unlock Your Competitive Edge with Cisco Big Data Solutions’, that will air on October 21st at 9:00 AM PT.
I have the honor of kicking off the multi-part blog series today. Each blog will focus on a specific Cisco solution our customers can use to unlock the power of their big data enterprise-wide and gain a competitive edge. I’m going to start the discussion by highlighting the infrastructure implications of Big Data in the Internet of Everything (IoE) era, focusing initially on the Cisco Unified Computing System.
Enterprises who want to make strategic use of data throughout their organizations will need to take advantage of the power of all types of data. As IoE increasingly takes root, organizations will be able to access data from virtually anywhere in their value chain. No longer restricted to small sets of structured, historical data, they’ll have more comprehensive and even real-time data, including video surveillance information, social media output, and sensor data that allow them to monitor behavior, performance, and preferences. These are just a few examples, but they underscore the fact that not all data is created equal. Real-time data coming in from a sensor may only be valuable for minutes, or even seconds, so it is critical to be able to act on that intelligence as quickly as possible. From an infrastructure standpoint, that means enterprises must be able to place computing resources as close as possible to the many sources and users of data. At the same time, historical data will also continue to be critical to Big Data analytics.
Cisco encourages our customers to take a long-term view and select a Big Data infrastructure that is distributed and designed for high scalability, management automation, outstanding performance, low TCO, and the comprehensive security approach needed for the IoE era. And that infrastructure must be open, because there is tremendous innovation going on in this industry, and enterprises will want to be able to take full advantage of it.
One of the foundational elements of our Big Data infrastructure is the Cisco Unified Computing System (UCS). UCS integrated infrastructure uniquely combines server, network and storage access and has recently claimed the #1 x86 blade server market share position in the Americas. It’s this same innovation that propelled us to the leading blade market share position that we are directly applying to Big Data workloads. With its highly efficient infrastructure, UCS lets enterprises manage up to 10,000 UCS servers as if they were a single pool of resources, so they can support the largest data clusters.
Because enterprises will ultimately need to be able to capture intelligence from both data at rest in the data center and data at the edge of the network, Cisco’s broad portfolio of UCS systems gives our customers the flexibility to process data where it makes the most sense. For instance, our UCS C240 rack server has been extremely popular for Hadoop-based Big Data deployments at the data center core. And Cisco’s recently introduced UCS Mini is designed to process data at the edge of the network.
Because the entire UCS portfolio uses the same unified architecture, enterprises can choose the right compute configuration for the workload, with the advantage of being able to use the same powerful management and orchestration tools to speed deployment, maximize availability, and significantly lower their operating expenses. Being able to leverage UCS Manager and Service Profiles, Unified Fabric and SingleConnect Technology, our Virtual Interface Card technology, and industry-leading performance really sets Cisco apart from our competition.
So, please consider this just an introduction to the first component of Cisco’s “bigger”, big data story. To hear more, please make plans to attend our upcoming webcast entitled, ‘Unlock Your Competitive Edge With Cisco Big Data Solutions’ on October 21st.
Every Tuesday and Thursday from now until October 21st, we’ll post another blog in the series to provide you with additional details of Cisco’s full line of products, solutions and services.
We have been on a roll over the last several months, with many customer, channel partner and technology partner engagements. With the ACI starter kits and lab bundles shipping, customers can bring this solution into their labs and subsequently into their production pods with the Application Policy Infrastructure Controller (APIC) and the Nexus switching platforms. We see healthy interest in these kits as customers explore their SDN capabilities. Several ecosystem partners, such as F5 and Citrix, have started to ship device packages. We just came off a company-wide sales conference in Las Vegas a couple of weeks ago that was hugely energizing. Policy as a means to drive automation, security and scale, as originally outlined by Cisco, is now the major focus area for SDN, and more industry vendors now endorse the vision, as evidenced by initiatives like OpenStack Congress. Investment protection continues to be a major priority. Overall, the new fiscal year promises to be an exciting one.
Soni Jiandani on SDN Central
Following up on the Unleashing IT magazine (ACI special edition) released last month, I wanted to share the momentum we’re experiencing with customers and partners as adoption continues to accelerate. As John Chambers outlined during the last earnings call, adoption has been off to a tremendous start, including some of the customers and partners featured in the video above.
We also continue to answer questions as the vision around ACI crystallizes and rapidly evolves from concept to hard reality. This week we held a Q&A session with SDN Central, led by Soni Jiandani, SVP of the Insieme Networks Business Unit at Cisco. The featured interview is available on SDN Central. Soni crisply articulates the ACI value proposition while addressing some of the top-of-mind questions that come from the media.
If you follow the news in the world of data center you probably noticed a small announcement from Cisco last week regarding the UCS portfolio… :)
To net it out in a simple way, I’ve been telling people that the trail of innovation that Cisco has been blazing with UCS just got a lot wider. That’s because this rollout is all about three key vectors that our customers have guided us to expand on:
Edge-Scale computing: taking UCS to the growing sources of computing demand beyond the core data center and to smaller scale IT organizations with UCS Mini
Padma Warrior and Joe Inzerillo discuss how technology is transforming the #MLB fan experience.
We had a stellar lineup at the event in New York. Our CTO, Padma Warrior, headlined and did a fantastic job setting the context for this wave of innovation in the frame of IoE and Fast IT. Paul Perez followed, explaining the sea change occurring in the application landscape and the customer imperatives guiding development of the UCS platform. Finally, Satinder Sethi stepped us through all the new technology we’ve added to the portfolio. Frank Palumbo hosted the event for us in New York, and I think it’s no coincidence he was rewarded later in the day by a thrilling walk-off win by the Yankees. Note that my last link there is to MLB.com, whose CTO, Joe Inzerillo, joined our event to share all the cool fan experience technology they’re developing.
I’d like to thank our #CiscoChampions for joining us at the event and bringing their unique and (trust me) unfiltered perspective to the news. Another highlight for me was the opportunity to tour the MLB Advanced Media Center with Matt Eastwood of IDC who joined us in New York to moderate a panel on scale-out computing. Matt, so sorry about the results of the Yankees/Red Sox game…it’s tough to overcome Palumbo-level karma. Having several of our customers and partners at the event really rounded it out, making a special day for everyone that joined us in New York and in the streaming sessions.
Jim Leach (L) and Tech Field Day panel of Cisco Champions.
To hit on all the details, the team has taken a divide-and-conquer approach here on the blog as well as on YouTube and our other social media venues. In addition to the links above, here are some of the pieces you can check out to learn more. Scanning the #UCSGrandSlam hashtag on Twitter is another good way to take a look at the news and reactions.