At this year’s Hadoop Summit 2013, I presented “The Data Center and Hadoop,” which built upon the past two years of testing the effects of Hadoop on data center infrastructure. What makes Hadoop an important framework to study in the data center is that it is a distributed system combining a distributed file system (HDFS) with an execution framework (Map/Reduce). Further, it builds upon itself and can support real-time or key/value stores (HBase), along with many other possibilities. Each comes with its own set of infrastructure requirements, including both throughput-sensitive and latency-sensitive components. In the data center, understanding how all these components work together is key to optimized deployments.
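To make the HDFS/Map-Reduce pairing concrete, here is a minimal word-count sketch in the Hadoop Streaming style: a mapper that emits key/value pairs and a reducer that aggregates them per key. The function names and sample input are illustrative only; this is plain Python mimicking the pattern, not an actual Hadoop job.

```python
# Word-count sketch in the Map/Reduce style: the mapper emits
# (word, 1) pairs, and the reducer sums counts per key over
# key-sorted input, mirroring what the shuffle phase guarantees.
# Names and data are illustrative, not a real Hadoop Streaming job.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Sum counts per word; input is sorted by key first,
    as the Map/Reduce shuffle phase would do."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    lines = ["the data center and hadoop", "hadoop in the data center"]
    print(dict(reducer(mapper(lines))))
```

In a real deployment, the mapper and reducer would run as separate processes on many nodes, reading their input splits from HDFS, which is exactly why both throughput (bulk reads) and latency (shuffle traffic) matter to the network.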
After studying many of these components and their effects, the very data we were analyzing became a frequent topic of discussion. We combined application performance data, application logs, compute data, and network data to build a complete picture of what is happening in the data center.
With the advent of programmable networks (aka “Software Defined Networking”), it is important not only to make the network more application-aware, but also to know where and how to analyze the data and make the right connections between the application and the network.
Tags: Big Data, Cisco Nexus, data center, Hadoop, Hadoop Summit, nexus, SDN, software defined networking
On June 20th, Cisco and MapR will join Forrester Research Big Data analyst Mike Gualtieri to discuss “productionizing” Hadoop. But what does that mean?
Mike has developed a list of 7 architectural best practices that will help your enterprise quickly and easily develop or move your Hadoop environment into standard data center processes. Following his guidelines, you can get your Hadoop environment up and running in no time, saving time by proactively addressing the headaches and pitfalls that are unique to Big Data environments.
Joining Mike will be MapR CMO Jack Norris, discussing MapR’s best practices and how they line up with the Big 7 from Forrester.
Finally, Cisco IT will showcase a MapR production environment and how they have streamlined complex Big Data workloads, automatically moving data into their Hadoop environment and running analytics out of it.
Keeping the Hadoop production environment up and running smoothly is the name of the game, and in the face of resource constraints Cisco IT has standardized on Cisco Tidal Enterprise Scheduler, whose seamless integrations with MapR, Hive, and Sqoop give your enterprise the ability to “productionize” complex workloads from any data source.
Join us as we walk you through the 7 architectural best practices for Big Data, MapR and Cisco Tidal Enterprise Scheduler.
Tags: Big Data, cisco live, forrester, Hadoop, MapR, Tidal Enterprise Scheduler, unified management, workload automation
Guest Blog by Jack Norris
Jack is responsible for worldwide marketing at MapR Technologies, the leading provider of an enterprise-grade Hadoop platform. He has over 20 years of enterprise software marketing experience and a record of success ranging from defining new markets for small companies to increasing sales of new products for large public companies. Jack’s broad experience includes launching and establishing analytic, virtualization, and storage companies, and leading marketing and business development for an early-stage cloud storage software provider.
Big Data use cases are changing the competitive dynamics for organizations across a range of operational use cases. Operational intelligence refers to applications that combine real-time, dynamic analytics to deliver insights to business operations. Operational intelligence requires high performance. “Performance” is a word that is used quite liberally and means different things to different people. Everyone wants something faster. When was the last time you said, “No, give me the slow one”?
When it comes to operations, performance is about the ability to take advantage of market opportunities as they arise. Doing so requires the ability to quickly monitor what is happening: both real-time data feeds and the ability to quickly react. The beauty of Apache Hadoop, and specifically MapR’s platform, is that data can be ingested as a real-time stream, analysis can be performed directly on the data, and automated responses can be executed. This is true for a range of applications across organizations, from advertising platforms, to online retail recommendation engines, to fraud and security detection.
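The ingest-analyze-respond loop described above can be sketched in miniature. Everything here is a hypothetical illustration (the rolling window, the threshold, the transaction amounts), not MapR’s API; a real deployment would consume an actual stream and trigger operational systems.

```python
# Toy sketch of the ingest -> analyze -> respond loop: each event
# is analyzed as it arrives against a rolling window of recent
# values, and an outlier triggers an automated response (here,
# just a boolean flag). All names and numbers are hypothetical.
from collections import deque

class FraudMonitor:
    """Keep a rolling window of transaction amounts and flag any
    transaction far above the recent average."""

    def __init__(self, window=5, factor=3.0):
        self.amounts = deque(maxlen=window)  # recent amounts only
        self.factor = factor                 # outlier multiplier

    def ingest(self, amount):
        """Analyze one event on arrival; return True if it should
        trigger an automated response."""
        suspicious = (
            len(self.amounts) == self.amounts.maxlen
            and amount > self.factor * (sum(self.amounts) / len(self.amounts))
        )
        self.amounts.append(amount)
        return suspicious

monitor = FraudMonitor()
stream = [10, 12, 11, 9, 13, 200, 10]  # the 200 is the anomaly
flags = [monitor.ingest(a) for a in stream]
```

The design point is that analysis happens per event, in the data path, rather than in a later batch pass, which is what distinguishes operational intelligence from traditional after-the-fact reporting.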
When looking at harnessing Big Data, organizations need to realize that multiple applications will need to be supported. Regardless of which application you introduce first, more will quickly follow. Not all Hadoop distributions are created equal. Or more precisely, most Hadoop distributions are very similar with only minor value-added services separating them. The exception is MapR. With the best of the Hadoop community updates coupled with MapR’s innovations, the broadest set of applications can be supported including mission-critical applications that require a depth and breadth of enterprise-grade Hadoop features.
Tags: Big Data, enterprise scheduler, Hadoop, informatica, job scheduling, MapR, Tidal Enterprise Scheduler, UCS, workload automation
The Transaction Processing Performance Council today announced its fifth international Conference on Performance Evaluation and Benchmarking (TPCTC 2013). I’ve had the great privilege of chairing the TPCTC series since 2009. This year’s conference will be co-located with the 39th International Conference on Very Large Data Bases (VLDB 2013) on August 26, 2013 in Riva del Garda, Italy. With this conference, we encourage researchers and industry experts to submit ideas and methodologies in performance evaluation, measurement, and characterization. Additional information on TPCTC 2013 is available online at http://www.tpc.org/tpctc/tpctc2013/.
Tags: Big Data, Big Data Benchmarks, bigdatatop100, Hadoop, TPCTC, WBDB, WBDB 2012, wbdb2012-in
Cisco and NetApp have been partners for over a decade, and in January we announced the planned expansion of our partnership. We are always looking to work with our partners in new ways to offer customers greater choice, and Cisco and NetApp are working toward delivering a complete platform for enterprises in data-intensive industries with business-critical SLAs. The solution will offer pre-sized storage, networking, and compute in a highly reliable, ready-to-deploy Hadoop stack, and it is planned to be generally available in summer 2013. But who can wait until summer?! I know we can’t, so we’re going to offer a demo of the joint reference architecture at Cisco Live! Melbourne March 5-8, and we hope you’ll stop by to check it out!
To give you more information on the solution: it will be pre-validated for enterprise Hadoop deployments built using 6296 Fabric Interconnects (connectivity and management), a pair of Nexus 2232s, C220 M3 Servers (compute), and NetApp E5400 and FAS 2240 series storage arrays. Following the highly successful FlexPod model of pre-sized rack-level configurations, this solution will be made available through the well-established FlexPod sales engagement and channel. Field sales and partners from both companies will resell the solution upon general availability.
Tags: Big Data, Cisco UCS CPA, CPA, Hadoop, netapp