
Huge amounts of information are flooding companies every second, which has led to an increased focus on big data and the ability to capture and analyze this sea of information. Enterprises are turning to big data and Apache Hadoop in order to improve business performance and provide a competitive advantage. But to unlock business value from data quickly, easily and cost-effectively, organizations need to find and deploy a truly reliable Hadoop infrastructure that can perform, scale, and be used safely for mission-critical applications.

As more Hadoop projects are deployed to provide actionable results in real time or near real time, low latency has become a key factor in a company’s choice of Hadoop distribution. Performance and scalability should therefore be evaluated closely before choosing a particular Hadoop solution.

Performance

The raw performance of a Hadoop platform is critical; it refers to how quickly the platform can ingest, process and analyze information. The MapR Distribution for Hadoop in particular provides world-record performance for MapReduce operations on Hadoop. Its advanced architecture harnesses distributed metadata with an optimized shuffle process, delivering consistent high performance.

The graph below compares the MapR M7 Edition with another Hadoop distribution, and it vividly illustrates the vast difference in latency and performance between these Hadoop distributions.

Figure: High Performance with Low Latency

One particular solution that is optimized for performance is Cisco UCS with MapR. MapR on the Cisco Unified Computing System™ (Cisco UCS®) is a powerful, production-ready Hadoop solution that increases business and IT agility, supports mission-critical workloads, reduces total cost of ownership (TCO), and delivers exceptional return on investment (ROI) at scale.

The Cisco/MapR solution for Hadoop is designed, tested and validated to handle the most demanding workloads. The MapR and Cisco UCS combination brings the power of the MapR distribution to a dependable deployment model that can be quickly implemented and customized using the Cisco Unified Fabric and powerful Cisco UCS rack servers. In addition, MapR has integrated with Cisco® Tidal Enterprise Scheduler (Cisco TES), making it easy for administrators to provide automated load balancing, data exchange, and advanced event-based scheduling on MapR.

Scalability

Another key attribute of a Hadoop platform is its scalability: the ability to grow in number of nodes, node density, tables, files, and so on. When comparing Hadoop alternatives, make sure that the platform you choose can scale easily and cost-effectively without requiring administrators to change application logic.

The MapR Distribution for Hadoop is fully optimized for scalability. The Cisco UCS solution for MapR is based on the Cisco® UCS Common Platform Architecture (CPA) for Big Data (Cisco Validated Design). The Cisco UCS CPA is a highly scalable architecture designed to meet a variety of scale-out application demands with transparent data and management integration capabilities. Whether you’re deploying a large data center or buying single racks, the Cisco UCS CPA with MapR solution can be sized to deliver advanced performance, enabling Hadoop to scale as your workload increases.

Cisco UCS CPA for Big Data integrates industry-leading computing, networking, and management capabilities into a unified, fabric-based architecture optimized for big data workloads.

  • Accelerate infrastructure deployment for fast, easy growth: Cisco UCS Manager abstracts all configuration, identity, and I/O connectivity information into a Cisco UCS service profile to rapidly and consistently deploy new servers in minutes.
  • Deliver big data infrastructure in a small, efficient footprint to reduce TCO: The capability to conserve capital expenditures (CapEx), reduce operating expenses (OpEx) through efficient power use, and adopt simplified operation processes has helped customers save more than 50 percent of the cost of using traditional servers.
  • Handle scaling required for years of rapid data growth with big data clusters supported by hundreds to thousands of servers and petabytes of storage: The solution can easily support up to 160 Cisco UCS servers in a single switching domain, and customers can add Cisco UCS Central Software to connect up to 10,000 servers.
  • Quickly load terabytes of data into the system over a high-bandwidth unified fabric: The combination of direct access to SANs, the efficiency and easy scalability of Cisco® SingleConnect technology, and the capability to host and manage big data workloads together with enterprise applications eliminates complexity and reduces costs.
  • Deliver leading performance to quickly gain insights that enable faster innovation: Cisco UCS is designed to deliver outstanding big data performance supported by Cisco innovations down to the ASIC level. These innovations use the power of Intel Xeon processors, allowing Cisco to capture more than 90 world-record benchmarks, including the rigorous TeraSort test, a leading indicator of big data performance.
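
To put the scaling figures above in perspective, here is a rough sizing sketch in Python. It assumes the 160-server and 10,000-server numbers quoted above are hard planning limits and that servers are packed fully into each switching domain before a new one is started. The function name domains_needed is purely illustrative and is not part of any Cisco or MapR tool.

```python
# Figures quoted in the article, treated here as fixed planning limits (an assumption).
SERVERS_PER_DOMAIN = 160       # servers supported in a single switching domain
MAX_SERVERS_CENTRAL = 10_000   # servers connectable with Cisco UCS Central Software


def domains_needed(servers: int) -> int:
    """Estimate how many switching domains a cluster of `servers` nodes would span.

    Hypothetical helper for back-of-the-envelope planning only.
    """
    if servers < 1:
        raise ValueError("servers must be a positive integer")
    if servers > MAX_SERVERS_CENTRAL:
        raise ValueError("exceeds the 10,000-server figure quoted for UCS Central")
    # Integer ceiling division: a partially filled domain still counts as one.
    return (servers + SERVERS_PER_DOMAIN - 1) // SERVERS_PER_DOMAIN


if __name__ == "__main__":
    for n in (64, 160, 161, 1_000, 10_000):
        print(f"{n:>6} servers -> {domains_needed(n)} switching domain(s)")
```

Under these assumptions, a cluster stays in one domain up to 160 servers, and the full 10,000-server figure works out to 63 domains. Real deployments will depend on rack layout and fabric design, so treat this as a starting estimate only.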

 

If you’re looking to Hadoop to help you unlock business value from your data, it’s important to consider your Hadoop distribution choice carefully. With Cisco UCS and MapR, you’ll benefit from maximum availability, high performance and scalability.

By the way, Cisco will be showcasing our Unified Data Center portfolio at Red Hat Summit in San Francisco from April 14th to April 17th and at Cisco Live San Francisco from May 18th to May 22nd. Stop by and say hello, and let me know if you have any comments or questions, either in person or on Twitter at @CicconeScott.



Authors

Scott Ciccone

Sr. Marketing Manager

Global Marketing