Why I Love Big Data Partner Series 4: Distributed Big Data Cluster with Cisco UCS and MapR–Store Locally, Query Everywhere
Next in our series of Why I Love Big Data is Bruce from MapR. Together, Cisco and MapR are working on a very cool solution for keeping data local while still accessing it very quickly. Also, come by the Connected Banking stand in the Cisco Live World of Solutions and DevNet area to see a demo of the distributed system. You will see how Cisco and MapR can provide security solutions that help prevent the theft of customers’ personal data and financial information.
Bruce Penn, Principal Solution Architect, MapR Technologies
Bruce is a Principal Solution Architect with MapR Technologies. He has over 22 years of Information Technology experience that includes Data Warehousing, Business Intelligence, Enterprise Architecture, Systems Design, Project Management and Application Programming. Prior to MapR, Bruce spent 8.5 years at Oracle and was instrumental in helping grow the Oracle Exadata Database Machine business through extensive collaboration with several large enterprise customers. Bruce was the first Solution Architect to join MapR’s Sales Engineering team and has been solely focused on the MapR Distribution for Hadoop and associated Apache Hadoop ecosystem technologies ever since. Bruce holds a Bachelor’s Degree in Electrical Engineering from Michigan State University.
Cisco and MapR have long been partners in the big data market. As enterprises embrace the Internet of Everything (IoE) and move toward a truly distributed data center environment, the combination of UCS and MapR provides unique capabilities to simplify this architecture.
Cisco UCS servers provide a powerful foundation for running distributed big data/Hadoop MapR clusters with unparalleled performance, availability, and manageability at the hardware level. The MapR Distribution including Apache Hadoop provides similar robustness at the software level, creating a rock-solid distributed platform for many flavors of IoE applications.
With the advent of IoE applications, data often originates at the “edge” of a system’s network: routers and switches in one data center generate log data locally, while devices in other data centers do the same, creating silos of log data. For applications built around this log data to react in real time, they need to access that data as quickly as possible. Often these applications also need to aggregate the data across data centers to make decisions quickly, while the data itself stays in the data center where it originated. Keeping data local can matter for legal and regulatory reasons, as well as for efficient local queries. With Cisco UCS servers, MapR Data Placement Control, and Apache Drill, this becomes a simple task.
MapR Data Placement Control lets you group a set of servers into what’s called a Topology and then place specific Volumes of data on those servers. This allows organizations to create a distributed big data/MapR cluster in which one Topology exists in one data center while another exists in a separate data center. Because both Topologies are part of the same cluster and file system, data across both can be queried and aggregated with standard ANSI SQL using Apache Drill. This capability is unique to MapR and is not available in other Hadoop distributions. For example, the image below depicts a distributed cluster in which an IoE application collects log data in a data center in Amsterdam, while the user data associated with that log data resides in a central data center in Texas:
As you can see below in the MapR Control System, both Topologies belong to a single cluster: five nodes reside in Amsterdam and five reside in Texas, yet they operate as one unified cluster.
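To make the relationship between Topologies and Volumes concrete, here is a minimal Python sketch of the placement rule described above, using the same five-plus-five node layout. It is a toy model, not MapR code: the Cluster class, the topology paths /data/amsterdam and /data/texas, and the node names are hypothetical, and the rule that a Volume may use any node at or beneath its Topology path is an assumption of this sketch. On a real cluster, nodes and Volumes are assigned to Topologies through the MapR Control System or the maprcli tool.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of one cluster whose nodes are grouped into Topologies (hypothetical, not a MapR API)."""

    node_topology: dict[str, str] = field(default_factory=dict)    # node name -> topology path
    volume_topology: dict[str, str] = field(default_factory=dict)  # volume name -> topology path

    @staticmethod
    def _within(node_topo: str, topo: str) -> bool:
        # Assumption: a node belongs to a topology if its path equals it or sits beneath it.
        return node_topo == topo or node_topo.startswith(topo.rstrip("/") + "/")

    def add_node(self, node: str, topology: str) -> None:
        self.node_topology[node] = topology

    def nodes_in(self, topology: str) -> list[str]:
        return sorted(n for n, t in self.node_topology.items() if self._within(t, topology))

    def create_volume(self, name: str, topology: str) -> None:
        # Refuse to create a volume whose data would have nowhere to live.
        if not self.nodes_in(topology):
            raise ValueError(f"no nodes in topology {topology}")
        self.volume_topology[name] = topology

    def eligible_nodes(self, volume: str) -> list[str]:
        # Only nodes inside the volume's topology may hold its data.
        return self.nodes_in(self.volume_topology[volume])


cluster = Cluster()
for i in range(1, 6):
    cluster.add_node(f"ams-{i}", "/data/amsterdam")
    cluster.add_node(f"tx-{i}", "/data/texas")

cluster.create_volume("logs", "/data/amsterdam")
cluster.create_volume("users", "/data/texas")

print(cluster.eligible_nodes("logs"))   # ['ams-1', 'ams-2', 'ams-3', 'ams-4', 'ams-5']
print(cluster.eligible_nodes("users"))  # ['tx-1', 'tx-2', 'tx-3', 'tx-4', 'tx-5']
```

Both Volumes live in one cluster namespace, yet each Volume’s data is confined to the servers of a single data center, which is the property the rest of this post relies on.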
With the user Volume placed on the Texas Topology and the log Volume on the Amsterdam Topology, keeping the data local is trivial. If disaster recovery is required, MapR Local Mirrors make it easy to replicate the data from one data center to the other so that it remains accessible during an outage. Using Apache Drill, a user can simply run a SQL statement that joins the log data from Amsterdam with the user data from Texas without colocating the data in the same data center, providing a truly distributed platform.
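The shape of such a cross-data-center join can be illustrated with a small Python sketch. It uses Python’s built-in sqlite3 module purely as a stand-in SQL engine, so it shows only the form of the query, not Drill itself; the table names, columns, and sample rows are hypothetical. In Drill, the FROM clause would instead reference the two Volumes by path through a file system storage plugin, for example dfs.`/logs` and dfs.`/users` (hypothetical paths), leaving each data set where it is stored.

```python
import sqlite3

# Stand-in engine: in-memory SQLite, not Apache Drill.
conn = sqlite3.connect(":memory:")

# Hypothetical schemas: device logs collected in Amsterdam, user records held in Texas.
conn.execute("CREATE TABLE ams_logs (user_id INTEGER, device TEXT, event TEXT)")
conn.execute("CREATE TABLE tx_users (user_id INTEGER, name TEXT)")

conn.executemany(
    "INSERT INTO ams_logs VALUES (?, ?, ?)",
    [
        (1, "router-7", "login_failed"),
        (1, "router-7", "login_failed"),
        (2, "switch-3", "login_ok"),
    ],
)
conn.executemany("INSERT INTO tx_users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Join the log data with the user data and count failed logins per user.
query = """
SELECT u.name, COUNT(*) AS failures
FROM ams_logs AS l
JOIN tx_users AS u ON l.user_id = u.user_id
WHERE l.event = 'login_failed'
GROUP BY u.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('alice', 2)]
```

The SQL is ordinary ANSI-style SQL; what Drill adds, per the description above, is the ability to run it over data that physically stays in two different data centers.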
With the ability to “Store Locally, Query Everywhere,” Cisco UCS servers and the MapR Distribution including Apache Hadoop form a formidable combination for powering distributed systems across geographies. If you are attending Cisco Live! 2015 in San Diego, please visit the Connected Banking area in the World of Solutions and DevNet area to see a demo of the distributed system.