Want to get the most out of your big data? Build an enterprise data hub (EDH).
Big data is rapidly getting bigger. That in itself isn’t a problem. The issue is what Gartner analyst Doug Laney describes as the three Vs of Big Data: volume, velocity, and variety.
Volume refers to the ever-growing amount of data being collected. Velocity is the speed at which the data is being produced and moved through the enterprise information systems. Variety refers to the fact that we’re gathering information from multiple data sources such as sensors, enterprise resource planning (ERP) systems, e-commerce transactions, log files, supply chain info, social media feeds, and the list goes on.
Data warehouses weren’t made to handle this fast-flowing stream of wildly dissimilar data. Using them for this purpose has led to resource-draining, sluggish response times as workers attempt to perform numerous extract, load, and transform (ELT) functions to make stored data accessible and usable for the task at hand.
Constructing Your Hub
An EDH addresses this problem. It serves as a central platform that enables organizations to collect structured, unstructured, and semi-structured data from slews of sources, process it quickly, and make it available throughout the enterprise.
Building an EDH begins with selecting the right technology in three key areas: infrastructure, a foundational system to drive EDH applications, and the data integration platform. Obviously, you want to choose solutions that fit your needs today and allow for future growth. You’ll also want to ensure they are tested and validated to work well together and with your existing technology ecosystem. In this post, we’ll focus on selecting the right hardware.
The Infrastructure Component
Big data deployments must be able to handle continued growth, from both a data and user load perspective. Therefore, the underlying hardware must be architected to run efficiently as a scalable cluster. Important features such as the integration of compute and network, unified management, and fast provisioning all contribute to an elastic, cloud-like infrastructure that’s required for big data workloads. No longer is it satisfactory to stand up independent new applications that result in new silos. Instead, you should plan for a common and consistent architecture to meet all of your workload requirements.
Big data workloads represent a relatively new model for most data centers, but that doesn’t mean best practices must change. Handling a big data workload should be viewed from the same lens as deployments of traditional enterprise applications. As always, you want to standardize on reference architectures, optimize your spending, provision new servers quickly and consistently, and meet the performance requirements of your end users.
Cisco Unified Computing System to Run Your EDH
The Cisco Unified Computing System™ (Cisco UCS®) Integrated Infrastructure for Big Data delivers a highly scalable platform that is proven for enterprise applications like Oracle, SAP, and Microsoft. It also provides the same required enterprise-class capabilities–performance, advanced monitoring, simplification of management, QoS guarantees–to big data workloads. With lower switch and cabling infrastructure costs, lower power consumption, and lower cooling requirements, you can realize a 30 percent reduction in total cost of ownership. In addition, with its service profiles, you get fast and consistent time-to-value by leveraging provisioning templates to instantly set up a new cluster or add many new nodes to an existing cluster.
And when deploying an EDH, the MapR Distribution including Apache™ Hadoop® is especially well-suited to take advantage of the compute and I/O bandwidth of Cisco UCS. Cisco and MapR have been working together for the past 2 years and have developed Cisco-validated design guides to provide customers the most value for their IT expenditures.
Cisco UCS for Big Data comes in optimized power/performance-based configurations, all of which are tested with the leading big data software distributions. You can customize these configurations further, or use the system as is. Utilizing one of Cisco UCS for Big Data’s pre-configured options goes a long way to ensuring a stress-free deployment. All Cisco UCS solutions also provide a single point of control for managing all computing, networking, and storage resources, for any fine tuning you may do before deployment or as your hub evolves in the future.
I encourage you to check out the latest Gartner video to hear Satinder Sethi, our VP of Data Center Solutions Engineering and UCS Product Management, share his perspective on how powering your infrastructure is an important component of building an enterprise data hub.
In addition, you can read the MapR Blog, Building an Enterprise Data Hub, Choosing the Foundational Software.
Let me know if you have any comments or questions, or via twitter at @CicconeScott.
Tags: Big Data, blade server, blades servers, C240 M3 Rack Server, Cisco UCS, Cisco Unified Computing System, Cisco Unified Data Center, Cisco Unified Fabric, Enterprise Data Hub, Gartner, Hadoop, MapR, rack server, UCS Central, UCS service profiles
The ELK stack is a set of analytics tools. Its initials represent Elasticsearch, Logstash and Kibana. Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Logstash is a tool for receiving, processing and outputting logs, like system logs, webserver logs, error logs, application logs and many more. Kibana is an open source (Apache-licensed), browser-based analytics and search dashboard for Elasticsearch.
ELK is a very open source, useful and efficient analytics platform, and we wanted to use it to consume flow analytics from a network. The reason we chose to go with ELK is that it can efficiently handle lots of data and it is open source and highly customizable for the user’s needs. The flows were exported by various hardware and virtual infrastructure devices in NetFlow v5 format. Then Logstash was responsible for processing and storing them in Elasticsearch. Kibana, in turn, was responsible for reporting on the data. Given that there were no complete guides on how to use NetFlow with ELK, below we present a step-by-step guide on how to set up ELK from scratch and enabled it to consume and display NetFlow v5 information. Readers should note that ELK includes more tools, like Shield and Marvel, that are used for security and Elasticsearch monitoring, but their use falls outside the scope of this guide.
In our setup, we used
- Elasticsearch 1.3.4
- Logstash 1.4.2
- Kibana 3.1.1
For our example purposes, we only deployed one node responsible for collecting and indexing data. We did not use multiple nodes in our Elasticsearch cluster. We used a single-node cluster. Experienced users could leverage Kibana to consume data from multiple Elasticsearch nodes. Elasticsearch, Logstash and Kibana were all running in our Ubuntu 14.04 server with IP address 10.0.1.33. For more information on clusters, nodes and shard refer to the Elasticsearch guide.
Read More »
Tags: Big Data, big data analytics, netflow, security
Connecting Dark Assets: An ongoing series on how the Internet of Everything is transforming the ways in which we live, work, play, and learn.
Racing down the wide, open highway on a beautifully crafted motorcycle is one of life’s most exhilarating rushes. At least I used to think so, before my wife talked me into taking up safer pastimes.
But Internet of Everything (IoE) technologies may be offering me a new lease on motorcycling. A new product called the Skully AR-1 is being billed as “The World’s Smartest Motorcycle Helmet.” And who am I to argue? Read More »
Tags: Big Data, Cisco, Cisco Consulting Services, employee productivity, Internet of Everything, IoT, job creation, Joseph Bradley
According to the Breach Level Index, between July and September of this year, an average of 23 data records were lost or stolen every second – close to two million records every day.1 This data loss will continue as attackers become increasingly sophisticated in their attacks. Given this stark reality, we can no longer rely on traditional means of threat detection. Technically advanced attackers often leave behind clue-based evidence of their activities, but uncovering them usually involves filtering through mountains of logs and telemetry. The application of big data analytics to this problem has become a necessity.
To help organizations leverage big data in their security strategy, we are announcing the availability of an open source security analytics framework: OpenSOC. The OpenSOC framework helps organizations make big data part of their technical security strategy by providing a platform for the application of anomaly detection and incident forensics to the data loss problem. By integrating numerous elements of the Hadoop ecosystem such as Storm, Kafka, and Elasticsearch, OpenSOC provides a scalable platform incorporating capabilities such as full-packet capture indexing, storage, data enrichment, stream processing, batch processing, real-time search, and telemetry aggregation. It also provides a centralized platform to effectively enable security analysts to rapidly detect and respond to advanced security threats.
The OpenSOC framework provides three key elements for security analytics:
A mechanism to capture, store, and normalize any type of security telemetry at extremely high rates. OpenSOC ingests data and pushes it to various processing units for advanced computation and analytics, providing the necessary context for security protection and the ability for efficient information storage. It provides visibility and the information required for successful investigation, remediation, and forensic work.
Real-time processing and application of enrichments such as threat intelligence, geolocation, and DNS information to collected telemetry. The immediate application of this information to incoming telemetry provides the greater context and situational awareness critical for detailed and timely investigations.
The interface presents alert summaries with threat intelligence and enrichment data specific to an alert on a single page. The advanced search capabilities and full packet-extraction tools are available for investigation without the need to pivot between multiple tools.
During a breach, sensitive customer information and intellectual property is compromised, putting the company’s reputation, resources, and intellectual property at risk. Quickly identifying and resolving the issue is critical, but, traditional approaches to security incident investigation can be time-consuming. An analyst may need to take the following steps:
- Review reports from a Security Incident and Event Manager (SIEM) and run batch queries on other telemetry sources for additional context.
- Research external threat intelligence sources to uncover proactive warnings to potential attacks.
- Research a network forensics tool with full packet capture and historical records in order to determine context.
Apart from having to access several tools and information sets, the act of searching and analyzing the amount of data collected can take minutes to hours using traditional techniques.
When we built OpenSOC, one of our goals was to bring all of these pieces together into a single platform. Analysts can use a single tool to navigate data with narrowed focus instead of wasting precious time trying to make sense of mountains of unstructured data.
No network is created equal. Telemetry sources differ in every organization. The amount of telemetry that must be collected and stored in order to provide enough historical context also depends on the amount of data flowing through the network. Furthermore, relevant threat intelligence differs for each and every individual organization.
As an open source solution, OpenSOC opens the door for any organization to create an incident detection tool specific to their needs. The framework is highly extensible: any organization can customize their incident investigation process. It can be tailored to ingest and view any type of telemetry, whether it is for specialized medical equipment or custom-built point of sale devices. By leveraging Hadoop, OpenSOC also has the foundational building blocks to horizontally scale the amount of data it collects, stores, and analyzes based on the needs of the network. OpenSOC will continually evolve and innovate, vastly improving organizations’ ability to handle security incident response.
We look forward to seeing the OpenSOC framework evolving in the open source community. For more information and to contribute to the OpenSOC community, please visit the community website at http://opensoc.github.io/.
Tags: analytics, Big Data, data loss, detection, OpenSOC
The Internet of Everything continues to gain momentum and every new connection is creating new data. Cisco UCS Integrated Infrastructure for Big Data is helping customers convert that data into powerful intelligence, and we’re working with a number of new partners to bring exciting new solutions to our customers.
Today, I want to spotlight Elasticsearch, Inc. and welcome them to the Cisco Solution Partner Program.
Elasticsearch excels at providing real-time insight into data – whether structured or unstructured, human- or machine-generated; by bringing a search-based architecture to data analytics. By combining the ELK stack with Cisco UCS, organizations benefit from a turnkey underlying infrastructure solution that provides them with real-time search and analytics for a variety of applications, from log analysis, to structured, semi-structured, or unstructured searches, as well as a web-backend for custom applications that use search-based analytics as a core functionality.
Mozilla is just one of the companies who are already benefiting from the joint solution with real-time search and analysis of data powering its defense platform, MozDef. The ELK stack leverages Cisco UCS’ fast connectivity for query, indexing and replication of data traffic. And Elasticsearch handles the full scale of event storage, archiving, indexing and searching of the data logs. The ELK stack and Cisco UCS also protect Mozilla’s network, services, systems, and audit data from hackers.
Partners like Elasticsearch are just one reason that Cisco UCS Integrated Infrastructure can help your company capitalize on the IoE data avalanche and deliver powerful and cost-effective analytics solutions throughout your enterprise.
Find out more at www.cisco.com/go/bigdata, or register for a webinar entitled, “Learn How Mozilla Tackles their Security Logs with Elasticsearch and Cisco”.
Thursday, November 13th
9:00 AM PST / 12:00 PM EST / 5:00 PM GMT
Are you interested in learning how to build enterprise applications on top of Elasticsearch and Cisco’s Unified Computing System (UCS) infrastructure? We’re holding a webinar to delve more deeply into how to optimize ELK on Cisco UCS infrastructure.
Cisco UCS unites compute, network, and storage access into a single cohesive system. By combining the ELK stack with Cisco UCS, businesses benefit by having a turnkey hardware-software solution for their search and analytics applications. In this webinar you’ll learn about the various UCS hardware profiles you should consider when deploying ELK and how Mozilla built MozDef, their custom SIEM application, using ELK on Cisco UCS.
- Introduction – Jobi George, Elasticsearch (5 minutes)
- Overview of UCS + ELK reference architectures – Raghunath Nambiar, Distinguished Engineer, Data Center Business Group, Cisco (10 minutes)
- How Mozilla Built MozDef on ELK and Cisco UCS – Jeff Bryner, Security Assurance, Mozilla (25 minutes)
- Q&A – Jobi George, Elasticsearch (~20 minutes)
Tags: analytics, Big Data, Cisco, Cisco UCS, Cisco Unified Computing System, elasticsearch, UCS