I recently attended the Strata + Hadoop World Conference in San Jose, and came away impressed with the accelerating pace of innovation in the world of Big Data. Companies and startups are innovating in every area of the Big Data value chain – from automating how data is collected, cleaned, and organized; to data governance and management; to data storage using a plethora of NoSQL database technologies; and to the numerous emerging tools for data science.
This week, I’ll join my Cisco colleagues and industry peers at Strata + Hadoop World in San Jose. Participating in conferences such as this is one of my favorite parts of my job, because it gives us an opportunity as an industry to share information, learn from each other, and tackle challenges collectively with creative Data and Analytics solutions.
Cisco created an Analytics 3.0 architecture that enables data and analytics solutions in the Data Center, the Cloud, and at the network edge, and has made substantial investments in each of these areas. As we meet and collaborate at Strata + Hadoop World, the Cisco team can tell you all about those investments. More importantly, you will hear how Cisco is delivering solutions in partnership with innovative companies that are leaders in big data, analytics, and business intelligence.
Speaking of innovative partnerships, today I am excited to share the announcement of a joint Data Warehouse Optimization solution with Informatica. The solution provides a single platform for offloading processing and storage from data warehouses to Hadoop, and enables organizations to integrate and analyze more data, and more types of data. If you are attending the conference this week, I encourage you to visit the Cisco booth (#831) to hear more about this exciting new solution.
By bringing the best software, hardware, and services from Cisco together with the innovative, market-leading capabilities of our partners, Cisco is enabling powerful solutions to the very real data problems our customers are facing. Data Virtualization is a key part of Analytics 3.0 because it connects many different data sources, makes all the data appear as if it lives in one place, and serves it up in a consistent shape and format to an application and, ultimately, to an end user. Take data from traditional data warehouses, Hadoop clusters, and numerous edge locations, and make it all look to an application as if it is sitting in one central database in the data center. This also saves application developers from rewriting applications to take advantage of data that lives at the edge. They can simply write applications as they always have, and we can pull that data together wherever it lives – across the network, in the cloud, and between clouds. Powerful on its own…even more powerful together with our partners.
The Internet of Things (IoT) was a hot topic at Cisco Live last week in Milan. I got to spend a lot of time with customers, partners, and developers, and came home impressed by the tremendous focus on IoT applications. There is an enormous amount of energy directed at building on the foundation Cisco is creating.
If you weren’t able to join us in Milan, here is my list of the week’s highlights.
The opening day keynote
The ELK stack is a set of analytics tools whose name is an acronym for Elasticsearch, Logstash, and Kibana. Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Logstash is a tool for receiving, processing, and outputting logs – system logs, web server logs, error logs, application logs, and many more. Kibana is an open source (Apache-licensed), browser-based analytics and search dashboard for Elasticsearch.
ELK is a useful, efficient, and open source analytics platform, and we wanted to use it to consume flow analytics from a network. We chose ELK because it can efficiently handle large volumes of data, and because it is open source and highly customizable to the user’s needs. The flows were exported by various hardware and virtual infrastructure devices in NetFlow v5 format. Logstash was responsible for processing the flows and storing them in Elasticsearch, and Kibana, in turn, was responsible for reporting on the data. Given that there were no complete guides on how to use NetFlow with ELK, below we present a step-by-step guide on how to set up ELK from scratch and enable it to consume and display NetFlow v5 information. Readers should note that the ELK ecosystem includes more tools, such as Shield and Marvel, which are used for security and Elasticsearch monitoring, but their use falls outside the scope of this guide.
In our setup, we used:
- Elasticsearch 1.3.4
- Logstash 1.4.2
- Kibana 3.1.1
For our example purposes, we deployed a single-node Elasticsearch cluster: one node was responsible for both collecting and indexing the data. Experienced users could leverage Kibana to consume data from multiple Elasticsearch nodes. Elasticsearch, Logstash, and Kibana were all running on our Ubuntu 14.04 server with IP address 10.0.1.33. For more information on clusters, nodes, and shards, refer to the Elasticsearch guide.
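To make the pipeline concrete, a minimal Logstash configuration for this kind of setup might look like the sketch below. It assumes Logstash’s bundled netflow codec, and the UDP port (2055) and index name are illustrative choices, not requirements – adjust them to match your exporters and naming conventions.

```
input {
  udp {
    # Port the NetFlow v5 exporters are configured to send to (our assumption)
    port  => 2055
    codec => netflow {
      # Only decode NetFlow version 5 records
      versions => [5]
    }
  }
}

output {
  elasticsearch {
    # The single-node cluster from our setup
    host  => "10.0.1.33"
    # One index per day makes retention and cleanup simple
    index => "logstash-netflow-%{+YYYY.MM.dd}"
  }
}
```

With a configuration along these lines saved to a file, Logstash would listen for NetFlow v5 datagrams, decode each flow record into structured fields, and index the results into daily Elasticsearch indices that Kibana can then query.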
The Internet of Everything (IoE) is disrupting innovation models and causing market shifts. One of the most powerful IoE-driven opportunities will be the value created from big data and analytics. As IoE gains momentum and creates billions of new connections, each of those connections will be capable of producing data. The enterprises that can unlock the intelligence within that data — quickly and effectively — will hold the key to a powerful and sustainable competitive edge.