
Step-by-Step Setup of ELK for NetFlow Analytics

Intro

The ELK stack is a set of analytics tools whose initials stand for Elasticsearch, Logstash and Kibana. Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Logstash is a tool for receiving, processing and outputting logs, such as system logs, webserver logs, error logs, application logs and many more. Kibana is an open source (Apache-licensed), browser-based analytics and search dashboard for Elasticsearch.
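To make those roles concrete, here is a minimal sketch (our illustration, not part of the original guide) that indexes one JSON document into Elasticsearch over its REST API and searches it back; the host, index name and document fields are assumptions.

    # Index one JSON document over the Elasticsearch REST API and search it back.
    # The host, index name ("demo-logs") and fields are illustrative assumptions.
    import json
    import urllib.request

    ES = "http://localhost:9200"   # assumed local Elasticsearch instance

    doc = {"message": "sample log line", "severity": "info"}

    # PUT /index/type/id indexes the document; refresh=true makes it
    # searchable immediately instead of waiting for the next refresh cycle.
    req = urllib.request.Request(
        f"{ES}/demo-logs/event/1?refresh=true",
        data=json.dumps(doc).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())

    # Full-text search for the word "sample" in the message field.
    with urllib.request.urlopen(f"{ES}/demo-logs/_search?q=message:sample") as resp:
        print(json.dumps(json.loads(resp.read()), indent=2))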

ELK is a useful, efficient and open source analytics platform, and we wanted to use it to consume flow analytics from a network. We chose ELK because it can efficiently handle large volumes of data, and because it is open source and highly customizable to the user’s needs. The flows were exported by various hardware and virtual infrastructure devices in NetFlow v5 format. Logstash was responsible for processing the flows and storing them in Elasticsearch, and Kibana, in turn, was responsible for reporting on the data. Given that there were no complete guides on how to use NetFlow with ELK, below we present a step-by-step guide on how to set up ELK from scratch and enable it to consume and display NetFlow v5 information. Readers should note that the ELK ecosystem includes more tools, such as Shield and Marvel, which are used for security and for monitoring Elasticsearch, but their use falls outside the scope of this guide.
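Before pointing real exporters at the collector, it can help to verify the pipeline end to end. The sketch below (our illustration, not part of the original post) hand-crafts a single NetFlow v5 packet and sends it to the Logstash UDP input; the collector port 2055 and all flow field values are assumptions used purely for testing.

    # Send one hand-crafted NetFlow v5 packet to the Logstash UDP input so the
    # Logstash -> Elasticsearch -> Kibana pipeline can be checked end to end.
    # The collector port and the flow field values are assumptions.
    import socket
    import struct
    import time

    COLLECTOR = ("10.0.1.33", 2055)   # Logstash NetFlow UDP input (port assumed)

    now = int(time.time())
    # 24-byte v5 header: version, count, sys_uptime, unix_secs, unix_nsecs,
    # flow_sequence, engine_type, engine_id, sampling_interval
    header = struct.pack("!HHIIIIBBH", 5, 1, 0, now, 0, 1, 0, 0, 0)

    # One 48-byte flow record: 10.0.0.1:1234 -> 10.0.0.2:80, TCP, 10 packets, 840 bytes
    record = struct.pack(
        "!IIIHHIIIIHHBBBBHHBBH",
        int.from_bytes(socket.inet_aton("10.0.0.1"), "big"),  # srcaddr
        int.from_bytes(socket.inet_aton("10.0.0.2"), "big"),  # dstaddr
        0,            # nexthop
        1, 2,         # input/output SNMP ifIndex
        10, 840,      # packets, octets
        0, 0,         # first/last (sysUptime at flow start/end)
        1234, 80,     # src/dst port
        0,            # pad1
        0x18,         # TCP flags (PSH+ACK)
        6,            # protocol (TCP)
        0,            # ToS
        0, 0,         # src/dst AS
        24, 24,       # src/dst mask
        0,            # pad2
    )

    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(header + record, COLLECTOR)

If Logstash is listening with its NetFlow codec on that port, the flow should show up in Elasticsearch shortly afterwards and become visible in Kibana.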

In our setup, we used:

  • Elasticsearch 1.3.4
  • Logstash 1.4.2
  • Kibana 3.1.1

For the purposes of this example, we deployed a single-node Elasticsearch cluster, with that one node responsible for both collecting and indexing data; we did not use multiple nodes. Experienced users could leverage Kibana to consume data from multiple Elasticsearch nodes. Elasticsearch, Logstash and Kibana all ran on our Ubuntu 14.04 server with IP address 10.0.1.33. For more information on clusters, nodes and shards, refer to the Elasticsearch guide.
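As a quick sanity check on the single-node setup (our addition, assuming Elasticsearch's default HTTP port 9200), the cluster health API can be queried directly:

    # Query the cluster health API of the single-node Elasticsearch instance.
    # Port 9200 is Elasticsearch's default HTTP port (an assumption here).
    import json
    import urllib.request

    with urllib.request.urlopen("http://10.0.1.33:9200/_cluster/health") as resp:
        health = json.loads(resp.read())

    print(health["status"], health["number_of_nodes"])
    # A one-node cluster typically reports "yellow" because replica shards
    # have no second node to be allocated to; this is expected in our setup.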



The Big Picture of Big Data

The Internet of Everything (IoE) is disrupting innovation models and causing market shifts. One of the most powerful IoE-driven opportunities will be the value created from big data and analytics. As IoE gains momentum and creates billions of new connections, each of those connections will be capable of producing data. The enterprises that can unlock the intelligence within that data — quickly and effectively — will hold the key to a powerful and sustainable competitive edge.


Can the Elephant Dance to a Security Tune?

There is a great debate in the security world right now: have SIEM and logging products run their course? Will Hadoop ride to the rescue? Can machines “learn” about security and reliably spot threats that no other approach can find?

Gartner calls this phenomenon Big Data Security Analytics, and they make a strong point to define BDSA solutions as a three-layer pyramid. At the bottom is the “data lake,” which is what most people equate with Hadoop. The next layer is context—the addition of relevant business, location, and other non-traditional security information to increase the precision of the next layer: applications and analytics (such as Machine Learning). It is this top layer where the real value of BDSA is realized in terms of finding new threats and remediating them before they do damage.



Paradigm Shift with Edge Intelligence

In my Internet of Things keynote at LinuxCon 2014 in Chicago last week, I touched upon a new trend: the rise of a new kind of utility or service model, the so-called IoT specific service provider model, or IoT SP for short.

I had a recent conversation with a team of physicists at the Large Hadron Collider at CERN. I told them they would be surprised to hear how computer scientists talk these days about Data Gravity. Programmers are notorious for overloading common words, adding connotations galore, messing with meanings entrenched in our natural language.

We all laughed and then the conversation grew deeper:

  • Big data is very difficult to move around; it takes energy, time and bandwidth, and is therefore expensive. And it is growing exponentially larger at the outer edge, with tens of billions of devices producing it at an ever faster rate, from an ever increasing set of places on our planet and beyond.
  • As a consequence of the laws of physics, we know we have an impedance mismatch between the core and the edge; I coined this the Moore-Nielsen paradigm (described in my talk as well): data accumulates at the edges faster than the network can push it into the core.
  • Therefore, big data accumulated at the edge will attract applications (little data, or procedural code); apps will move to the data, not the other way around, behaving as if data has “gravity”.

Therefore, the notion of a very large centralized cloud that would control the massive rise of data spewing from tens of billions of connected devices is pitched against both the laws of physics and Open Source, not to mention the thirst for freedom (no vendor lock-in) and privacy (no data lock-in). The paradigm has shifted; we have entered the third big wave (after the mainframe's decentralization to client-server, which in turn centralized to cloud): the move to a highly decentralized compute model, where the intelligence shifts to the edge, as apps come to the data, at much larger scale, machine to machine, with little or no human interface or intervention.

The age-old dilemma pops up again: do we go vertical (domain specific) or horizontal (application development or management platform)? The answer has to be based on necessity, not fashion, and we have to do this well; hence vertical domain knowledge is overriding. With the declining cost of computing, we finally have the technology to move to a much more scalable and empowering model: the new opportunity in our industry, the mega trend.

Very reminiscent of the early '90s and the beginning of the ISP era, isn't it? This time it is much more vertical, with deep domain knowledge: connected energy, connected manufacturing, connected cities, connected cars, connected home, safety and security. These innovation hubs all share something in common: an Open and Interconnected model, made easy by dramatically lower compute costs and the ubiquity of open source, to overcome all barriers of adoption, including the previously weak security and privacy models predicated on a central core. We can divide and conquer, and deal with data in motion differently than we deal with data at rest.

The so-called “wheel of computer science” has completed one revolution; just as its socio-economic observation predicted, the next generation has arrived, ready to help evolve or replace its aging predecessor. Which one, or which vertical, will it be first?


My Top 7 Predictions for Open Source in 2014

My 2014 predictions are finally complete.  If Open Source equals collaboration or credibility, 2013 has been nothing short of spectacular.  As an eternal optimist, I believe 2014 will be even better:

  1. Big data’s biggest play will be in meatspace, not cyberspace. There is just so much data we produce and give away; there is great opportunity for analytics in the real world.
  2. Privacy and security will become ever more important, particularly using Open Source, not closed. Paradoxically, this is actually good news: Open Source shows us again that transparency wins, and just as we see in biological systems, the most robust mechanisms operate with fewer secrets than we think.
  3. The rise of “fog” computing as a consequence of the Internet of Things (IoT) will unfortunately be driven by fashion for now (wearable computers); it will make us think again about what we have done in giving up our data, and to read #1 and #2 above with a different and more open mind. Again!
  4. Virtualization will enter its biggest year yet in networking. Just as the hypervisor rode Moore’s Law in server virtualization and found a neat application in #2 above, a different breed of projects like OpenDaylight will emerge. But the drama is a bit more challenging because the network scales very differently than CPU and memory; it is a much more challenging problem. Thus, networking vendors embracing Open Source may fare well.
  5. Those that didn’t quite “get” Open Source as the ultimate development model will re-discover it as Inner Source (ACM, April 1999), as the only long-term viable development model.  Or so they think, as the glamor of new-style Open Source projects (OpenStack, OpenDaylight, AllSeen) with big budgets, big marketing, big drama, may in fact be too seductive.  Only those that truly understand the two key things that make an Open Source project successful will endure.
  6. AI, recently morphed, will make a comeback: not just robotics, but something AI did not anticipate a generation ago, something now called cognitive computing, perhaps indeed the third era in computing! The story of Watson going beyond obliterating Jeopardy contestants, looking to open up and find commercial applications, is a truly remarkable thing to observe in our lifespan. This may in fact be a much more noble use of big data analytics (and other key Open Source projects) than #1 above. But can it exist without it?
  7. Finally, Gen Z developers discover Open Source and embrace it just like their Millennial (Gen Y) predecessors. The level of sophistication and interaction rises, and projects ranging from Bitcoin to qCraft become intriguing, presenting a different kind of challenge. More importantly, the previous generation can now begin to relax knowing the gap is closing, the ultimate development model is in good hands, and it can begin to give back more than ever before. Ah, the beauty of Open Source…
