Big Security—Mining Mountains of Log Data to Find Bad Stuff
Your network, servers, and a horde of laptops have been hacked. You might suspect it, or you might think it’s not possible, but it’s happened already. What’s your next move?
The dilemma of the “next move” is that you can only discover an attack either as it’s happening, or after it’s already happened. In most cases, it’s the latter, which justifies the need for a computer security incident response team (CSIRT). Brandon Enright, Matthew Valites, many other security professionals, and I constitute Cisco’s CSIRT. We’re the team that gets called in to investigate security incidents for Cisco. We help architect monitoring solutions and strategies and enable the rest of our team to discover security incidents as soon as possible. We are responsible for monitoring the network and responding to incidents, whether discovered internally by our systems or reported to us externally via firstname.lastname@example.org.
Securing and monitoring a giant multinational high-speed network can be quite a challenge. Volume and diversity, not complexity, are our primary enemies when it comes to incident response. We index close to a terabyte of log data per day across Cisco, along with processing billions of NetFlow records, millions of intrusion detection alarms, and millions of host security log records. This doesn’t even include the much larger data store of authentication and authorization data for thousands of people. Naturally, like all large corporations, Cisco is affected by dedicated attackers, hacking collectives, hacktivists, and typical malware/crimeware. Combine these threats with internally sourced security issues, and we’ve got plenty of work cut out for us.
We’ve come a long way as a CSIRT and have grown rapidly to address the threats of the times. We’re certainly in a new era, and our old ways, including heavy reliance on a security information and event management (SIEM) system, have faded. There is simply too much data, too many custom and diverse threats, and too much at risk for us to depend on what we found to be an inflexible, expensive, and underperforming solution. It’s very true that with a SIEM, you get out what you put into it, and despite much effort to make it work better, its efficacy declined over time, along with our detection rates.
In this blog series, we will expound upon our methods for doing incident response. We believe that we have developed some highly effective techniques, based on metadata analysis and data organization, that have led us to incident response gold. We’ll discuss how we moved away from a SIEM to a log management and searching solution, and how we solved the problem of trying to analyze too much data by optimizing our processes and creatively solving problems.
We refer to our collection of repeatable queries (reports) against security event data sources that lead to incident detection and response as the “playbook.” Our goal is to educate our industry peers and those with an interest in network defense on how to develop their own monitoring and response strategies and playbook.
In the series we’ll discuss the following:
- How we developed a new playbook unburdened by outdated procedures and methodology
- How to manage and search through security event data across disparate log sources
- “Big Data” mining through intelligent and reductive searching on any platform
- Guiding incident response and security teams to evolve to more efficient and productive operations
- How to use metadata to help discover security incidents
We’ll focus a bit on technology and security event sources, but in general we’re really focused on how to get the most out of your investments in security monitoring technology. In our experience, the tools generally only provide a flow of data. It is really up to human brains to convert that raw data into information, and eventually knowledge.
Data alone is just that – data. It’s a bit of detail, like an IP address, a timestamp, a login failure message, or any element that makes up a component of a log file. However, it’s not really information until it has been combined with context. Comparing organized data and building context creates security knowledge. In the log analysis world, context is truly king.
When the log data is massive, leveraging metadata to retrieve understandable and usable data is the key to effective log analysis. Seeing an IP address in a log message doesn’t mean anything by itself – it is just a piece of data. Applying knowledge to that same data transforms it into information. If I know an IP address belongs to a small subnet that does nothing but serve corporate email, my perspective changes on the log event I’m investigating. The IP address is no longer just a bit of data. Now we have information that it belongs to a functional category (email servers). Any relevant information we have, or can collect, on the host is essentially the metadata we can query with. In this case, “email servers” is the collection of data elements described by a common term – usable metadata. Knowing the function and purpose of a host, we can assess the impact of the log message in our investigation.
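As a rough illustration of the idea above, the following Python sketch attaches a functional category to a log event based on its source IP address. It is not a description of our actual tooling: the subnet values, the `HOST_METADATA` table, and the `categorize` and `enrich` helpers are all hypothetical names invented for this example, and a real inventory would come from an asset database rather than a hard-coded dictionary.

```python
import ipaddress

# Hypothetical inventory mapping subnets to functional categories.
# The addresses below are illustrative private ranges, not real assignments.
HOST_METADATA = {
    ipaddress.ip_network("10.20.30.0/28"): "email servers",
    ipaddress.ip_network("10.40.0.0/16"): "user laptops",
}


def categorize(ip):
    """Return the category of the most specific matching subnet, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in HOST_METADATA if addr in net]
    if not matches:
        return None
    # Prefer the longest prefix when subnets overlap.
    most_specific = max(matches, key=lambda net: net.prefixlen)
    return HOST_METADATA[most_specific]


def enrich(event):
    """Return a copy of a log event dict with a 'src_category' field added."""
    enriched = dict(event)
    enriched["src_category"] = categorize(event["src_ip"])
    return enriched
```

With this in place, an event such as `{"src_ip": "10.20.30.5", "msg": "login failure"}` gains a `src_category` of "email servers", while an address outside every known subnet gets `None`, which is itself a useful signal that we lack context for that host.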
For example, if our monitoring systems indicated a surge in outbound SMTP traffic from a previously unseen host, we would investigate the log event. If we determine and verify that the source host is a legitimate email server (based on our information about the data), then we would filter the event so that it does not generate alarms again. However, if we were unable to tie that log event to some information (i.e., we have no knowledge of that IP address’s purpose), then we might consider that host potentially compromised and abused for sending spam.
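The decision in this example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production logic: the `KNOWN_EMAIL_NETWORKS` list, its subnet value, and the `triage_smtp_surge` function are hypothetical, and in practice verifying that a host is a legitimate email server involves human confirmation rather than a lookup alone.

```python
import ipaddress

# Hypothetical list of subnets already verified as legitimate email servers.
KNOWN_EMAIL_NETWORKS = [ipaddress.ip_network("10.20.30.0/28")]


def triage_smtp_surge(src_ip, known_email_networks=KNOWN_EMAIL_NETWORKS):
    """Decide how to handle an outbound SMTP surge alert for src_ip.

    Returns "suppress" when the source is a verified email server, so the
    event can be filtered from future alarms, and "investigate" when we have
    no information about the host and it may be compromised and sending spam.
    """
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in known_email_networks):
        return "suppress"
    return "investigate"
```

Calling `triage_smtp_surge("10.20.30.5")` returns "suppress", while an unknown source such as `"10.99.1.7"` returns "investigate". The point is that the outcome depends entirely on the metadata available for the host, not on the raw log event.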
What we hope to communicate is that the little bytes of data you have floating in your log files can be ordered, structured, and compared to each other to yield usable information. Once you let the computers do all the calculating, storing, sorting, indexing, statistics, and visualization, let the humans in to start critically applying knowledge and research to log data so you can build information that will help your team discover and address security incidents.