The Great Correlate Debate
SIEMs have been pitched in the past as “correlation engines” whose special algorithms can take in volumes of logs and filter everything down to just the good stuff. In its most basic form, correlation is a mathematical, statistical, or logical relationship between a set of different events. Correlation is incredibly important, and is a very powerful method for confirming details of a security incident. Correlation helps shake out circumstantial evidence, which is completely fair to use in the incident response game. Noticing one alarm from one host can certainly be compelling evidence, but in many cases it’s not sufficient. Let’s say my web proxy logs indicate a host on the network was a possible victim of a drive-by download attack. The SIEM could notify the analyst team that this issue occurred, but what do we really know at this point? That some host may have downloaded a complete file from a bad host – that’s it. We don’t know if it has been unpacked, executed, etc., and we have no idea if the threat is still relevant. If the antivirus deleted or otherwise quarantined the file, do we still have anything to worry about? If the proxy blocked the file from downloading, what does that mean for this incident?
This is the problem that correlation can solve. If, after the malware file was downloaded, we see port scanning behavior, large outbound netflow to unusual servers, repeated connections to PHP scripts hosted in sketchy places, or other suspicious activity from the same host, we can create an incident for the host based on our additional details. The order is important as well. Since most attacks follow the same pattern (bait, redirect, exploit, additional malware delivery, check-in), we tie these steps together with security alarms and timestamps. If we see the events happening in the proper order, we can be far more confident that an incident has occurred.
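As a rough illustration of tying alarms together by host, order, and timestamp, the following Python sketch flags hosts whose alarms advance through the attack stages named above. It is only a sketch under stated assumptions: the event fields (host, time, stage), the stage labels, the one-hour window, and the three-stage minimum are all hypothetical choices, not the behavior of any particular SIEM.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Stage order comes from the text; the label spellings are assumptions.
STAGES = ["bait", "redirect", "exploit", "malware_delivery", "check_in"]
RANK = {name: i for i, name in enumerate(STAGES)}


def longest_ordered_chain(evs):
    """Longest run of alarms (already sorted by time) whose stages strictly advance."""
    best = [[e] for e in evs]
    for j in range(len(evs)):
        for i in range(j):
            advances = RANK[evs[i]["stage"]] < RANK[evs[j]["stage"]]
            if advances and len(best[i]) + 1 > len(best[j]):
                best[j] = best[i] + [evs[j]]
    return max(best, key=len, default=[])


def hosts_with_ordered_chain(events, window=timedelta(hours=1), min_stages=3):
    """Map each suspicious host to the ordered stages seen within one time window."""
    by_host = defaultdict(list)
    for ev in events:
        by_host[ev["host"]].append(ev)

    flagged = {}
    for host, evs in by_host.items():
        evs.sort(key=lambda e: e["time"])
        for i, first in enumerate(evs):
            # Only consider alarms that fall inside the window opened by this alarm.
            in_window = [e for e in evs[i:] if e["time"] - first["time"] <= window]
            chain = longest_ordered_chain(in_window)
            if len(chain) >= min_stages and len(chain) > len(flagged.get(host, [])):
                flagged[host] = [e["stage"] for e in chain]
    return flagged
```

A host that only shows an exploit alarm followed by a bait alarm is not flagged here, because the stages do not advance in order. That is the point of checking sequence as well as co-occurrence.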
I’m sorry Dave, I’m afraid I can’t do that
The statistical and mathematical correlation determinations are best handled by the SIEM. However, the logical correlation can really only be done well by a human brain. As an example, the SIEM might fire many alerts indicating an SSH brute force attack is occurring. It’s getting thousands of alarms from the network IDS and from UNIX hosts indicating multiple rapid connections to the SSH daemon. To the SIEM and the underlying security monitoring tools, these events look really bad – they are great indicators that something nefarious is happening, and that something is attempting to gain unauthorized access to SSH services. However, the human analyst might know that these events are part of an expected audit and the source, or attacking, host actually belongs to a sanctioned auditor performing SSH sweeps at their scheduled time. The SIEM is taking the IDS alarms and the host indicators and accurately reporting that a security event is taking place, but it has no way to know or understand that it’s not truly a security event. These types of events are common and are just part of being a SIEM analyst.
I have been involved in numerous investigations where an IT team uses a script to SCP files between a server and thousands of clients. In most cases, they do a quick TCP connection sweep to see if their hosts are up and whether SSH is listening before they attempt to copy files. This behavior is of course totally normal and acceptable for a sysadmin and their management systems; however, from the perspective of the security devices and the SIEM, this is anomalous and possibly malicious. The human brain is capable of “correlating” this activity with the underlying reason. This turns a possible incident into just an event.
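Once an analyst has worked out the underlying reason, that knowledge can be written down so the same sweep is not investigated twice. The Python sketch below shows one hypothetical way to do so: a small list of sanctioned sources with their scheduled windows, checked before an alert is treated as a possible incident. The address, schedule, and field names are invented for illustration, and the rule list still has to be maintained by people who know the environment.

```python
import ipaddress
from datetime import datetime, time

# Hypothetical record of sanctioned activity, maintained by the analysts.
SANCTIONED = [
    {
        "source": ipaddress.ip_network("10.50.0.10/32"),
        "start": time(1, 0),
        "end": time(3, 0),
        "reason": "scheduled SSH audit",
    },
]


def classify(alert):
    """Return ("event", reason) for sanctioned activity, else ("possible incident", None)."""
    src = ipaddress.ip_address(alert["src"])
    seen_at = alert["time"].time()
    for rule in SANCTIONED:
        if src in rule["source"] and rule["start"] <= seen_at <= rule["end"]:
            return "event", rule["reason"]
    return "possible incident", None
```

The same source scanning outside its scheduled window still comes back as a possible incident, which keeps the exception as narrow as the human knowledge behind it.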
Conversely, the human analyst will never be able to notice a single 1000-byte HTTP connection, randomly, once a week, to a web server sitting on a massive hosting farm and call it suspicious. They will also not be able to remember the typical threshold of SMTP traffic from a DMZ-facing email server, and thus may not react properly if the traffic spikes beyond normal, but not far enough to cause an availability issue or performance alarm. This could be indicative of a compromised internal host spamming the Internet, or it could be external abuse of the mail server. The SIEM must be able to perform these functions; however, it can be performance heavy to keep a state table or equivalent of event data long enough to create a meaningful alarm. The custom reporting interface needs to be flexible enough and precise enough to allow you to configure the report, or the built-in reports must be capable of handling the volume and report scheduling requirements.
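The SMTP case is the kind of statistical comparison a machine handles better than a person. A minimal Python sketch, assuming a hypothetical list of past hourly message counts and a simple "mean plus or minus k standard deviations" rule (one of many possible baselines, and not necessarily what any given SIEM uses):

```python
from statistics import mean, stdev


def is_anomalous(history, current, k=3.0):
    """Flag `current` if it sits more than k sample standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > k * sigma


# Hypothetical hourly outbound SMTP message counts for one mail server.
hourly_counts = [1000, 1100, 950, 1050, 1020, 980, 1010]
print(is_anomalous(hourly_counts, 1400))  # well above the usual range
print(is_anomalous(hourly_counts, 1060))  # within the usual range
```

Note that the history list is the state the text warns about: the longer the baseline, the more event data has to be kept around to compute it.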
All of this assumes that you have informed the SIEM about your network. You have told it that 10.23.20.0/25 belongs to the production virtual desktop farm, that 192.168.45.1-19 belong to your internal email cluster, and that anything behind the intrusion detection sensor named raleigh-dc-ips-7 is a datacenter, and therefore more valuable (critical) than other network segments. You have repeated this exercise to carve up the whole network. This is a requirement for intra-network traffic correlation to work. Traffic flow from a critical datacenter server to the Internet and unauthorized destinations can only be alarmed by the SIEM if it knows about these designations and is capable of keeping track of traffic directionality.
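To make the idea concrete, here is a small Python sketch of what "informing" a system about the network might look like, using the standard ipaddress module and the example ranges and sensor name from the paragraph above. The data structures, function names, and event fields are assumptions made for illustration, and the use of is_global as a stand-in for "the Internet" is a simplification.

```python
import ipaddress

# Segment table built from the examples in the text.
SEGMENTS = [(ipaddress.ip_network("10.23.20.0/25"), "production virtual desktop farm")]

# 192.168.45.1-19 is not a single CIDR block, so expand the range into networks.
for net in ipaddress.summarize_address_range(
    ipaddress.ip_address("192.168.45.1"), ipaddress.ip_address("192.168.45.19")
):
    SEGMENTS.append((net, "internal email cluster"))

# Sensors that sit in front of critical (datacenter) segments.
CRITICAL_SENSORS = {"raleigh-dc-ips-7"}


def label_address(addr):
    """Return the segment label for an address, or "unknown" if it is not mapped."""
    ip = ipaddress.ip_address(addr)
    for net, label in SEGMENTS:
        if ip in net:
            return label
    return "unknown"


def is_critical_outbound(event):
    """True when a critical sensor sees a flow headed to a globally routable address."""
    dst = ipaddress.ip_address(event["dst"])
    return event["sensor"] in CRITICAL_SENSORS and dst.is_global
```

Anything that returns "unknown" is a reminder that the carving-up exercise is not finished for that part of the network.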
Remember that when the SIEM code was certified to ship to customers, the vendor had no idea what your network looked like. The software doesn’t know what industry you are in, what types of server population and diversity you have, what your network demarcation lines are, or anything about the roles of users in the network. The vendor has to take a broad approach that will show value wherever their product is deployed, or they would not be able to sell more SIEM. There’s no doubt the SIEM will find malware on your network once it’s configured and deployed, but it’s up for debate as to whether it can find specific and valuable events unique to your network. The biggest question to ask yourself when making a SIEM deployment decision is: how does this technology help me find incidents I would not otherwise be able to find with my existing tools?
The Alternative to the SIEM
So a SIEM purports to solve the problem of “correlating” event data across disparate log sources to produce valuable incident data. As described, though, it clearly takes a gargantuan effort to ensure this investment works, and a heavy reliance on system performance and proper configuration. Although system and performance issues affect every type of incident detection system, the static logic, operational complexity, and limited custom searching are the primary downfalls of the SIEM. A security log management system, however, affords highly flexible and precise searching and offers a completely tailored solution.
Most of the potential issues, gotchas, and requirements mentioned above would also apply to a security log management system. However, once properly architected, deployed, and manicured, a security log management system can be the most effective and precise tool in the incident detection toolkit, if only because of its searching and indexing capabilities.
Alarm volume is an area where security log management one-ups the SIEM. Let’s say that the SIEM is processing thousands of host security alarms every day. Many host intrusion prevention (HIPS) tools log events whenever protected directories or registries are accessed by other files. This type of activity may be indicative of software installation, but it’s not always obvious whether the installation is intentional and benign, or a malicious attack. HIPS alarms will take up an enormous amount of space in the SIEM, and if it’s not easy to determine whether they indicate a true attack versus typical system behavior, incidents will go undetected and search performance will be lost. However, if you know precisely what to look for, either via some internally gathered intelligence or a tip from an external source, you can narrow down your focus and hunt only for the events with a high confidence level. The ability to pre-filter log data for irrelevant events is also available in a log management solution, which provides the ultimate flexibility in distilling the log data into exactly what you need.
There’s another bright spot for security log management versus putting everything into the SIEM and analyzing its reports. For incident detection there are essentially two methods for finding bad stuff: hunting or gathering. Log management is the best way to provide these capabilities out of the box.
Let’s start with gathering. Gathering is what the SIEM does best. It takes events from various security systems, runs the events through some normalization and detection algorithms, and produces summarized reports from any resultant alarms. The analysts collect the report results and take the appropriate action. In this manner of detection, the incident response team is discovering incidents within their event data using logic or intelligence from pre-existing threats – i.e., the SIEM is finding incidents based on some piece or pieces of information already known. Gathering is great for finding the most common types of security problems. A Conficker infection, Zeus bot call-home traffic, or noisy worms are best found through the gathering approach. In a perfect world, all incidents would be detected automatically and delivered to the analyst for incident handling. However, this isn’t sufficient to truly protect the organization. Infosec O.G. and accidental father of the CSIRT Cliff Stoll didn’t track Markus Hess’ attacks against the Lawrence Berkeley National Laboratory using a SIEM or an automated alarm. A log file indicated an anomalous accounting error that Stoll subsequently investigated. Once he uncovered more details of the source (by hunting) in the available logs, he was able to set a trap and detect Hess in real time (gathering).
The Hunt Is On
Stoll had to dig through the log data from the hosts he managed to find some clue about where the account error was from. Building upon those clues in his investigation, he was able to determine where the attacker was coming in, what they had done, and how long they had been present. His efforts to detect follow-up attacks were dedicated, although largely a result of the technology of the time. If Cliff had a massively indexed log management system with the ability to rapidly query across terabytes of data at a time, he could have set up a basic query that would send an email to him or his pager when the intruder re-entered the network. Surely this is preferable to sleeping in a datacenter waiting for a teleprinter to start grinding out human-readable characters!
This method is as valid today as it was back in 1986, albeit with better technology. A SIEM will inform you about categorical attacks you’ve configured it to look for, or what it’s learned from its vendor’s updates. But what about a targeted attack against your organization? What if your CEO and CFO are targets of directed phishing or attacks against their computing resources? If it was crafted to look like typical company communications, or didn’t trip any of the known malware signatures in host-based software or in any of the SIEM alarms, how will you find it? What if the threat is new and there are no updated detection patterns available in the SIEM? The vast majority of cyber criminals are interested in their volume of “loads” (how many victims they can herd) and making money. It’s not important to them that a PC belongs to a CEO, a software engineer, or an accountant – what matters is that they have achieved their “load” and subsequently receive financial rewards for spamming, clickjacking, or just holding their territory to sell to other criminals. For these types, SIEM is great at finding typical exploit kit based attacks and backdoors. However, for precision attacks aimed at individual people or systems, the SIEM will have a hard time keeping up with the threats.
There have been plenty of cases where our team has picked up on infected systems as a result of some bit of intel from Twitter, a mailing list, IRC, or other forum. The ability to query against some extremely recent detail like an IP address, hostname, URL, or regular expression for an upcoming threat is invaluable. It’s great to have reports already completed and infected systems cleaned up before news breaks in larger outlets of a new threat. The OSX Flashback trojan was a prime example of this – based on a tweet, we developed a precision query to detect infected systems. We had detected infections and remediated the threat within an hour of receiving the tweet, while news reporting outlets didn’t make it widely known until the following day. Reading blogs and other security resources is an important part of the job and has resulted in some amazing successes over the years. However, not until we moved to a log management system were we able to deploy a query or a report and get results so quickly. We’ve even developed frameworks for automatic detection and alerting by simply passing detection patterns from other information sources directly to our log management system.
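The kind of precision query described here can be sketched in a few lines of Python. This is an illustration only: the record fields (src, dst_ip, host, url) are hypothetical, the indicators are placeholders on documentation addresses and example domains rather than real Flashback indicators, and a real log management system would run the search against its own index rather than an in-memory list.

```python
import re


def find_hits(records, ips=frozenset(), hostnames=frozenset(), url_patterns=()):
    """Return the log records that match any supplied indicator.

    ips and hostnames are matched exactly (hostnames should be lowercase);
    url_patterns are regular expressions searched within the URL field.
    """
    compiled = [re.compile(p) for p in url_patterns]
    hits = []
    for rec in records:
        if (
            rec.get("dst_ip") in ips
            or rec.get("host", "").lower() in hostnames
            or any(p.search(rec.get("url", "")) for p in compiled)
        ):
            hits.append(rec)
    return hits


# Hypothetical proxy log records and indicators.
proxy_logs = [
    {"src": "10.1.1.5", "dst_ip": "198.51.100.7", "host": "cdn.example.net", "url": "/a.js"},
    {"src": "10.1.1.8", "dst_ip": "192.0.2.44", "host": "Bad.Example.com", "url": "/index"},
    {"src": "10.1.1.9", "dst_ip": "192.0.2.80", "host": "ok.example.org", "url": "/counter/ab12/stat"},
    {"src": "10.1.1.3", "dst_ip": "192.0.2.81", "host": "ok.example.org", "url": "/news"},
]
infected = find_hits(
    proxy_logs,
    ips={"198.51.100.7"},
    hostnames={"bad.example.com"},
    url_patterns=[r"/counter/[0-9a-f]{4}/stat$"],
)
print(sorted(rec["src"] for rec in infected))
```

Because the indicators are plain arguments, they could just as easily be fed in from another information source, which is the idea behind passing detection patterns along automatically.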
Don’t Take Our Word for It
While security intelligence and research are indispensable components of a proper incident response strategy, there are plenty of other methods for discovering “bad stuff” on the network using a log management system. As it’s common for attack campaigns to exploit current events in their phishing attempts, you could query email subject logs for recent news events (the royal wedding was a popular one, as well as the notorious Storm worm) and then see what attachments were also sent, or what URLs may be in the message body. Or you could query to see what URLs a victim visited after opening the attachment or reading the email. This would help you determine any other internal hosts that could be compromised.
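A minimal Python sketch of those two queries follows, assuming hypothetical email and proxy log records held as dictionaries. The field names, keyword matching, and one-hour follow-up window are illustrative choices, and the sample data is invented.

```python
from datetime import datetime, timedelta


def emails_matching(email_logs, keywords):
    """Return messages whose subject contains any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [m for m in email_logs if any(k in m["subject"].lower() for k in wanted)]


def urls_visited_after(proxy_logs, message, window=timedelta(hours=1)):
    """URLs the recipient requested within `window` after the message arrived."""
    start = message["time"]
    return [
        p["url"]
        for p in proxy_logs
        if p["user"] == message["recipient"] and start <= p["time"] <= start + window
    ]


# Invented sample data for illustration.
t0 = datetime(2011, 4, 29, 9, 0)
email_logs = [
    {"time": t0, "recipient": "alice", "subject": "Royal Wedding photos!",
     "attachments": ["photos.zip"], "urls": []},
    {"time": t0, "recipient": "bob", "subject": "Quarterly budget",
     "attachments": [], "urls": []},
]
proxy_logs = [
    {"time": t0 + timedelta(minutes=5), "user": "alice", "url": "http://dl.example.net/x.exe"},
    {"time": t0 + timedelta(hours=3), "user": "alice", "url": "http://news.example.org/"},
    {"time": t0 + timedelta(minutes=5), "user": "bob", "url": "http://intranet.example.com/"},
]

for msg in emails_matching(email_logs, ["royal wedding"]):
    print(msg["recipient"], msg["attachments"], urls_visited_after(proxy_logs, msg))
```

From here, the URLs that turn up can be searched across all users to find other internal hosts that may have been compromised.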
When you are aware of a nasty new 0day vulnerability in, oh let’s say a browser plugin, you can look to see when that browser plugin (as an HTTP user-agent) starts acting funny. If you are expecting a new round of client-side vulnerability exploits based on published update bulletins, you may want to look for indicators. You can profile traffic depending on its destination (i.e., dynamic DNS providers could be good or bad, but you might want to know who’s visiting them). There are also fundamental things you shouldn’t see happening on a host or on the network (registry edits, strange setuid changes, impossible packets, etc.) that can be useful indicators. Hunting always boils down to looking for a ‘sign’ of suspicious activity and then ‘tracking’ based on any indicator data you may find.
The important takeaway is that you can pay other people for security intelligence, or you can pay your own people to do it while ensuring the data is relevant to your organization and its threat tolerance. Although better technology and log management techniques have improved incident detection rates, the human factor of incident response cannot be overstated or replaced. You should be able to translate an English (or other verbal language) sentence into a detection technique or query and get back results promptly. The best detection and reporting system is one that you can get usable information from based on a specific detection goal. In most cases a SIEM only comes close to this goal, but an effective log management system can provide the flexibility to return an answer to any question you could think to ask.