To SIEM or Not to SIEM? Part I
Security information and event management systems (SIEM, or sometimes SEIM) are intended to be the glue between an organization’s various security tools. Security and other event log sources export their alarms to a remote collection system like a SIEM, or display them locally for direct access and processing. It’s up to the SIEM to collect, sort, process, prioritize, store, and report the alarms to the analyst. It’s this last piece that is the key to an effective SIEM deployment, and of course the most challenging part. In the intro to this blog series I mentioned how we intend to describe our development of a new incident response playbook. A big first step in modernizing our playbook was a technology overhaul, from an outdated and inflexible technology to a modern and highly efficient one. In this two-part post, I’ll describe the pros and cons of running a SIEM, and most importantly provide details on why we believe a log management system is the superior choice.
Deploying a SIEM is a project. You can’t just rack a new box of packet-eating hardware and expect it to work. It’s important to understand and develop all the proper deployment planning steps. Things like scope, business requirements, and engineering specifications are all factors in determining the success of the SIEM project. Event and alarm volume (in terms of disk usage) and retention requirements must be understood. There’s also the issue of how to reliably retrieve remote logs from a diverse group of networked devices without compatibility issues. You must be able to answer questions like:
- Will the detection be exclusively from network security devices, or will you gather host log data as well?
- Would you be able to install an agent-based tool on a critical service, like an email server, directory server, or finance system?
- What network changes (access-control lists, etc.) are affected by a SIEM log collector or exporter?
- How much storage will we need and how many resources are required to maintain it?
- Beyond all the IT resourcing, how many analysts are required to actually review the alert data?
- Is there a comprehensive and enforceable logging policy? i.e., can you be sure the sysadmins have enabled sufficient logging for SIEM export?
- What are the event/incident long-term retention requirements?
The SIEM and its backend storage must be properly sized and perform at a reasonable level. Of course the speed of result retrieval from a report or query is inversely proportional to the size of the log data. This means that despite indexing, the more data the SIEM has to trawl through for your search, the slower it will perform. Two major factors that affect the volume of data in a search are the total size of log data coming from networked systems, and the time range of the search query. The SIEM might only be receiving IDS alarms from a dozen sensors, but if you need a report for an incident from a year ago, how much data would the SIEM have to search through, and how long would it take? For an ad-hoc query it may be acceptable to wait a while for results, but for regular reporting it’s not going to cut it.
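To make the sizing question concrete, here is a minimal back-of-the-envelope sketch in Python. The event rate, average event size, and scan rate are made-up assumptions for illustration, not measurements from any real SIEM, and the function names are hypothetical.

```python
def estimate_storage_gb(events_per_second, avg_event_bytes, retention_days):
    """Rough raw-log footprint in GB (10**9 bytes) for a retention window."""
    seconds = retention_days * 86_400
    return events_per_second * avg_event_bytes * seconds / 1e9


def estimate_scan_seconds(data_gb, scan_gb_per_second):
    """Worst-case time to read data_gb at a sustained scan rate (no index help)."""
    return data_gb / scan_gb_per_second


# Assumed inputs: 12 IDS sensors at 50 events/s each, 500 bytes per event.
one_day_gb = estimate_storage_gb(12 * 50, 500, 1)
one_year_gb = estimate_storage_gb(12 * 50, 500, 365)

# Assumed sustained scan rate of 0.5 GB/s for an unindexed search.
one_year_scan_s = estimate_scan_seconds(one_year_gb, 0.5)
```

With these assumed numbers, a single day of alarms is about 26 GB and a year is about 9.5 TB, so a search spanning a full year has roughly 365 times more raw data behind it than a search spanning one day. Real systems compress and index, so treat this as an upper-bound sanity check rather than a prediction.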
This is why it’s also critical to understand the reporting requirements and objectives. You have to understand not only where and how to deploy the SIEM and its collectors, but also why. There are many reasons security teams may choose to deploy a SIEM:
- Security monitoring and incident response
- Compliance or regulatory mandated logging and reporting
- Hosting or network operations
- Metrics and reporting
Once you have an idea of what compelled you to undertake a SIEM deployment, it’s easier to define the reporting requirements. This information will also help in selecting features for any SIEM product you choose.
Results can be delivered in numerous ways, such as RSS, email, internal incident tracking, the SIEM console, or external APIs. How results are delivered is as important as the interval at which they are exported. Real-time incident reporting is of course the holy grail of incident response. Well, that and always having all logs available after an incident. In any case, discovering every incident that is occurring while the analyst is watching is the ultimate goal. In practice, however, it’s very difficult (although not impossible) to achieve real-time alerting. Underlying system capabilities are really the primary issue with SIEM performance; however, there are other significant time-related issues like report time synchronization, report duration, and time stamping.
If it takes thirty minutes for a report to run and deliver results, you are already thirty minutes behind real-time. Even if results are delivered to an analyst in real-time, it still takes some amount of time to process each alarm and begin to review the next one. For common malware, much of this can and should be automated (including remediation steps like remotely triggered black hole routing or DNS RPZ filtering). Response time approaching real-time is possible, although in many cases it’s sufficient to deliver lower severity alarms to analysts within a few hours of detecting an incident. For highly critical alarms, the SIEM can prioritize events based on criteria like attacker or victim IP address or alarm type and alert through a quicker mechanism than a belated email report.
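The prioritization idea can be sketched in a few lines of Python. This is only an illustration of the logic, not how any particular SIEM implements it: the network range, alarm type names, alarm field names, and channel names are all hypothetical.

```python
import ipaddress

# Hypothetical values for illustration only.
CRITICAL_NETWORKS = [ipaddress.ip_network("10.10.0.0/24")]  # e.g. finance servers
CRITICAL_ALARM_TYPES = {"data_exfiltration", "admin_credential_theft"}


def is_critical(alarm):
    """True if the alarm is a critical type or touches a critical network."""
    if alarm["type"] in CRITICAL_ALARM_TYPES:
        return True
    for field in ("src_ip", "dst_ip"):
        addr = ipaddress.ip_address(alarm[field])
        if any(addr in net for net in CRITICAL_NETWORKS):
            return True
    return False


def route(alarm):
    """Pick a delivery channel: page immediately, or hold for the batched report."""
    return "page" if is_critical(alarm) else "batch_report"
```

An alarm for common malware on an ordinary desktop would land in the batched report, while the same alarm involving a host in the assumed critical range would page someone right away.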
Assuming this whole deployment process has been wildly successful, the next most difficult part is ensuring regular, concise, actionable, and descriptive alarms. Loads of work goes into getting a SIEM solution to produce consistently valuable results, and without proper and ongoing configuration and tuning, the SIEM will never be able to satisfy the alerting criteria.
Garbage In, Garbage Out
Not only do you have to have a solid understanding of your own network, its demarcation lines, and its services, but you have to ensure the SIEM knows about it all as well. This isn’t just uploading a spreadsheet or CSV file of IP addresses once. It’s only going to work if there are constant update intervals and regular alarm and report review. The care and feeding of a SIEM cannot be overstated, and it should be no surprise that large, complex tools require full-time people to work with them. Running the SIEM and a large collection infrastructure must be a separate job function from analyzing the alarms, as it takes truly dedicated resources to ensure the data from your network is relevant, accurate, and properly categorized within the SIEM.
It’s certainly possible to simply throw all alarms from all devices into the SIEM, but finding incidents will be difficult. Most likely you would notice results immediately from the noisiest alarms for the most common, typical malware. However, once those issues have been stamped out, it can be difficult to find value in the rest of the alarms. If you have not tuned anything, there will be hundreds or thousands of alarms that may mean nothing. Do you really care that a ping sweep occurred from one of your uptime monitors? Did a long-running flow alarm trip on a large database replication? These are just a few of the many ways unnecessary alarms configured on the SIEM can clog up the analysts’ incident detection capacity. It is true that some informational alarms can be useful, particularly if they are reviewed over a time range for trends and outliers. However, a cluttered interface of alarm data is not only daunting to the analyst, but also a setup for failure as potentially valuable alarms may be lost or unnoticed in the sea of unimportant alarms.
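As a rough sketch of what tuning can look like, the Python below separates known-benign alarms from the ones an analyst should see. The alarm types, IP addresses, and field names are hypothetical, and a real SIEM would express this in its own filter or rule syntax.

```python
# Hypothetical tuning rules: (alarm type, source IP) pairs known to be benign.
SUPPRESSIONS = {
    ("ping_sweep", "10.0.5.10"),         # uptime monitor
    ("long_running_flow", "10.0.8.20"),  # database replication source
}


def tune(alarms):
    """Split alarms into those worth analyst review and those suppressed."""
    review, suppressed = [], []
    for alarm in alarms:
        key = (alarm["type"], alarm["src_ip"])
        (suppressed if key in SUPPRESSIONS else review).append(alarm)
    return review, suppressed
```

The suppressed alarms are returned rather than discarded, so they stay out of the analyst queue but can still be reviewed later for trends and outliers. Note that the rule is scoped to a specific source: a ping sweep from any other host still reaches the analyst.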
Canned vs. Fresh
The SIEM capabilities in the incident reporting area boil down to either pre-canned queries developed by the SIEM vendor, possibly based on configuration data about your network, or custom reports. Every SIEM includes a plethora of pre-built reports that can be used to find interesting things on your network, or they can be used for boring compliance monitoring. You may find some value in the SIEM reports you paid for, and assuming there are no scaling or performance issues, a SIEM might be an OK fit for your network. However, if the SIEM has no flexible custom reporting, or doesn’t allow you to quickly receive results custom to your query, or if it does not have the absolute latest threat information, then you are likely missing important events.
If your organization has been targeted with custom malware, and you have observed its network characteristics, you must be able to detect additional activity with the SIEM. Analyzing the malware for a call-back to an Internet host is extremely valuable information, and can be leveraged to not only detect malicious activity, but also to help shut it down. If the SIEM’s reporting doesn’t include an efficient way to run reports against information you’ve discovered, or been passed, then you are at risk of persistent trouble. The same detail goes for host-based logging. If you cannot import, normalize, and query against host-based event sources like HIPS or antivirus logs, you will not have a deep-reaching view into the network, and in many cases the most effective malware and backdoors rarely make outbound connections. You should be able to rely on host alarms to detect suspicious activity.
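A custom report of this kind is, at its core, a match of your own indicators against normalized events. Here is a minimal Python sketch, assuming events have already been normalized into dictionaries; the indicator values and the `dst_host` and `dst_ip` field names are hypothetical.

```python
# Hypothetical indicators observed during malware analysis.
CALLBACK_HOSTS = {"update.badhost.example"}
CALLBACK_IPS = {"203.0.113.45"}


def find_callbacks(events):
    """Return events whose destination matches a known call-back indicator."""
    hits = []
    for event in events:
        # Normalize case and any trailing dot before comparing hostnames.
        host = event.get("dst_host", "").lower().rstrip(".")
        if host in CALLBACK_HOSTS or event.get("dst_ip") in CALLBACK_IPS:
            hits.append(event)
    return hits
```

The same function works whether the events come from proxy logs, DNS logs, or host-based sources, as long as they are normalized to the assumed fields. That normalization step is exactly where a SIEM that cannot import a given source leaves a blind spot.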
Some SIEM tools offer updates in the form of subscriptions. The deal is that you pay regular licensing fees to the company and they provide you with the ability to download updates to their detection patterns. This of course is a great way to stay on top of current threats since you have the backing of a large commercial security research team, but it will be expensive. Working with the SIEM, you may be able to create your own reports and detection methods, but it will involve lots of trial and error: ensuring your patterns and expressions are correct, familiarizing yourself with its workflows, and ensuring you’ve defined the report within the right scope of devices. You also have to make sure the system can properly handle your new report burden and, most importantly, is flexible enough to query across the appropriate event sources.
With custom reporting you are essentially overlaying your detection logic across the security event sources consumed by your SIEM. If you want to find whether any /etc/shadow files are being improperly accessed on the network, the SIEM may not have a rule or report for it, but you should be able to configure one. However, will you be able to glean this information from your existing log data, or will you have to go to the event source to set the trigger? If your IDS doesn’t have a rule that will trip when /etc/shadow is detected on the wire, then your SIEM can’t know about it. The SIEM is only as good as its underlying data, and although your detection logic may be perfect, the SIEM has to have the event data, or the incident will go undetected.
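To illustrate the /etc/shadow case, the Python sketch below overlays a simple pattern on raw log lines. It assumes some source (a web server access log, for instance) already records the requested path; the function name and log format are hypothetical.

```python
import re
from urllib.parse import unquote

SHADOW_PATTERN = re.compile(r"/etc/shadow\b")


def shadow_hits(log_lines):
    """Yield (line_number, line) for lines that mention /etc/shadow.

    Each line is URL-decoded first so that encoded forms such as
    %2Fetc%2Fshadow are also caught.
    """
    for number, line in enumerate(log_lines, start=1):
        if SHADOW_PATTERN.search(unquote(line)):
            yield number, line
```

The logic is trivial, and that is the point: it can only fire on data that was collected in the first place. If no device logs the request or the payload, there is nothing for this pattern, or any SIEM report built on it, to match.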
Part II of this post will go into detail on the almighty concept of correlation, and the log management system alternative to the SIEM.