Making Boring Logs Interesting
In the last week alone, two investigations I have been involved with came to a standstill because of a lack of attribution logging data. One investigation was halted by the lack of user activity logging within an application, the other by a lack of network-based activity logs. Convincing the asset owners of the need for logging after the fact was easy. Ideally, though, this type of data would be collected before it is needed for an investigation. Understanding what data is critical to log, engaging with asset owners to ensure logs contain meaningful information, and preparing log data for consumption by security monitoring are ultimately responsibilities of the security monitoring organization itself. Perhaps in a utopian world, asset owners would engage an InfoSec team proactively and say, “I have a new host/app. Where should I send its log data, which contains attributable information about user behavior that will be useful to you for security monitoring?” In lieu of that idealism, what follows is a primer on logs as they relate to attribution in the context of security event monitoring.
Know Thy Network
As Jeff stated in this series’ intro piece, “In the log analysis world, context is truly king.” Security monitoring requires an understanding of the network being monitored and the assets on that network. Knowing only layer 3 data (IP addresses) reveals nothing about what a network or asset is used for, to whom an escalation should be sent in the event of a security breach, or what counts as normal or anomalous behavior. In large organizations, even a well-staffed Computer Security Incident Response Team (CSIRT) cannot possibly understand the intricacies of all the applications, hosts, and networks, or know the owners of all those assets. That additional understanding – or context – helps determine which events to investigate and which to treat as false positives. Part of the context needed to understand security events is attribution, that is, associating an application, device, or network asset with an owner. While mitigation may be possible without attribution, remediation requires identifying asset owners so they can address the root cause of the event. Attribution, therefore, requires some method of identification.
Network Access Control (NAC) solutions identify users automatically, though their methods vary widely; they typically control access to networks or information through simple authentication, a captive portal, two-factor token access, 802.1X, or similar technology. But how can you identify an owner in environments that your NAC solution covers only partially, or not at all? And what are the caveats of manual attribution? Without NAC, CSIRTs must rely on logging to acquire enough data to attribute the assets generating security events.
So what’s in a log?
A log message fundamentally consists of a timestamp and an event. The syslog protocol, as defined in RFC 5424, adds, among other header fields, two attributes that are both encoded in the message’s priority (PRI) value: facility, which classifies the source of the message, and severity. No widely adopted standard exists for the event portion of a log message, and in practice event formats vary widely. Building a robust security monitoring platform requires understanding log metadata: the data about data. We’ll explore the two principal components, time and event.
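To make the facility and severity attributes concrete: RFC 5424 packs both into the PRI number at the start of each message, computed as facility * 8 + severity. The Python sketch below decodes that prefix; the function name `decode_pri` is illustrative, and it deliberately ignores the rest of the header.

```python
import re

# Severity codes 0-7, in the order listed in RFC 5424.
SEVERITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Informational", "Debug"]

# PRI is one to three digits inside angle brackets at the very start.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message: str) -> tuple[int, str]:
    """Return (facility code, severity name) from a syslog message's PRI."""
    match = PRI_RE.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> value")
    pri = int(match.group(1))
    if pri > 191:  # highest facility (23) * 8 + highest severity (7)
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


# A message modeled on the first example in RFC 5424.
print(decode_pri("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com "
                 "su - ID47 - 'su root' failed for lonvick on /dev/pts/8"))
# → (4, 'Critical')
```

Facility 4 is the RFC’s “security/authorization messages” facility, which is exactly the kind of classification a CSIRT can use to route messages before any event parsing happens.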
As in forensic investigations, security event monitoring typically involves building accurate timelines of events leading up to, during, and after a security incident. These timelines are often pieced together from disparate log sources to give a holistic understanding of the incident. For the timeline to be meaningful as a whole, every event source must both synchronize its clock and use a standardized timezone.
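Once every source is on synchronized clocks and a common timezone, assembling a timeline reduces to merging already-sorted event streams by timestamp. The sketch below uses Python’s `heapq.merge`; the event dictionaries, their field names, and the sample data are hypothetical.

```python
import heapq
from datetime import datetime, timezone


def build_timeline(*sources):
    """Merge per-source event lists, each already sorted by UTC timestamp,
    into one chronological timeline."""
    return list(heapq.merge(*sources, key=lambda event: event["timestamp"]))


# Hypothetical events from three sources, already normalized to UTC.
utc = timezone.utc
proxy = [{"timestamp": datetime(2014, 3, 10, 14, 15, 2, tzinfo=utc),
          "source": "proxy", "detail": "10.1.2.3 GET http://malicious.example/"}]
dhcp = [{"timestamp": datetime(2014, 3, 10, 13, 0, 0, tzinfo=utc),
         "source": "dhcp", "detail": "10.1.2.3 leased to 00:11:22:33:44:55"}]
av = [{"timestamp": datetime(2014, 3, 10, 14, 16, 30, tzinfo=utc),
       "source": "antivirus", "detail": "quarantine on host LAB-PC-12"}]

for event in build_timeline(proxy, dhcp, av):
    print(event["timestamp"].isoformat(), event["source"], event["detail"])
```

The merge is only as trustworthy as its inputs: if one source’s timestamps were recorded in an unknown timezone, its events would simply land in the wrong place in the timeline.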
The Network Time Protocol (NTP) is an industry standard for synchronizing clocks between networked systems, and an implementation of it is included on almost any asset worth monitoring. But NTP does not convey timezone information, so each asset must be individually configured with the organization’s standard timezone. Since most timezones are defined as an offset from Coordinated Universal Time (UTC), it makes sense to standardize on UTC to avoid confusion, especially for organizations whose systems span multiple timezones. Still, any consistently applied standard timezone will work. Where the CSIRT has no influence over the timezone configured on a collected event source, each log message’s timestamp must be converted to the CSIRT’s standard before it enters the security monitoring infrastructure. This assumes, however, that the CSIRT knows the timezone of the data source being collected and that the source’s configured timezone will not change.
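The conversion step can be sketched in Python as follows. It assumes the CSIRT keeps a record of each source’s configured UTC offset; the source names and offsets in `SOURCE_OFFSETS` are hypothetical. A fixed offset is only correct for sources that do not observe daylight saving time; for those that do, a named IANA zone (for example via `zoneinfo.ZoneInfo`) would be needed instead.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record of each collected source's configured UTC offset.
# Fixed offsets assume the source does not observe daylight saving time.
SOURCE_OFFSETS = {
    "legacy-app": timedelta(hours=-5),
    "branch-firewall": timedelta(hours=1),
}


def to_utc(raw_ts: str, fmt: str, source: str) -> datetime:
    """Parse a naive source timestamp and convert it to the UTC standard."""
    naive = datetime.strptime(raw_ts, fmt)
    local = naive.replace(tzinfo=timezone(SOURCE_OFFSETS[source]))
    return local.astimezone(timezone.utc)


print(to_utc("2014-03-10 09:15:00", "%Y-%m-%d %H:%M:%S", "legacy-app").isoformat())
# → 2014-03-10T14:15:00+00:00
```

The lookup table is the weak point the paragraph above warns about: if a source’s timezone is changed without the CSIRT knowing, every converted timestamp from it silently becomes wrong.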
To see how inconsistent timezones cause problems, consider an investigator who knows from web proxy logs that a host accessed a malicious website, but does not know whether the host was infected. If the host’s IP address came from a DHCP lease pool, correlating the proxy event with DHCP logs using inaccurate timestamps may identify the wrong host and owner, undermining mitigation, correlation with other data sources, and remediation.
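The DHCP scenario can be reproduced with a small lease lookup. The lease history, MAC addresses, and function name below are hypothetical; the point is that the same IP maps to different hosts depending on the time used for the lookup.

```python
from datetime import datetime, timezone

utc = timezone.utc

# Hypothetical DHCP lease history: (ip, mac, lease start, lease end), in UTC.
LEASES = [
    ("10.1.2.3", "00:11:22:33:44:55",
     datetime(2014, 3, 10, 8, 0, tzinfo=utc), datetime(2014, 3, 10, 12, 0, tzinfo=utc)),
    ("10.1.2.3", "66:77:88:99:aa:bb",
     datetime(2014, 3, 10, 12, 0, tzinfo=utc), datetime(2014, 3, 10, 20, 0, tzinfo=utc)),
]


def lease_holder(ip, when):
    """Return the MAC address that held `ip` at time `when`, or None."""
    for lease_ip, mac, start, end in LEASES:
        if lease_ip == ip and start <= when < end:
            return mac
    return None


proxy_hit = datetime(2014, 3, 10, 14, 15, tzinfo=utc)  # correct UTC time
print(lease_holder("10.1.2.3", proxy_hit))
# → 66:77:88:99:aa:bb

# The same event recorded in UTC-5 (09:15) but treated as UTC
# resolves to the previous lease holder: the wrong host.
print(lease_holder("10.1.2.3", proxy_hit.replace(hour=9)))
# → 00:11:22:33:44:55
```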
Any system administrator knows that log messages vary greatly between event sources. But messages often contain attributes common across event sources, such as source/destination address and port, hostname, NetBIOS name, source type, and event source. Linking data sources together – or correlating them – requires unioning or joining them on these shared attributes. Combining data on shared attributes, in turn, relies on messages first being parsed and their attributes identified and classified.
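As a minimal sketch of joining on a shared attribute, the Python below performs an inner join of two lists of already-parsed events on a common key. The event dictionaries and field names (`src_ip`, `signature`, `url`) are hypothetical, and the join only works because both sources name the attribute identically.

```python
from collections import defaultdict


def join_on(left, right, key):
    """Inner-join two lists of parsed events (dicts) on a shared attribute."""
    index = defaultdict(list)
    for event in right:
        if key in event:
            index[event[key]].append(event)
    return [(l_event, r_event)
            for l_event in left if key in l_event
            for r_event in index.get(l_event[key], [])]


# Hypothetical parsed events from an IDS and a web proxy.
ids_alerts = [{"src_ip": "10.1.2.3", "signature": "Exploit kit landing page"}]
proxy_events = [{"src_ip": "10.1.2.3", "url": "http://malicious.example/"},
                {"src_ip": "10.9.9.9", "url": "http://example.com/"}]

for alert, request in join_on(ids_alerts, proxy_events, "src_ip"):
    print(alert["signature"], "->", request["url"])
```

Indexing one side first keeps the join linear in the number of events rather than comparing every pair.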
Give forethought to how log message attributes will be parsed as they enter the security monitoring infrastructure. Standardizing the names of shared attributes makes correlation queries cleaner and easier to understand. Parsing unstructured messages into standardized attributes typically requires regular expressions (regexes). Writing regexes, in turn, requires enumerating and understanding the different message types a data source generates. Parsing a timestamp from one data source may also require a different pattern than another, because sources represent time in different formats.
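The Python sketch below illustrates per-source parsing into standardized attribute names. Both message formats are hypothetical, invented for illustration: a firewall line with an ISO-like timestamp and an Apache-style proxy line. Each source gets its own regex and its own `strptime` format, but both produce the same `timestamp` and `src_ip` fields. The firewall format carries no timezone, so the sketch assumes that source already logs in UTC.

```python
import re
from datetime import datetime, timezone

# Hypothetical per-source parsers: (regex, timestamp format). Named groups
# use the standardized attribute names shared by every source.
PARSERS = {
    # e.g. "2014-03-10 14:15:00 DENY 10.1.2.3:51515 -> 192.0.2.10:443"
    "firewall": (
        re.compile(r"^(?P<ts>\S+ \S+) (?P<action>\w+) "
                   r"(?P<src_ip>[\d.]+):(?P<src_port>\d+) -> "
                   r"(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)$"),
        "%Y-%m-%d %H:%M:%S",
    ),
    # e.g. '10.1.2.3 [10/Mar/2014:14:15:02 +0000] "GET http://example.com/ HTTP/1.1"'
    "proxy": (
        re.compile(r'^(?P<src_ip>[\d.]+) \[(?P<ts>[^\]]+)\] '
                   r'"(?P<method>\w+) (?P<url>\S+) [^"]*"$'),
        "%d/%b/%Y:%H:%M:%S %z",
    ),
}


def parse(source, line):
    """Parse one raw line into standardized attributes, or None if unrecognized."""
    pattern, ts_format = PARSERS[source]
    match = pattern.match(line)
    if match is None:
        return None  # a message type not yet enumerated for this source
    fields = match.groupdict()
    ts = datetime.strptime(fields.pop("ts"), ts_format)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumes this source logs in UTC
    fields["timestamp"] = ts
    fields["source"] = source
    return fields


print(parse("firewall", "2014-03-10 14:15:00 DENY 10.1.2.3:51515 -> 192.0.2.10:443"))
```

Returning `None` for unmatched lines, rather than guessing, makes unenumerated message types visible so a pattern can be written for them.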
Even when timestamps are standardized and log messages are parsed and stored, the nature of a data source can still cause attribution issues. Two common culprits are NAT and overlapping IP address space.
Network Address Translation (NAT), particularly many-to-one configurations such as IP masquerading, complicates identification because the NAT device maintains only a transient state table of its translations. Once a connection is torn down, the data needed to associate the translated IP with the unique internal IP is lost unless the translations were logged remotely. Note that even when such logging is available, full attribution still requires associating the internal IP with an owner.
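When NAT translations are logged remotely, mapping a translated address back to an internal host requires the translated port and the time, not just the IP, since many inside hosts share one outside address. The record layout, field names, and addresses below are hypothetical; real NAT logs differ by vendor.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Translation:
    """One logged NAT translation (hypothetical record layout)."""
    inside_ip: str
    inside_port: int
    outside_ip: str
    outside_port: int
    start: datetime
    end: datetime


def inside_source(log, outside_ip, outside_port, when):
    """Map a translated (outside) address, port, and time back to the inside host."""
    for t in log:
        if (t.outside_ip == outside_ip and t.outside_port == outside_port
                and t.start <= when <= t.end):
            return t.inside_ip, t.inside_port
    return None


utc = timezone.utc
opened = datetime(2014, 3, 10, 14, 15, 0, tzinfo=utc)
closed = datetime(2014, 3, 10, 14, 16, 0, tzinfo=utc)
# Two inside hosts share one outside address at the same moment.
nat_log = [
    Translation("10.1.2.3", 51515, "198.51.100.1", 40001, opened, closed),
    Translation("10.1.2.4", 49152, "198.51.100.1", 40002, opened, closed),
]

print(inside_source(nat_log, "198.51.100.1", 40002,
                    datetime(2014, 3, 10, 14, 15, 30, tzinfo=utc)))
# → ('10.1.2.4', 49152)
```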
NAT also affects application and host logs. When connections reach a real server through an intermediary that rewrites the source address, such as a proxy or load balancer, applications and hosts must still be able to record the true source of the connection, while responses are routed back through the intermediary to maintain connection state. For web traffic, this is commonly handled with the X-Forwarded-For header, which the intermediary adds and the web server can log. Knowing only that an event came from the translated address can hinder or entirely halt an incident investigation.
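One caveat when logging X-Forwarded-For is that clients can send the header themselves, so its leftmost entry cannot be trusted blindly. A common approach, sketched below under the assumption that the organization knows the addresses of its own proxies (the `trusted_proxies` set and addresses are hypothetical), is to walk the chain from the nearest hop outward and take the first address that is not one of those proxies.

```python
def client_ip(remote_addr, xff_header, trusted_proxies):
    """Pick the originating client address from X-Forwarded-For.

    `remote_addr` is the peer address of the TCP connection; `xff_header`
    is the raw header value (or None); `trusted_proxies` is the set of
    the organization's own proxy addresses.
    """
    hops = [h.strip() for h in xff_header.split(",") if h.strip()] if xff_header else []
    chain = hops + [remote_addr]
    # Walk from the nearest hop outward, skipping our own proxies.
    for addr in reversed(chain):
        if addr not in trusted_proxies:
            return addr
    return chain[0]  # every hop was a trusted proxy


proxies = {"10.0.0.5"}
print(client_ip("10.0.0.5", "203.0.113.7", proxies))
# → 203.0.113.7
# A client-supplied (spoofed) entry on the left is ignored.
print(client_ip("10.0.0.5", "1.2.3.4, 203.0.113.7", proxies))
# → 203.0.113.7
```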
Overlapping address spaces are likely to affect larger organizations with a decentralized network operations model, such as universities or conglomerates. Where this situation exists, uniquely identifying an asset requires combining multiple attributes of that asset. The resulting tuple can then be used as a primary key when searching other attribution data sources. Consider a university where the School of Nursing and the School of Engineering use the same internal RFC 1918 addresses but are monitored by different sensors. Pairing the RFC 1918 address with the sensor’s address or hostname creates a unique identification key.
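In Python, the composite key from the university example maps naturally onto a tuple used as a dictionary key. The sensor names, address, and owner labels below are hypothetical.

```python
# Hypothetical ownership table keyed by (sensor, address) instead of the
# address alone, because both schools reuse the same RFC 1918 space.
OWNERS = {
    ("sensor-nursing", "10.0.5.20"): "School of Nursing IT",
    ("sensor-engineering", "10.0.5.20"): "School of Engineering IT",
}


def owner_for(event):
    """Look up an event's owner using the (sensor, src_ip) composite key."""
    return OWNERS.get((event["sensor"], event["src_ip"]), "unknown")


print(owner_for({"sensor": "sensor-engineering", "src_ip": "10.0.5.20"}))
# → School of Engineering IT
```

Keying on the address alone would collapse both entries into one, silently attributing one school’s events to the other.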
Asset Inventory Tracking
If the CSIRT doesn’t know everything about what is being monitored, it must engage the actual asset owners for clarity, confirmation, or remediation. But how do you associate ownership with an IP (or a tuple containing an IP and other attributes)? To truly know its network, an organization needs a dynamically updated asset tracking system that accounts for everything on the network. Besides being easy to query, it must answer questions such as: How is the network segmented? Which hosts are in each segment, and who owns them? Which applications run on those hosts, and who owns them? How critical is each asset?
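Answering “which segment is this IP in, and who owns it?” is a longest-prefix match against the inventory’s network segments. The sketch below uses Python’s standard `ipaddress` module; the segment list, owners, and criticality labels are hypothetical stand-ins for data that a real asset tracking system would supply dynamically.

```python
import ipaddress

# Hypothetical segment inventory; nested segments are allowed.
SEGMENTS = [
    {"network": ipaddress.ip_network("10.0.0.0/8"),
     "owner": "Network Operations", "criticality": "medium"},
    {"network": ipaddress.ip_network("10.20.0.0/16"),
     "owner": "Finance IT", "criticality": "high"},
    {"network": ipaddress.ip_network("10.20.5.0/24"),
     "owner": "Payroll App Team", "criticality": "critical"},
]


def segment_for(ip):
    """Return the most specific segment containing `ip` (longest-prefix match)."""
    addr = ipaddress.ip_address(ip)
    matches = [s for s in SEGMENTS if addr in s["network"]]
    return max(matches, key=lambda s: s["network"].prefixlen, default=None)


seg = segment_for("10.20.5.7")
print(seg["owner"], seg["criticality"])
# → Payroll App Team critical
```

Preferring the most specific match lets a broad network owner act as a fallback escalation point while more specific owners take precedence.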
Ideally, smaller organizations will have a single source of truth for asset inventory. Without one, consideration must be given to how the CSIRT will consume information from disparate data sources. Will data be pulled into a central location through plugins that interact with the various asset inventory APIs? Will each asset system regularly export its data to a common format such as CSV for the CSIRT to consume? Will CSIRT members query the asset inventory systems individually, and if so, what determines which inventory system holds a particular asset?
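If inventory systems export CSV, consolidating them can be as simple as the sketch below. It assumes, hypothetically, that every export shares the columns `ip`, `hostname`, and `owner`, and it tags each record with the system it came from so analysts can tell where an asset resides and spot conflicting sources of truth.

```python
import csv
import io


def load_inventories(dumps):
    """Merge CSV exports from several inventory systems into one lookup table.

    `dumps` maps an inventory system's name to its CSV text. Each IP maps to
    a list of records; more than one record means the sources disagree or
    overlap and need an analyst's attention.
    """
    assets = {}
    for system, text in dumps.items():
        for row in csv.DictReader(io.StringIO(text)):
            record = dict(row, inventory=system)  # remember the source system
            assets.setdefault(row["ip"], []).append(record)
    return assets


# Hypothetical exports from two inventory systems.
dumps = {
    "cmdb": "ip,hostname,owner\n10.1.2.3,lab-pc-12,alice\n",
    "server-team": "ip,hostname,owner\n10.1.2.3,lab-pc-12,bob\n10.1.9.9,db01,carol\n",
}
inventory = load_inventories(dumps)
for record in inventory["10.1.2.3"]:
    print(record["inventory"], record["owner"])
```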
Tying the Room Together
Recall that the original problem was a lack of attribution data for identifying an asset owner. Storing log data in a standardized structure, as described above, makes it easier to consume and therefore to identify an escalation point quickly. The same log management principles extend to non-attribution data sources as well. In fact, without classical non-attribution security event data, there’s little to alert on and escalate anyway. Security monitoring fundamentally depends on a repeatable, well-documented escalation path, on security event data to trigger alarms, and on attribution data to identify the source or destination of those alarms.