Tracking Malicious Activity with Passive DNS Query Monitoring
Ask anyone in the information security field and they will tell you:
Security is not fair. There is essentially an unlimited supply of attackers that can test your defenses with impunity until they eventually succeed.
As a member of the Cisco Computer Security Incident Response Team (CSIRT), I’ve seen this asymmetry up close, so I can tell you that good security is really hard. Besides the normal security practices like deploying firewalls, IDS sensors, antivirus (AV), and Web Security Appliances, CSIRT is increasingly looking to the network as a data source. We have been collecting NetFlow for years, but we have always wanted additional context for the flow data. While it is true that the Internet is built on TCP/IP, Internet services—both good and bad—are found by name using the Domain Name System (DNS). For years infosec has been network-address-centric, and the attackers have adapted. Today it is very common to see malware command and control (C&C) use domain generation algorithms (DGAs), peer-to-peer (P2P) protocols, or even fast-flux DNS to evade IP-address-based detection and blocking. It has become absolutely clear that to keep up with the latest attacks and attackers, you must have a view into the DNS activity on your network.
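To make the DGA problem concrete, here is a minimal, hypothetical sketch of the idea (the seed, hashing scheme, and function name are illustrative and not taken from any particular malware family): the malware and its operator each derive the same short-lived rendezvous domains from a shared seed and the current date, so there is no fixed domain or IP to block.

```python
import hashlib
from datetime import date

def dga_domains(seed: str, day: date, count: int = 5) -> list[str]:
    """Illustrative DGA: derive pseudo-random domains from a seed and a date.
    Real families vary the hash, alphabet, TLDs, and rotation schedule."""
    domains = []
    for i in range(count):
        data = f"{seed}-{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(data).hexdigest()
        # Take a slice of the digest as the (unpronounceable) label.
        domains.append(digest[:12] + ".com")
    return domains
```

Because both sides can compute today’s candidate names independently, yesterday’s blocklist is already stale—which is exactly why visibility into the queries themselves matters.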
CSIRT has been struggling with limited DNS information for a while now, so I am pleased to say we finally have comprehensive visibility into the DNS activity on our network. Before I dive into how we tackled this problem I should back up and explain a bit more about DNS…
When a client wants to access a service by name, it must resolve that name into a usable address (an IP address). To do this, the client sends a request for the name to a recursive name server and that server will retrieve the information and send it back to the client. From a security perspective, there are two interesting aspects to this activity. The first is the names clients are requesting and the second is the set of Internet hosts that are providing the services for any given name. Put another way, you want to know who is looking up a service (DNS queries) and you also want to know who is providing a service (DNS answers). The DNS answers portion of the problem has been solved by ISC’s Passive DNS Replication Project (and the corresponding ISC DNS Database). ISC’s DNSDB is very good at answering questions like “What DNS names have pointed at this IP?” as well as “What IPs have provided services for this name?”
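To illustrate the query side, here is a minimal sketch of how a DNS question is encoded on the wire and how a passive monitor could pull the queried name back out. This is a simplification of the RFC 1035 message format (the function names and fixed transaction ID are my own, and it ignores name compression and everything past the question section):

```python
import struct

def build_query(qname: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (QTYPE=A, QCLASS=IN)."""
    # 12-byte header: id, flags (RD set), 1 question, 0 answer/auth/additional.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels, terminated by a zero byte.
    question = b"".join(
        bytes([len(label)]) + label.encode() for label in qname.split(".")
    ) + b"\x00"
    return header + question + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN

def parse_qname(packet: bytes) -> str:
    """Extract the question name from a captured DNS packet."""
    offset, labels = 12, []  # the question section starts after the header
    while packet[offset] != 0:
        length = packet[offset]
        labels.append(packet[offset + 1 : offset + 1 + length].decode())
        offset += 1 + length
    return ".".join(labels)
```

Every query a client sends carries its QNAME in the clear like this, which is what makes passive collection of the question side practical.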
Historically, getting at the DNS-questions side of the problem required enabling logging on all of your organization’s recursive resolvers and searching through those logs. This is an imperfect solution for a number of reasons, including:
- Most organizations run a wide variety of nameservers (BIND, Active Directory, etc.) with varying logging abilities and formats
- Clients (and malware) can send DNS requests to external services like Google’s Public DNS or OpenDNS
- Clients generate a huge volume of DNS queries and it is difficult (or costly) to quickly search such a high volume of logs
To sidestep these problems and get complete coverage of our DNS query activity, we have gone with passively capturing all DNS activity on the wire at all of our major network choke points. To the best of our knowledge there is no security-focused product specifically built for handling DNS questions, so we had to build a complete solution in-house. In many cases, passively capturing DNS activity off the wire allows us to see the query both pre- and post-recursor and gives us visibility into DNS queries to external nameservers. For complete coverage we have deployed capture sensors globally, focusing on each major node on the network. The data is stored locally on each sensor in a compressed packet capture format.
To search the data, we map the query out to each of the many sensors (which store the data locally) in parallel and then reduce the search results into a presentable format at the search head. This gives us built-in parallelism that scales as we add more sensors. To further speed up searches we have built filter indexes (using Bloom filters) that let us skip a file entirely if it doesn’t contain any of the information we are looking for. In all, our Passive DNS Query Database (PDNSQDB) is just a few thousand lines of Python code, and most of the heavy lifting is done by off-the-shelf tools like ncaptool and libbind.
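The filter-index idea can be sketched with a toy Bloom filter. This is an illustrative implementation, not our production code, and the size and hash count below are arbitrary; the key property is that a Bloom filter can return false positives but never false negatives, so a “no” answer means a capture file can be skipped safely.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: probabilistic set membership with no false negatives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions by salting the hash input.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # All k bits set: item *may* be present. Any bit clear: definitely absent.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

In a scheme like this, each capture file gets a small filter holding every name seen in it; at search time a negative answer lets the sensor skip the file outright, while a positive answer only means the file is worth scanning.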
Armed with fast and easy access to all clients’ DNS query activity, CSIRT investigators like myself have been able to track malicious activity like never before. Not only have we been able to find compromised clients based on known-bad domain names, we have also been finding previously unknown malicious names by mining the data for interesting patterns. Our Passive DNS Query Database has already proven invaluable in several investigations, and as we develop new tricks and techniques we fully expect to improve our ability to detect and track malicious activity using the DNS query data.
One of the best things about giving security engineers a new data source is seeing all of the creative ways it is used. For example, one of our engineers has been identifying new C&C domains by looking at the intersection in DNS queries between two different hosts compromised with the same malware (as reported by AV logs). Other engineers have taken a graph-theoretic approach. By combining our PDNSQDB view into queries with ISC’s DNSDB view into answers it is possible to identify related malicious domains and IPs when given just a single domain or IP—essentially the transitive-closure of malicious activity.
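The intersection trick is simple to sketch. The idea is to take the set of names each compromised host queried, intersect them, and subtract names that are popular across the whole network, leaving the rendezvous domains the malware has in common (the domain names below are made up for illustration):

```python
def shared_queries(host_a_queries, host_b_queries, baseline):
    """Candidate C&C names: queried by both compromised hosts but
    rare on the network at large (the baseline of common names)."""
    return (set(host_a_queries) & set(host_b_queries)) - set(baseline)

# Hypothetical query sets for two hosts flagged by the same AV signature:
a = {"www.cisco.com", "x7f3k2.example-dga.com", "update.vendor.com"}
b = {"mail.example.net", "x7f3k2.example-dga.com", "update.vendor.com"}
popular = {"www.cisco.com", "mail.example.net", "update.vendor.com"}
suspects = shared_queries(a, b, popular)
```

With real data the baseline would come from query-popularity statistics over the whole network, but the core operation is exactly this set arithmetic.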
Now when CSIRT responds to a new incident, one of the first steps we take is to query our PDNSQDB. The data is proving to be an invaluable supplement to other data sources like NetFlow and web logs. The amount of effort we put into developing the tool has already paled in comparison to the value we’ve pulled out of it. If your organization doesn’t have fast and comprehensive visibility into the DNS activity on the network you should think about getting it. Now that we’ve had a taste of what the data can do for us we can’t live without it.