Tracking Malicious Activity with Passive DNS Query Monitoring

October 17, 2012 - 16 Comments

Ask anyone in the information security field and they will tell you:

Security is not fair. There is essentially an unlimited supply of attackers that can test your defenses with impunity until they eventually succeed.

As a member of the Cisco Computer Security Incident Response Team (CSIRT) I’ve seen this asymmetry up close, so I can tell you that good security is really hard. Besides the normal security practices like deploying firewalls, IDS sensors, antivirus (AV), and Web Security Appliances, CSIRT is increasingly looking to the network as a data source. We have been collecting NetFlow for years but we have always wanted additional context for the flow data. While it is true that the Internet is built on TCP/IP, Internet services—both good and bad—are found by name using the Domain Name System (DNS). For years infosec has been network-address-centric and the attackers have adapted. Today it is very common to see malware command and control (C&C) use domain generation algorithms (DGAs), Peer-to-Peer (P2P), or even fast-flux DNS to evade IP address-based detection and blocking. It has become absolutely clear that to keep up with the latest attacks and attackers you must have a view into the DNS activity on your network.

CSIRT has been struggling with limited DNS information for a while now, so I am pleased to say we finally have comprehensive visibility into the DNS activity on our network. Before I dive into how we tackled this problem I should back up and explain a bit more about DNS…

When a client wants to access a service by name, it must resolve that name into a usable address (an IP address). To do this, the client sends a request for the name to a recursive name server and that server will retrieve the information and send it back to the client. From a security perspective, there are two interesting aspects to this activity. The first is the names clients are requesting and the second is the Internet hosts that are providing the services for any given name. Put another way, you want to know who is looking up a service (DNS queries) and you also want to know who is providing a service (DNS answers). The DNS answers portion of the problem has been solved by ISC’s Passive DNS Replication Project (and the corresponding ISC DNS Database). ISC’s DNSDB is very good at answering questions like “What DNS names have pointed at this IP?” as well as “What IPs have provided services for this name?”

Historically, getting at the DNS-questions side of the problem required enabling logging on all of your organization’s recursive resolvers and searching through those logs. This is an imperfect solution for a number of reasons, including:

  • Most organizations have a wide variety of nameservers (BIND, Active Directory, etc.) with varying logging abilities and formats
  • Clients (and malware) can send DNS requests to external services like Google’s Public DNS or OpenDNS
  • Clients generate a huge volume of DNS queries and it is difficult (or costly) to quickly search such a high volume of logs

To side-step these problems and get complete coverage of our DNS query activity, we have gone with passively capturing all DNS activity on the wire at all of our major network choke-points. To the best of our knowledge there is no security-focused product specifically built for handling DNS questions, so we had to build a complete solution in-house. In many cases, passively capturing DNS activity off the wire allows us to see the query both pre- and post-recursor and gives us visibility into DNS queries to external nameservers. For complete coverage we have deployed capture sensors globally, focusing on each major node on the network. The data is stored locally on each sensor in a compressed packet capture format.

To search the data, the search head maps each query out to every sensor in parallel; each sensor searches the data it stores locally, and the search head reduces the results into a presentable format. This gives us built-in parallelism that scales as we add more sensors. To further speed up searches we have built filter indexes (using Bloom filters) that allow us to skip searching a file if it doesn’t contain any information we are looking for. In all, our Passive DNS Query Database (PDNSQDB) is just a few thousand lines of Python code and most of the heavy lifting is done by off-the-shelf tools like ncaptool and libbind.
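As a rough illustration of the search flow described above, here is a minimal sketch of per-file Bloom filter indexes combined with a parallel map and a merge at the search head. It is not the PDNSQDB code: `TinyBloom`, `index_file`, `search_sensor`, `search_all`, and the sample records are all hypothetical, the files are shown as already-parsed `(client, qname)` tuples rather than packet captures, and threads stand in for separate sensor machines.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class TinyBloom:
    """Minimal Bloom filter for illustration; not tuned for production use."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting a single hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def index_file(records):
    """Build the filter index for one capture file's (client, qname) records."""
    bloom = TinyBloom()
    for _client, qname in records:
        bloom.add(qname)
    return bloom


def search_sensor(files, qname):
    """Map step: runs on one sensor and skips files the filter rules out."""
    hits, skipped = [], 0
    for records, bloom in files:
        if qname not in bloom:
            skipped += 1  # definitely absent, so the file is never scanned
            continue
        # A Bloom filter can report false positives, so still check each record.
        hits.extend(r for r in records if r[1] == qname)
    return hits, skipped


def search_all(sensors, qname):
    """Reduce step: the search head fans out and merges per-sensor results."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda files: search_sensor(files, qname),
                                sensors.values()))
    hits = sorted(h for sensor_hits, _ in results for h in sensor_hits)
    skipped = sum(s for _, s in results)
    return hits, skipped


# Hypothetical data: two sensors, each holding already-parsed capture files.
file_a = [("10.0.0.5", "evil.example"), ("10.0.0.6", "www.example.com")]
file_b = [("10.0.0.7", "mail.example.org")]
file_c = [("10.1.0.9", "evil.example")]
sensors = {
    "sensor-1": [(file_a, index_file(file_a)), (file_b, index_file(file_b))],
    "sensor-2": [(file_c, index_file(file_c))],
}
hits, skipped = search_all(sensors, "evil.example")
print(hits, "files skipped:", skipped)
```

The useful property is that a Bloom filter never produces a false negative, so a file the filter rejects can be skipped safely; a false positive only costs one unnecessary file scan.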

Armed with fast and easy access to all clients’ DNS query activity, CSIRT investigators like myself have been able to track malicious activity like never before. Not only have we been able to find compromised clients based on known-bad domain names, we have been finding previously unknown malicious names by mining the data for interesting patterns. Our Passive DNS Query Database has already proven invaluable in several investigations and as we develop new tricks and techniques we fully expect to improve our ability to detect and track malicious activity using the DNS Query data.

One of the best things about giving security engineers a new data source is seeing all of the creative ways it is used. For example, one of our engineers has been identifying new C&C domains by looking at the intersection in DNS queries between two different hosts compromised with the same malware (as reported by AV logs). Other engineers have taken a graph-theoretic approach. By combining our PDNSQDB view into queries with ISC’s DNSDB view into answers it is possible to identify related malicious domains and IPs when given just a single domain or IP—essentially the transitive-closure of malicious activity.
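Both techniques can be sketched in a few lines. The code below is an illustration under stated assumptions, not the engineers' actual tooling: `shared_queries` and `related_indicators` are hypothetical names, the per-host query sets stand in for query-database results, and the two dictionaries stand in for name-to-IP and IP-to-name answer data. The post does not describe how the graph is built, so the breadth-first expansion and the `max_hops` cap are assumptions.

```python
from collections import deque


def shared_queries(queries_a, queries_b, known_benign=frozenset()):
    """Names both infected hosts looked up, minus names known to be benign."""
    return (set(queries_a) & set(queries_b)) - set(known_benign)


def related_indicators(seed, name_to_ips, ip_to_names, max_hops=4):
    """Breadth-first expansion from one domain or IP over name<->IP links."""
    seen = {seed}
    pending = deque([(seed, 0)])
    while pending:
        node, hops = pending.popleft()
        if hops == max_hops:
            continue  # cap the walk so shared hosting does not pull in everything
        neighbours = name_to_ips.get(node, set()) | ip_to_names.get(node, set())
        for nxt in neighbours - seen:
            seen.add(nxt)
            pending.append((nxt, hops + 1))
    return seen


# Hypothetical query sets for two hosts flagged by AV with the same malware.
host_a = {"www.example.com", "update.example.net", "c2.bad.example"}
host_b = {"mail.example.org", "update.example.net", "c2.bad.example"}
candidates = shared_queries(host_a, host_b, known_benign={"update.example.net"})

# Hypothetical answer data (documentation-range IPs).
name_to_ips = {
    "c2.bad.example": {"192.0.2.10"},
    "drop.bad.example": {"192.0.2.10", "192.0.2.11"},
}
ip_to_names = {
    "192.0.2.10": {"c2.bad.example", "drop.bad.example"},
    "192.0.2.11": {"drop.bad.example"},
}
related = related_indicators("c2.bad.example", name_to_ips, ip_to_names)
print(candidates, sorted(related))
```

In practice the intersection usually needs a benign-name filter, since any two hosts will share lookups for popular services, and the graph walk needs some bound because a single shared-hosting IP can link thousands of unrelated names.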

Now when CSIRT responds to a new incident, one of the first steps we take is to query our PDNSQDB. The data is proving to be an invaluable supplement to other data sources like NetFlow and web logs. The amount of effort we put into developing the tool has already paled in comparison to the value we’ve pulled out of it. If your organization doesn’t have fast and comprehensive visibility into the DNS activity on the network you should think about getting it. Now that we’ve had a taste of what the data can do for us we can’t live without it.



  1. Hi, are you releasing your code in the near future? Besides the ISC DB, is there any community DB available to query, especially for detecting malware based on DNS history?

    I am very much looking forward to this for one of my PhD research papers. Actually, I was thinking of running passive DNS on the Internet.

    Any suggestions?

  2. I definitely agree that DNS visibility is a powerful tool for security teams!

    For those looking to do this on their own, you can use Bro to log DNS traffic (and many other protocols). Those Bro logs can be sliced and diced using ELSA, a nice web interface for hunting through logs. You can have both Bro and ELSA up and running in a few minutes using Security Onion.

    Doug Burks
    Security Onion

    • You’re right that a tool like Bro can parse DNS and a lot of other protocols. I haven’t used ELSA but I assume it’s similar to Splunk in terms of searching and indexing ability.

      We really like storing the packet data rather than parsed content because it allows us to go back later and see anything else such as TXIDs or other flags that didn’t necessarily get parsed to text. DNS packets + UDP header are also significantly more compact than a full text representation of the content. Packet data also compresses well. Also, we didn’t want to put a parser in the path from the network to the disk. Some of our capture locations see extremely high DNS packet rates and we’d rather be able to parse after capture.

      I’m sure there are things full-text-indexing gets you that our tool does not. For example, we do the filtering and searching for names based on suffix. So turns into {“com”, “”, “”} for the bloom filter. This gives us some search flexibility while maintaining a lot of speed. Searching for all names that match /^www\./ can be done with our tool but it is not fast.
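One way to picture suffix-based indexing is sketched below. The exact key format the tool uses is not shown here, so this layout is an assumption: `suffix_keys` is a hypothetical helper that emits every label-aligned suffix of a name, and a plain Python set stands in for the Bloom filter.

```python
def suffix_keys(qname):
    """Return every label-aligned suffix of a DNS name, shortest first."""
    labels = qname.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


# A plain set stands in for the per-file Bloom filter in this sketch.
index = set()
for name in ["www.example.com", "mail.example.org."]:
    index.update(suffix_keys(name))

print(suffix_keys("www.example.com"))
print("example.com" in index)  # a suffix search can be answered from the index
print("www" in index)          # a prefix such as "www" was never added as a key
```

Under this layout, a search for a zone such as `example.com` matches any file containing a name beneath it, while a pattern anchored at the left-hand side of the name has no key to test and falls back to scanning every file.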

  3. Bro-IDS is an open-source tool that includes detailed DNS query logging by default:

    Two other open-source projects that can be used for analyzing the logs are Brownian and ELSA:



  4. What library do you use for bloom filters?
    What library or functionalities do you use to distribute work to multiple machines? How do you track completion/progress?

    • We’re using pybloomfiltermmap for the bloom filter. I can’t remember what we set the false-positive rate to but it’s either 5% or 1%.
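For a sense of what the two candidate false-positive rates cost, the standard Bloom filter sizing formulas give the number of bits m = -n ln(p) / (ln 2)^2 and the number of hash functions k = (m / n) ln 2 for n items at false-positive rate p. The sketch below applies them; `bloom_parameters` is a hypothetical helper and the figure of 100,000 distinct names per file is purely an assumption for illustration.

```python
import math


def bloom_parameters(expected_items, false_positive_rate):
    """Optimal bit count and hash count for a standard Bloom filter."""
    bits = math.ceil(-expected_items * math.log(false_positive_rate)
                     / math.log(2) ** 2)
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


# Hypothetical load: 100,000 distinct keys in one capture file's filter.
for rate in (0.05, 0.01):
    bits, hashes = bloom_parameters(100_000, rate)
    print(f"p={rate}: {bits / 100_000:.2f} bits per key, {hashes} hashes, "
          f"{bits / 8 / 1024:.0f} KiB")
```

Going from 5% to 1% costs roughly half again as much space per key (about 6.2 versus 9.6 bits), in exchange for scanning far fewer files that do not actually contain the name.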

      Since each node only searches the data it has collected we don’t need to do anything fancy like what you’d get with MPI or some other distributed computing / message passing / cluster API. Instead we’ve just built a client/worker model (in Python) where the client issues the search criteria to each node and they each report results back. The client then combines the results.

      Progress is tracked by each worker reporting the total number of files that match the time range. Then each time a worker finishes with a file (either because the file was passed over due to the Bloom filter check or because it was actually searched), it reports that file as complete.

      So if I issue the command:

      $ python search --qname --start 2012-11-01

      The search status bar looks like:

      Search: 100% |#############################| Time: 0:00:49 Files:  100404/100404

      Indicating that for the time period there are 100404 1-minute capture files across all nodes.
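The client/worker flow and progress accounting described in this reply could look roughly like the sketch below. It is an assumption-laden illustration, not the actual tool: threads and an in-process queue stand in for remote nodes and their network connection, the message tuples are invented, and each "file" is just a list of query names.

```python
import queue
import threading


def worker(node, files, qname, events):
    """Search one node's files: report the file total, then each completion."""
    events.put(("total", node, len(files)))
    for names in files:
        # A real worker would first consult the file's Bloom filter and might
        # skip the scan; either way the file is reported as completed.
        matches = [n for n in names if n == qname]
        events.put(("done", node, matches))
    events.put(("exit", node, None))


def run_search(nodes, qname):
    """Client side: send the search to every node and combine what comes back."""
    events = queue.Queue()
    threads = [threading.Thread(target=worker, args=(node, files, qname, events))
               for node, files in nodes.items()]
    for t in threads:
        t.start()
    total = done = exited = 0
    results = []
    while exited < len(threads):
        kind, node, payload = events.get()
        if kind == "total":
            total += payload
        elif kind == "done":
            done += 1  # a status bar would redraw here using done / total
            results.extend((node, match) for match in payload)
        else:
            exited += 1
    for t in threads:
        t.join()
    return sorted(results), done, total


# Hypothetical nodes, each holding a few one-minute files of query names.
nodes = {
    "node-a": [["a.example", "evil.example"], ["b.example"]],
    "node-b": [["evil.example"]],
}
results, done, total = run_search(nodes, "evil.example")
print(results, f"Files: {done}/{total}")
```

Because each node only touches its own files, the client needs no shared state beyond the stream of status messages, which is why nothing heavier than a simple fan-out is required.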

  5. Can you describe the capture sensors a little bit which you have deployed globally to (passively?) collect the DNS traffic? For example, did you build some appliance that you hook up to switch mirror ports? Or do you collect transaction logs from your DNS servers/resolvers?

    • The sensors are just Linux running on Cisco UCS servers. By passively collecting DNS traffic I meant we’re capturing the packets, not collecting transaction logs from our DNS servers. We use the VACL feature of our routing/switching gear to selectively capture DNS traffic so that we don’t have to mirror/span all traffic to our collectors.

      With the traffic being sent to our Linux machines, we’re using NCAP (ncaptool) to record the traffic into 1-minute capture files. Our code then comes along after the minute is over and builds the bloom filters and rolls the files into our data store for fast searching.

  6. Could you please, pretty please, release some code or more technical details concerning that tool?

    I think it would be a tremendous contribution to the infosec toolset for complex infrastructures.

    • I’d be asking the same question in your place. I’m a big proponent and user of open-source software and so are several of my colleagues. We’ll definitely be releasing more technical details and we certainly would like to release the code. Although the code is still undergoing development and improvement, we’ve coded it with the possibility of releasing it in mind. Our biggest hurdle right now is going through the internal approval process.

      In short, we really want to but it isn’t something that will happen overnight.

  7. Actually pretty interesting article. Enough so that I had to leave a comment. Keep this in mind though: what if the target is using packet encapsulation to fully encrypt the packet datagram post key exchange?

    • You’re right that any sort of obfuscation or encapsulation would bypass our packet capture and searching. This would require the attackers to develop their own name-resolution infrastructure. Certainly not outside the realm of possibility but it does significantly raise the bar. Furthermore, attackers want their software to work on internal networks with egress filters or private IPs. If the only way to reach the Internet to perform name resolution is by making use of a recursive nameserver then the only option is to speak DNS properly.

      Of course, the other option is to ship malware with IP addresses rather than names but that scales poorly and doesn’t provide the same resiliency to attackers that they get with name-based services.

  8. “To the best of our knowledge there is no security-focused product specifically built for handling DNS questions so we had to build a complete solution in-house”

    Damballa Failsafe is built for DNS monitoring and malware detection, using algorithms to identify malicious domains, DGAs, etc., as well as connection and binary/PDF analysis, although I am not sure how effective it is in the real world.

    • Thanks for the links. It looks like they’ve built a pretty sophisticated system for real-time analysis and detection. It isn’t clear if they provide a way to search all activity or just activity they flagged as suspicious. Certainly worth looking at though.

  9. Thanks Brandon for this nice entry about your use case which is very encouraging.
    Do you have any plans to make Python code available eventually?

    • There is interest within my group in making the code available; however, there are some complexities to releasing internally developed code that none of us are familiar with. We’re still in the exploratory phase.