Hiding in Plain Sight: Malware’s Use of TLS and Encryption
TLS (Transport Layer Security) is a cryptographic protocol that provides privacy for applications. TLS is usually layered beneath common application protocols such as HTTP for web browsing or SMTP for email. HTTPS is HTTP carried over TLS; it is the most popular way of securing communication between a web server and a client and is supported by the bulk of major web servers.
As TLS has become more popular and easier to use, we have seen the adoption of this technology by malware to secure its own communication. It is fairly straightforward for malware to plug into existing TLS libraries, and in some cases include an entire implementation in its own source code. This ease of use is troubling because it allows malware to easily evade detection and blend into benign traffic patterns typically observed on a network. In short, malware authors know how to use encryption, and they use it in TLS and in custom applications across many different ports and protocols.
In this blog post, we highlight some of the trends we are seeing with respect to the volume of malware traffic taking advantage of TLS, and on which ports this traffic appears. We compare and contrast malware’s usage of TLS with that of benign network traffic. Finally, we conclude by giving next steps to detect malware even in the face of encryption.
This analysis was done using data collected from ThreatGRID, a malware analysis sandbox. These results are restricted to malware samples receiving a threat score of 100. The malware samples were allowed to run for 5 minutes, and packet captures were collected during that time. David McGrew and I recently open-sourced Joy, which was used to analyze the packet captures and extract features of the TLS communication. To make comparisons with “benign” traffic, we used traffic from an enterprise DMZ, which we assumed did not contain malicious flows.
Trends in Malware’s use of TLS
Figure 1 shows the percentage of observed malware flows that made use of the TLS protocol, broken down by month. We see a steady 10-12% of malicious communication making use of the TLS protocol, with a slight positive slope. While the majority of malicious traffic that we observed was still unencrypted HTTP over port 80, the amount of encrypted malicious traffic is too large to ignore.
98.25% of the malicious TLS traffic we observed was HTTPS over port 443. However, we did see an interesting diversity in malware’s usage of TLS in the tail of the distribution. Figure 2 shows the distribution of ports on which malware used TLS once port 443 is excluded. There are some standard ports, such as 993 for IMAP-over-SSL and 995 for POP3-over-SSL. We also observed some unexpected ports, such as TLS over port 53 (DNS) and port 500 (ISAKMP). These results show that a rule-based system, e.g. “port == 443 || port == 993”, is not sufficient to detect TLS traffic for further analysis. Manipulating port numbers is a very low-cost obfuscation strategy that we see being employed in the malware data.
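To make the limitation of port rules concrete, here is a minimal Python sketch that identifies a TLS ClientHello from the first bytes of a flow's payload instead of its port. It is only a heuristic under stated assumptions: the function names are hypothetical, the payload is assumed to be the first client-to-server TCP segment, and this is not a description of how Joy performs detection.

```python
def port_rule_says_tls(port: int) -> bool:
    """The port-based rule quoted above: port == 443 || port == 993."""
    return port == 443 or port == 993


def looks_like_tls_client_hello(payload: bytes) -> bool:
    """Heuristic content check on the first bytes of a flow's payload.

    A TLS record starts with a 5-byte header (content type, 2-byte
    version, 2-byte length); a handshake message follows, whose first
    byte is the handshake type.
    """
    if len(payload) < 6:
        return False
    content_type = payload[0]
    major, minor = payload[1], payload[2]
    record_len = int.from_bytes(payload[3:5], "big")
    handshake_type = payload[5]
    return (
        content_type == 0x16            # handshake record
        and major == 0x03               # SSL 3.0 / TLS 1.x record version
        and minor <= 0x04
        and 0 < record_len <= 2 ** 14   # plaintext record length limit
        and handshake_type == 0x01      # ClientHello
    )


if __name__ == "__main__":
    # A truncated, synthetic ClientHello prefix seen on port 53.
    payload = bytes.fromhex("16030100050100000100")
    print(port_rule_says_tls(53))                 # the port rule misses it
    print(looks_like_tls_client_hello(payload))   # the content check does not
```

A check like this is independent of the port number, so TLS on port 53 or port 500 is still picked up for further analysis.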
How Malware Uses the TLS Protocol
In many cases, malware uses standard TLS implementations. However, our study showed a substantial difference in the cryptographic parameters selected by malware communication: we typically see malware choosing weaker parameters. These parameters may be selected because they are computationally efficient, or because the malware sample has its own custom encryption and is using TLS just for transport.
Cisco groups ciphers into three categories: Recommended, Legacy, and Avoid. These categories are identified in Cisco’s Recommendations for Cryptographic Algorithms. Figure 3 shows the percentages of the categories that we observed in malicious traffic versus benign traffic obtained from an enterprise DMZ. Malware tends to use weak ciphersuites ~20% more than the DMZ traffic. As an example, the “Avoid” ciphersuite “TLS_RSA_WITH_RC4_128_MD5” was the most used ciphersuite in that category.
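The kind of tally behind a category comparison can be sketched in a few lines of Python. The marker lists below are illustrative assumptions, not Cisco's published mapping; the only assignment taken from this post is that “TLS_RSA_WITH_RC4_128_MD5” falls under “Avoid”. The function names are hypothetical as well.

```python
from collections import Counter

# Illustrative substring rules only (an assumption for this sketch);
# consult Cisco's Recommendations for Cryptographic Algorithms for the
# real category assignments.
AVOID_MARKERS = ("RC4", "MD5", "NULL", "EXPORT", "_DES_")
LEGACY_MARKERS = ("3DES", "_CBC_")
RECOMMENDED_MARKERS = ("_GCM_",)


def categorize(suite: str) -> str:
    """Map a ciphersuite name to a category using the rules above."""
    if any(marker in suite for marker in AVOID_MARKERS):
        return "Avoid"
    if any(marker in suite for marker in LEGACY_MARKERS):
        return "Legacy"
    if any(marker in suite for marker in RECOMMENDED_MARKERS):
        return "Recommended"
    return "Unknown"


def category_shares(selected_suites):
    """Percentage of flows per category, given one selected suite per flow."""
    counts = Counter(categorize(suite) for suite in selected_suites)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    flows = [
        "TLS_RSA_WITH_RC4_128_MD5",
        "TLS_RSA_WITH_AES_128_CBC_SHA",
        "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    ]
    print(category_shares(flows))
```

Running the same tally over malicious flows and over benign flows gives two distributions that can be compared side by side.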
Figure 4 shows the TLS extensions that we observed clients advertising. The benign traffic generally had much more variability in the TLS extensions that were supported. The notable exception was the “000d” (signature_algorithms) extension, which is an RFC MUST in most situations.
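The advertised extensions are visible in the unencrypted ClientHello, so they can be read without decrypting anything. The following Python sketch walks the ClientHello layout to list extension type codes such as “000d”. It assumes the whole ClientHello sits in a single TLS record; both function names are hypothetical, and the builder exists only to produce a synthetic message for the demonstration.

```python
import struct


def client_hello_extensions(record: bytes) -> list:
    """Return advertised extension types as 4-digit hex strings.

    Assumes a complete ClientHello in one record. Truncated input raises
    IndexError or struct.error instead of being handled gracefully.
    """
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        raise ValueError("not a TLS ClientHello record")
    pos = 9                      # 5-byte record header + 4-byte handshake header
    pos += 2 + 32                # client_version + random
    pos += 1 + record[pos]       # session_id (1-byte length prefix)
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + suites_len        # cipher_suites (2-byte length prefix)
    pos += 1 + record[pos]       # compression_methods (1-byte length prefix)
    if pos >= len(record):
        return []                # ClientHello without an extensions block
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = min(pos + ext_total, len(record))
    found = []
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        found.append(f"{ext_type:04x}")
        pos += 4 + ext_len       # skip the extension's data
    return found


def build_client_hello(ext_types) -> bytes:
    """Build a synthetic ClientHello with empty extension bodies (demo only)."""
    exts = b"".join(struct.pack("!HH", t, 0) for t in ext_types)
    body = (
        b"\x03\x03" + bytes(32)                 # version, random
        + b"\x00"                               # empty session_id
        + struct.pack("!H", 2) + b"\x00\x04"    # one suite: TLS_RSA_WITH_RC4_128_MD5
        + b"\x01\x00"                           # null compression
        + struct.pack("!H", len(exts)) + exts
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    hello = build_client_hello([0x0000, 0x000D, 0x000A])
    print(client_hello_extensions(hello))
```

The resulting list of type codes per flow is the sort of feature that can be compared between malicious and benign clients.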
Finally, we looked at the client’s public key length. Figure 5 shows the results. First, it is important to note that the strength of a key depends on the public key algorithm as well as the key length. Elliptic curve cryptography with a 520-bit key is more secure than Diffie-Hellman with a 768-bit key. Our data suggests that benign traffic uses 520-bit ECC for most of the TLS sessions while malware mostly uses 2048-bit DH.
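To illustrate why key lengths cannot be compared across algorithms directly, the sketch below maps a key size to an approximate security strength using the comparable-strength thresholds from NIST SP 800-57 Part 1. It assumes the reported length is the modulus size for DH and the field size for ECC, which this post does not specify; the function name is hypothetical.

```python
# (minimum key size in bits, approximate security strength in bits),
# largest first, following NIST SP 800-57 Part 1 comparable strengths.
FINITE_FIELD = [(15360, 256), (7680, 192), (3072, 128), (2048, 112), (1024, 80)]
ELLIPTIC_CURVE = [(512, 256), (384, 192), (256, 128), (224, 112), (160, 80)]

TABLES = {"DH": FINITE_FIELD, "ECC": ELLIPTIC_CURVE}


def approx_strength(algorithm: str, key_bits: int) -> int:
    """Approximate security strength in bits; 0 means below the 80-bit tier."""
    if algorithm not in TABLES:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    for min_bits, strength in TABLES[algorithm]:
        if key_bits >= min_bits:
            return strength
    return 0


if __name__ == "__main__":
    for algorithm, bits in [("ECC", 520), ("DH", 768), ("DH", 2048)]:
        print(algorithm, bits, approx_strength(algorithm, bits))
```

Under these assumptions, a 768-bit DH key falls below the lowest tier while a 520-bit ECC key lands in the highest, even though its raw length is shorter.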
Malware’s usage of encryption is alarming because encryption interferes with the efficacy of signature-based techniques. Fortunately, our six-month study of malicious network communications gathered from ThreatGRID has shown that malware, in most cases, uses TLS in a way that is distinct from that of benign traffic. We can leverage these differences in an analytics solution to help us classify encrypted traffic with TLS-aware telemetry. In a preliminary study with hundreds of thousands of malicious and benign TLS flows, our machine learning classifiers were able to achieve a total accuracy of 99.7% and an accuracy of 90.4% at a 1-in-10,000 false discovery rate. These results were recently presented at FloCon 2016.
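For readers unfamiliar with the metrics, the following Python sketch shows how total accuracy and false discovery rate are computed from classifier output, and how a score threshold can be chosen to meet a target false discovery rate. The function names, scores, and labels are made up for illustration; this is not the evaluation code used in the study.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all flows that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_discovery_rate(tp: int, fp: int) -> float:
    """Fraction of flows flagged as malicious that were actually benign."""
    flagged = tp + fp
    return fp / flagged if flagged else 0.0


def threshold_for_fdr(scores, labels, max_fdr):
    """Lowest score threshold whose flagged set has FDR <= max_fdr.

    labels use 1 for malicious and 0 for benign. Returns None when no
    threshold meets the target.
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if false_discovery_rate(tp, fp) <= max_fdr:
            best = t
    return best


if __name__ == "__main__":
    # A 1-in-10,000 false discovery rate: 1 false alarm per 10,000 alerts.
    print(false_discovery_rate(tp=9999, fp=1))
    # Toy scores: a stricter FDR target forces a higher threshold.
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 1, 0, 1]
    print(threshold_for_fdr(scores, labels, max_fdr=0.0))
    print(threshold_for_fdr(scores, labels, max_fdr=0.3))
```

Tightening the false discovery rate generally means flagging fewer flows, which is why accuracy at a strict operating point is lower than total accuracy.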
To learn more about the growing use of encryption and other security trends, download the 2016 Cisco Annual Security Report.