Over the past two years, we have been systematically collecting and analyzing malware-generated packet captures. During this time, we have observed a steady increase in the percentage of malware samples using TLS-based encryption to evade detection. In August 2015, 2.21% of the malware samples used TLS, increasing to 21.44% in May 2017. During that same time frame, the share of malware samples that used TLS and made no unencrypted HTTP connections increased from 0.12% to 4.45%.
Identifying threats contained within encrypted network traffic poses a unique set of challenges. It is important to monitor this traffic for threats and malware, but do so in a way that maintains the privacy of the user. Because pattern matching is less effective in the presence of TLS sessions, we needed to develop new methods that can accurately detect malware communication in this setting [1,2,3]. To this end, we used the flow’s individual packet lengths and inter-arrival times to understand the behavioral characteristics of the transmitted data, and we used the TLS metadata contained in the ClientHello to understand the TLS client that is transmitting the data. We combine both of these views in a supervised machine learning framework allowing us to detect both known and unknown threats in TLS communication.
As an overview, Figure 1 provides a simplified view of a TLS session. In TLS 1.2 [4], the majority of the interesting TLS handshake messages are unencrypted, and are displayed in red in Figure 1. All of the TLS-specific information that we use for classification comes from the ClientHello, which will also be accessible in TLS 1.3 [7].
Throughout the life of this project, we have maintained that the data is at the heart of our success. We have teamed with ThreatGrid and Cisco Infosec to acquire malicious packet captures and live enterprise data. These data feeds have helped to guide our analysis and to identify the characteristics of a flow that are most informative. To provide some intuition about why the data features that we have analyzed are interesting, we first focus on a particular malware sample, bestafera, which is known for keylogging and data exfiltration.
Behavioral Analysis through Packet Lengths and Times
Figure 2 shows the packet lengths and inter-arrival times for two different TLS sessions: a Google search in Figure 2a and a bestafera-initiated connection in Figure 2b. The x-axis represents time, the upward lines represent the size of packets that are sent from the client/source to the server/destination, and the downward lines represent the size of packets that are sent from the server to the client. The red lines again represent unencrypted messages, and the black lines are the sizes of the encrypted application_data records.
The Google search follows a typical pattern: the client’s initial request is in a small outbound packet, followed by a large response spanning many MTU-sized packets. The several packets going back and forth are due to Google attempting to auto-complete my search while I was still typing. Finally, Google thought it had a pretty good idea what I was typing, and sent an updated set of results. The server that bestafera communicated with began by sending a packet containing a self-signed certificate, which can be seen as the first downward, thin red line in Figure 2b. After the handshake, the client immediately begins exfiltrating data to the server. There was a pause, and then the server sent a regularly scheduled command-and-control message. Packet lengths and inter-arrival times can’t provide deep insight about the contents of a session, but they do facilitate inferences about the behavioral aspects of a session.
Fingerprinting the Application with TLS Metadata
The TLS ClientHello message provides two particularly interesting pieces of information that can be used to distinguish different TLS libraries and applications. The client offers a server a list of suitable cipher suites ordered in the preference of the client. Each cipher suite defines a set of methods, such as the encryption algorithm and pseudorandom function, that will be needed to establish a connection and transmit data using TLS. The client can also advertise a set of TLS extensions that, among other things, can provide the server with parameters needed for the key exchange, for example ec_point_formats.
The cipher suite offer vectors can vary in both the number of unique cipher suites offered and the different subgroups offered. Similarly, the list of extensions varies based on the context of the connection. Because most applications typically have different priorities, these lists can and do contain a great deal of discriminatory information in practice. As an example, desktop browsers tend to favor heavier-weight, more secure encryption algorithms, mobile applications favor more efficient encryption algorithms, and the default cipher suite offer vectors of clients bundled with TLS libraries typically offer a wider range of cipher suites to help with testing server configurations.
Most user-level applications, and by extension a large number of TLS connections seen in the wild, use popular TLS libraries such as BoringSSL, NSS, or OpenSSL. These applications usually have unique TLS fingerprints because the developer will modify the defaults of the library to optimize their application. To be more explicit, the TLS fingerprint for s_client from OpenSSL 1.0.1r will most likely be different from that of an application that uses OpenSSL 1.0.1r to communicate. This is also why bestafera’s TLS fingerprint is both interesting and unique: it uses the default settings of OpenSSL 1.0.1r to create its TLS connections.
Applying Machine Learning
For this blog post, we have focused on straightforward feature representations of three data types: traditional NetFlow, packet lengths, and information taken from the TLS ClientHello. These data types are all extracted from a single TLS session, but we have also developed models that incorporate features from multiple flows [1]. All features were normalized to have zero mean and unit variance before training.
Legacy. We utilized 5 features that are present in traditional NetFlow: the duration of the flow, the number of packets sent from the client, the number of packets sent from the server, the number of bytes sent from the client, and the number of bytes sent from the server.
Sequence of Packet Lengths (SPL). We create a length-20 feature vector, where each entry is the corresponding packet size in the bidirectional flow. Packet sizes from the client to the server are positive, and packet sizes from the server to the client are negative.
TLS Metadata (TLS). We analyze both the offered cipher suite list and the list of advertised extensions contained in the ClientHello message. In our datasets, we observed 176 unique cipher suites and 21 unique extensions, which resulted in a length-197 binary feature vector. The appropriate feature is set to 1 if that cipher suite or extension appeared in the ClientHello message.
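The three feature groups above can be sketched in a few lines of Python. This is only an illustrative sketch, not our production extractor: the `Packet` tuple layout, the function names, the choice to keep the first 20 packets, and the zero-padding of short flows are all assumptions made for the example, and the small lists of cipher suite and extension code points stand in for the 176 suites and 21 extensions observed in our data.

```python
from typing import List, Sequence, Tuple

# Hypothetical packet record: (timestamp in seconds, signed length).
# Positive lengths are client-to-server, negative are server-to-client.
Packet = Tuple[float, int]

SPL_LENGTH = 20  # length of the packet-length vector described above


def legacy_features(packets: Sequence[Packet]) -> List[float]:
    """Duration, packets out, packets in, bytes out, bytes in."""
    if not packets:
        return [0.0] * 5
    times = [t for t, _ in packets]
    sent = [n for _, n in packets if n > 0]
    received = [-n for _, n in packets if n < 0]
    return [
        max(times) - min(times),
        float(len(sent)),
        float(len(received)),
        float(sum(sent)),
        float(sum(received)),
    ]


def spl_features(packets: Sequence[Packet], length: int = SPL_LENGTH) -> List[int]:
    """Signed sizes of the first `length` packets (zero-padded: an assumption)."""
    sizes = [n for _, n in packets][:length]
    return sizes + [0] * (length - len(sizes))


def tls_features(
    offered_suites: Sequence[int],
    offered_extensions: Sequence[int],
    known_suites: Sequence[int],
    known_extensions: Sequence[int],
) -> List[int]:
    """Binary vector: 1 if the known suite/extension appears in the ClientHello."""
    suites, extensions = set(offered_suites), set(offered_extensions)
    return [int(s in suites) for s in known_suites] + [
        int(e in extensions) for e in known_extensions
    ]


# Toy flow: one request, two MTU-sized responses, one follow-up request.
flow = [(0.00, 180), (0.05, -1460), (0.06, -1460), (0.30, 90)]
legacy = legacy_features(flow)
spl = spl_features(flow)
tls = tls_features(
    offered_suites=[0xC02F, 0x002F],
    offered_extensions=[0x000B],
    known_suites=[0xC02B, 0xC02F, 0x009C, 0x002F],  # illustrative subset
    known_extensions=[0x0000, 0x000A, 0x000B],      # illustrative subset
)
```

With the full vocabularies, concatenating the three lists gives the 5 + 20 + 197 features that the experiments draw from.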
All of the presented results use the scikit-learn random forest implementation [6]. Based on previous longitudinal studies that we conducted, the number of trees in the ensemble was set to 125, and the number of features considered at each split of the tree was set to the square root of the total number of features. The feature set used by the random forest model was composed of some subset of the Legacy, SPL, and/or TLS features depending on the experiment.
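For readers who want to try this configuration, here is a minimal sketch using scikit-learn. The data is randomly generated and exists only to make the snippet runnable; the class sizes, the random seed, and the variable names are assumptions for illustration, and the numbers it produces say nothing about real traffic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the Legacy (5) + SPL (20) + TLS (197) features.
n_features = 5 + 20 + 197
X_benign = rng.normal(0.0, 1.0, size=(200, n_features))
X_malware = rng.normal(0.75, 1.0, size=(200, n_features))
X = np.vstack([X_benign, X_malware])
y = np.array([0] * 200 + [1] * 200)  # 0 = benign, 1 = malware

# Normalize every feature to zero mean and unit variance.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# 125 trees; sqrt(n_features) candidate features at each split.
clf = RandomForestClassifier(n_estimators=125, max_features="sqrt", random_state=0)
clf.fit(X_scaled, y)

# Probability that each flow was generated by malware (column 1 = class 1).
malware_prob = clf.predict_proba(X_scaled)[:, 1]
```

In a real deployment the scaler and the forest would be fit on the training data only and then applied unchanged to unseen flows.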
We sampled 1,621,910 TLS flows from one enterprise network, Site1, and 324,771 flows from ThreatGrid (collected between August 2015 and December 2016) to train our random forest model. We then simulated deploying the model on unseen data from a separate enterprise network, Site2, and malware data collected during the two months following the previous data set. There were 2,638,559 sampled TLS flows from Site2 and 57,822 TLS flows from ThreatGrid during January and February of 2017. Table 1 presents the results of this experiment at different thresholds. 0.5 is the default threshold of the classifier, and the higher the threshold, the more certain the trained model has to be to determine that the TLS flow was generated by malware. The malware/benign accuracies are kept separate to demonstrate feature subsets that overfit to a particular class. For example, Legacy can achieve near perfect accuracy on the benign set, but these features fail to generalize to the malware dataset.
At a threshold of 0.99, the classifier using the Legacy/SPL features correctly classified 98.95% of the benign samples, and 69.81% of the malicious samples. These results are significantly improved upon if we combine information about the application (TLS) with the behavioral characteristics of the network traffic (SPL). The combination of Legacy/SPL/TLS was the best performing model on the benign and malware samples. At a threshold of 0.95, this model achieved accuracies of 99.99% and 85.80% for the benign and malicious hold out datasets, respectively.
Decryption solutions are not ideal in all settings due to privacy concerns, legal obligations, expense, or non-cooperating end-points. Cisco has devoted time (mine especially) to developing research and products to fill these gaps and complement current solutions. Our validation studies on real network data have shown that we can achieve reliable detection with minimal false positives. In addition to engaging Cisco product teams to further develop this work, we have spent time engaging a broader external audience through open source [5] and academic papers [1,2,3].
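To make the threshold trade-off concrete, the sketch below computes benign and malware accuracy separately at several thresholds. The function name, the rule that a flow is flagged when its score is greater than or equal to the threshold, and the eight toy scores are all assumptions for illustration; they are not taken from our experiments.

```python
from typing import Sequence, Tuple


def per_class_accuracy(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (benign accuracy, malware accuracy) at one threshold.

    labels: 1 = malware, 0 = benign. A flow is flagged as malware when
    its score is >= threshold (an assumption for this sketch).
    """
    benign_total = malware_total = benign_ok = malware_ok = 0
    for p, label in zip(probs, labels):
        flagged = p >= threshold
        if label == 1:
            malware_total += 1
            malware_ok += int(flagged)
        else:
            benign_total += 1
            benign_ok += int(not flagged)
    benign_acc = benign_ok / benign_total if benign_total else float("nan")
    malware_acc = malware_ok / malware_total if malware_total else float("nan")
    return benign_acc, malware_acc


# Toy scores: four benign flows followed by four malware flows.
probs = [0.02, 0.10, 0.60, 0.97, 0.40, 0.96, 0.995, 0.999]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

results = {t: per_class_accuracy(probs, labels, t) for t in (0.5, 0.95, 0.99)}
```

On these toy scores, raising the threshold from 0.5 to 0.99 lifts benign accuracy from 0.5 to 1.0 while malware accuracy falls from 0.75 to 0.5, which is the same kind of trade-off that motivates reporting the two accuracies separately.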
[1] B. Anderson and D. McGrew. Identifying Encrypted Malware Traffic with Contextual Flow Data. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, AISec ’16, pages 35-46, 2016.
[2] B. Anderson and D. McGrew. Machine Learning for Encrypted Malware Traffic Classification: Accounting for Noisy Labels and Non-Stationarity. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2017 (To Appear).
[3] B. Anderson, S. Paul, and D. McGrew. Deciphering Malware’s use of TLS (without Decryption). ArXiv e-prints, July 2016.
[4] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246 (Proposed Standard), 2008.
[5] D. McGrew, B. Anderson, B. Hudson, and P. Perricone. Joy. https://github.com/cisco/joy, 2017.
[6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[7] E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.3 (draft 20). https://tools.ietf.org/html/draft-ietf-tls-tls13-20, 2017.