Defeating Polymorphic Malware with Cognitive Intelligence. Part 2: Command Line Argument Clustering

Co-authored with: Jan Jusko, Harry Nayyar, and Danila Khikhlukha.

Adversaries continue to evolve their techniques to evade detection. Static analysis approaches are prone to evasion through malicious packers, code obfuscation, and polymorphism. Polymorphism in particular means that the vast majority of malware samples are unique to each target, which poses an ongoing challenge for traditional endpoint security solutions. At the same time, dynamic analysis, generally performed in a sandbox environment, has its own challenges: malware can detect the sandbox and evade analysis. Malware authors like to play dirty tricks in defenders’ sandboxes.

Given these challenges, it would be optimistic to expect organizations to keep up on their own. Individual defender groups are challenged not only by the rapid pace of change, but also by the sophistication and scale of these attacks.

Overcoming The Challenges

Listening mindfully to our customers and understanding their most critical needs is vital for us. Helping solve those challenges is the core of our work at Cisco. We strive to do so by continuously improving existing approaches and exploring new ways to make security teams more effective at what they do: protecting their organizations. The Cisco Cognitive Intelligence group brings over 12 years of research experience, more than 80 machine learning scientists and engineers, and 60 patents and filings. Together with the AMP research team, it is committed to helping customers achieve shared security goals.

In our previous blog post, Cognitive Intelligence: Empowering Security Analysts, Defeating Polymorphic Malware (Part 1), we showed how Cognitive Intelligence helps detect and prioritize breaches. It provides context-rich (and organization-tailored) threat knowledge to incident response teams, and that knowledge helps them focus on the alerts that really matter. We also looked at one of the recently implemented algorithms, Probabilistic Threat Propagation. It increases the number of polymorphic malware samples convicted retrospectively by sharing knowledge between the multiple threat intelligence sources available in the Cisco Security portfolio.

In this blog post, we present some of the steps Cisco takes to improve detection efficacy even further by looking at the same data from a different angle. Cisco AMP for Endpoints is now able to convict polymorphic and evasive malware variants based on the command line arguments observed during sample execution. This capability also enables the automated creation of Cloud IOCs. That increases threat landscape coverage and provides actionable alerts with a greater level of detail and context. This post explores three topics:

how we build a reliable training set in a big data environment
how we use it for data clustering
how we automated the generation and vetting of new Cloud IOCs

Cloud IOCs

In AMP for Endpoints, Cloud IOCs are one of the most effective post-infection detection capabilities. They help security teams surface malicious or suspicious behaviors observed on an endpoint. Quite often an IOC represents a combination of individual events that together likely indicate malicious intent. Examples include:

Registry keys were modified to maintain persistence
Microsoft Word launched PowerShell using a VBA macro
A suspicious scheduled task was created using the schtasks command
The WMI command-line tool was used to execute a command on a remote computer
PowerShell attempted to download content into a string or download a file and execute it

The examples above represent just a tiny fraction of the behaviors that Cloud IOCs can detect. The goal is to help analysts choose the right response action by combining knowledge of their environment with the details provided in the alert.

Example Cloud IOC: W32.PowershellDownloadedExecutable.ioc

All of these indicators are driven by Cisco research teams. We always prioritize quality, but quantity also matters here, because we observe adversaries adopting new techniques daily. There are two ways to increase the number of behaviors being monitored:

conduct more research to create new Cloud IOCs manually
develop methods that generate IOCs automatically at scale, providing more comprehensive coverage

With such automation, security teams can focus on the actionable alerts pinpointed by the algorithm described below. Cloud IOCs trigger on malicious behavior observed on the endpoint. They therefore represent threats that other layers of security did not prevent.

What’s New in 2018: Command Line Argument Clustering

Let’s go over one example algorithm. It is designed to uncover previously unknown or evasive malware, convict polymorphic instances of known threats, and then turn these results into a prevention capability. All AMP-enabled devices operate within a shared architectural framework. As a result, this intelligence becomes immediately available for enforcement across the entire security architecture.

Command Line Argument Clustering is an algorithm that enables automated generation of Cloud IOCs at scale. It therefore improves the detection of malicious binaries based on their observed behavior. That behavior is captured on endpoints running a process-execution monitoring engine (AMP for Endpoints).

The algorithm focuses specifically on the command line arguments used to execute binaries. We chose command line arguments as the basis for these Cloud IOCs because particular arguments are often associated with specific malware families. The idea for the algorithm came up during discussions about the shortcomings of more traditional static and dynamic analysis approaches. Command Line Argument Clustering complements those approaches and adds another layer of security on top of them. It does not rely on an assessment of sample properties to evaluate maliciousness, and it does not require samples to run through dynamic analysis.

The IOCs resulting from command line clustering can represent known or unknown malware families. A cluster is considered classified when research associates it with a known malware family. Sometimes no single known family can be associated with a cluster, or such attribution would require further research. In that case, an equally actionable but less descriptive unclassified IOC can be created. Below are two examples of automatically generated Cloud IOCs: one classified and one unclassified.

Classified Auto-generated Cloud IOC: W32.Dealply2.ioc. The classified event names the specific malware family (DealPly Adware in this example) and its known methods.

Unclassified Auto-generated Cloud IOC: W32.Generic.1682.cam.ioc. In the unclassified (generic) event, we see a behavioral profile that demonstrates malicious action, without attribution to a particular malware family.

Attackers generate hundreds of thousands of new malicious binaries daily, and these samples belong to a variety of malware families. It is therefore infeasible to analyze any significant portion of them manually and create detection content from the results. In this research, we analyze the command line arguments passed to a binary upon execution. The goal is to cluster together binaries that use similar sets of command line arguments. From those clusters, we automatically build Cloud IOCs that detect these common families with high precision. The key to success is the availability of large amounts of balanced telemetry for the algorithm to process. That is where we benefit from the large install base of AMP for Endpoints.
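The post does not say how the similarity between two sets of command line arguments is measured. The minimal sketch below assumes Jaccard similarity over the sets of named arguments. That is an illustrative choice, not necessarily the measure used in production, and the argument names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of argument names.

    Assumption: this is an illustrative measure, not Cisco's documented one.
    """
    if not a and not b:
        return 0.0  # treat two executions without named arguments as unrelated
    return len(a & b) / len(a | b)


# Two hypothetical executions sharing most of their argument names.
run_1 = {"-k", "-id", "-amount"}
run_2 = {"-k", "-id", "-amount", "-lang"}
print(jaccard(run_1, run_2))  # 0.75
```

A score of 1.0 means identical argument-name sets, and 0.0 means no overlap. A threshold on this score is one simple way to decide which executions count as "similar".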

The algorithm progressively goes through the following steps in an automated fashion:

Data Collection, Transformation, and Clustering – collecting telemetry from executions of malicious, legitimate, and unknown binaries. Parsing command line arguments of all executed binaries to further process only named arguments. Constructing and clustering a graph that captures pairwise similarities of captured executions.
Cloud IOC Selection – creating candidate Cloud IOCs from the clusters of executions produced in the previous step, and filtering out clusters that contain legitimate binaries. Prioritizing the remaining clusters using various criteria, such as clusters containing executions of high-risk malware or clusters containing a high number of unique binary files. Finally, converting each cluster into a Cloud IOC by selecting the command line arguments that are typical for that cluster and provide the best coverage and precision.
Cloud IOC Vetting – running newly created IOCs on in-field telemetry and monitoring their performance, then deploying only those that achieve high accuracy for in-field malware detection. A Cloud IOC triggers a detection when all the command line arguments it contains are present during sample execution. Some IOCs can also require certain arguments to be absent. Each detection notifies a security analyst of a potential compromise.
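The three steps can be sketched end to end in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not Cisco’s implementation. The following are all hypothetical choices made for the example:

the `Execution` record
Jaccard similarity as the edge weight, with a 0.5 edge threshold
connected components as the clustering method
the 0.8 support cutoff for IOC arguments
the minimum of three unique binaries per cluster
the 0.99 vetting precision

The prioritization step is omitted. For brevity, `vet` runs on the same toy data, whereas real vetting would use separate in-field telemetry.

```python
from itertools import combinations
from typing import NamedTuple


class Execution(NamedTuple):
    sha256: str           # hash of the executed binary
    label: str            # "malicious", "benign" or "unknown"
    args: frozenset[str]  # named command line arguments only


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster(execs: list[Execution], threshold: float = 0.5) -> list[list[Execution]]:
    """Connected components of a graph whose edges join executions with
    Jaccard similarity >= threshold, found with union-find (assumed method)."""
    parent = list(range(len(execs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(execs)), 2):
        if jaccard(execs[i].args, execs[j].args) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[Execution]] = {}
    for i, e in enumerate(execs):
        groups.setdefault(find(i), []).append(e)
    return list(groups.values())


def select_ioc(members: list[Execution], support: float = 0.8) -> frozenset[str]:
    """Keep the argument names present in at least `support` of the members."""
    counts: dict[str, int] = {}
    for e in members:
        for arg in e.args:
            counts[arg] = counts.get(arg, 0) + 1
    return frozenset(a for a, c in counts.items() if c / len(members) >= support)


def triggers(ioc: frozenset[str], args: frozenset[str],
             absent: frozenset[str] = frozenset()) -> bool:
    """Fire when every IOC argument is present and no `absent` argument is."""
    return ioc <= args and not (absent & args)


def candidate_iocs(execs: list[Execution], threshold: float = 0.5,
                   min_binaries: int = 3) -> list[frozenset[str]]:
    benign = [e.args for e in execs if e.label == "benign"]
    iocs = []
    for members in cluster(execs, threshold):
        if any(e.label == "benign" for e in members):
            continue  # drop clusters containing legitimate binaries
        if len({e.sha256 for e in members}) < min_binaries:
            continue  # too few unique binaries to generalize from
        ioc = select_ioc(members)
        if ioc and not any(triggers(ioc, b) for b in benign):
            iocs.append(ioc)  # keep IOCs that fire on no benign execution
    return iocs


def vet(ioc: frozenset[str], field: list[Execution], min_precision: float = 0.99) -> bool:
    """Deploy only if, among labeled executions the IOC fires on, the share
    that is malicious reaches min_precision (unknown hits are ignored)."""
    known = [e.label for e in field if triggers(ioc, e.args) and e.label != "unknown"]
    return bool(known) and known.count("malicious") / len(known) >= min_precision


execs = [
    Execution("a1", "malicious", frozenset({"-k", "-amount", "-lang"})),
    Execution("a2", "unknown", frozenset({"-k", "-amount", "-lang", "-v"})),
    Execution("a3", "unknown", frozenset({"-k", "-amount"})),
    Execution("b1", "benign", frozenset({"-silent", "-update"})),
]
iocs = candidate_iocs(execs)
print([sorted(i) for i in iocs])  # [['-amount', '-k']]
print(vet(iocs[0], execs))        # True
```

The pairwise comparison in `cluster` is quadratic in the number of executions. That is fine for a toy input, but at telemetry scale a technique such as locality-sensitive hashing would typically be needed to find candidate pairs.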

The Command Line Argument Clustering algorithm proves effective because different polymorphic malware families often use the same set of command line arguments. These arguments should not be observed alongside known benign files. Modern malware is complex and can consist of multiple components. Often, a dropper component is responsible for executing a malicious binary, and it passes that binary specific and unique command line arguments. For example, a dropper that executes a ransomware binary can pass in the ransom amount to demand as a command line argument. The argument values can vary per target or per campaign, but the argument name often remains the same and can serve as an effective indicator of compromise.
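The minimal parser below illustrates why argument names are more stable than their values by keeping only named arguments. It rests on three assumptions:

names are tokens starting with `-` or `/` followed by a letter
values may be attached with `=` or `:`
names are case-insensitive

The helper `named_args` and the sample command lines are hypothetical, not part of AMP for Endpoints.

```python
import shlex


def named_args(cmdline: str) -> frozenset[str]:
    """Return the set of named arguments in a Windows-style command line.

    Hypothetical helper: a named argument is assumed to be a token that starts
    with "-" or "/" followed by a letter; inline values ("-amount=300",
    "/id:42") are stripped and names are lower-cased.
    """
    # posix=False keeps backslashes in Windows paths intact.
    tokens = shlex.split(cmdline, posix=False)[1:]  # drop the executable path
    names = set()
    for tok in tokens:
        body = tok.lstrip("-/")
        if tok.startswith(("-", "/")) and body[:1].isalpha():
            name = tok.split("=", 1)[0].split(":", 1)[0]
            names.add(name.lower())
    return frozenset(names)


# Two hypothetical dropper invocations of the same ransomware family:
# the argument values differ, the argument names do not.
a = named_args(r'"C:\Users\victim\AppData\Local\Temp\x1.exe" -k 7f3a -amount 300 -lang en')
b = named_args(r"C:\Temp\payload_b.exe -k 91cc -amount=850 -lang de")
print(sorted(a))  # ['-amount', '-k', '-lang']
print(a == b)     # True
```

Both hypothetical executions pass different keys, ransom amounts, and languages, yet both reduce to the same set of argument names. That set is the kind of stable feature clustering can work with.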

The elegance of Command Line Argument Clustering is that it further increases coverage against today’s advanced threats. Although it is fully automated, Cisco’s research team can also guide it to ensure the highest level of efficacy. It imposes no computational cost on customers running AMP for Endpoints in their environments, because the processing is done in the Cisco Cloud infrastructure. Binaries that trigger generated Cloud IOCs are retrospectively marked malicious in the AMP Cloud, further decreasing the average Time To Detect. That also turns this capability into prevention that benefits all customers leveraging AMP in their environments (AMP for Email Security, WSA, NGFW, NGIPS, Umbrella and, of course, Endpoints).

If you are using AMP for Endpoints in your environment today, make sure to enable the Command Line Capture capability to immediately benefit from the efficacy improvements.

Conclusion

The Command Line Argument Clustering algorithm is an innovative weapon that works for security analysts, helping them uncover evasive malware and morphing threats in their environments. What previously required dedicated team members to put in hours of threat hunting or manual analysis is now automated. Attackers continuously come up with new techniques to bypass traditional malware prevention, so security teams should be empowered to move the security needle further. In the end, it’s important to remember that nobody wins alone. For long-lasting results and more consistent, predictable outcomes, it’s always best to act as a team that shares the same vision and goals. Check back for updates on other research Cisco is doing to help customers in their day-to-day battles in Part 3 of this blog series.

Additional Resources

AMP for Endpoints: http://cisco.com/go/ampendpoint

AMP for Endpoints Protection Lattice

Learn how to operationalize Cisco’s Advanced Threat Security portfolio: Behind The Perimeter – Fighting Advanced Adversaries