A Brief History of Malware Obfuscation: Part 2 of 2
We parted ways last time with a discussion of polymorphism that left you tantalized and on the heels of a malware revolution…
From the Greek meta meaning about or self.
From the Greek morphe meaning shape or form.
In 1998, a virus was found in the wild that was able to conceal itself in a different way. Called the Win95/Regswap virus, it was notable because it didn’t use polymorphic decryptors to thwart detection as it evolved. Instead, it would switch the CPU registers it used from generation to generation (but otherwise retain the same codebase). This defeated conventional exact pattern matching, but wildcard pattern matching, which was implemented soon afterward, caught up and nabbed this guy. Register swapping was a basic form of metamorphism, and it was going to set the stage for an epic battle in the growing malware arms race.
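To make the register-swapping idea concrete, here is a minimal Python sketch of why an exact byte signature misses a register-swapped generation while a wildcard signature still matches. The three-instruction fragment, the helper names, and the signature itself are hypothetical and chosen only for illustration; they are not taken from Regswap or from any real AV product. The only facts relied on are the standard x86 one-byte encodings for pop r32 (58+r), mov r32, imm32 (B8+r), and push r32 (50+r).

```python
import re

def build_generation(r1: int, r2: int) -> bytes:
    """Assemble a toy fragment: pop r1; mov r2, 4; push r1.

    r1 and r2 are x86 register numbers (eax=0, ecx=1, edx=2, ebx=3, ...).
    """
    pop = bytes([0x58 + r1])                               # pop r32
    mov = bytes([0xB8 + r2]) + (4).to_bytes(4, "little")   # mov r32, 4
    push = bytes([0x50 + r1])                              # push r32
    return pop + mov + push

# Two "generations" that differ only in which registers they use.
gen_a = build_generation(2, 7)  # edx, edi
gen_b = build_generation(3, 6)  # ebx, esi

# An exact signature is just the bytes of the first generation.
exact_signature = gen_a

# A wildcard signature keeps the fixed bytes and lets the
# register-dependent opcode bytes range over all eight registers.
wildcard_signature = re.compile(
    rb"[\x58-\x5f][\xb8-\xbf]\x04\x00\x00\x00[\x50-\x57]", re.DOTALL
)

def exact_match(sample: bytes) -> bool:
    return exact_signature in sample

def wildcard_match(sample: bytes) -> bool:
    return wildcard_signature.search(sample) is not None

if __name__ == "__main__":
    for name, sample in (("gen_a", gen_a), ("gen_b", gen_b)):
        print(name, sample.hex(), exact_match(sample), wildcard_match(sample))
```

The exact signature only recognizes the generation it was cut from, while the wildcard version tolerates any register assignment. The trade-off is that looser patterns are more likely to match unrelated code.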
Metamorphism, which can be thought of as “body-polymorphism,” was a major leap forward. Quite simply, the malware is able to reprogram itself as it evolves across generations. This was a quantum leap in viral programming, as the code is effectively becoming pseudo-self-aware, able to parse and mutate its own body as it spreads.
According to Walenstein, Mathur, Chouchane, and Lakhotia, there are two parameters for grouping metamorphic malware: how it communicates (open-world or closed-world) and how it transforms itself (binary transformer or alternate representation transformer):
- Open-World: Capable of communicating with the outside world (downloading plugins, etc.). In 2008, the open-world Conficker worm appeared in the wild, and the Internet hasn’t been the same since. At the time of this writing it is estimated that seven million Windows-based PCs are under its control.
- Closed-World: No external communication capability.
- Binary Transformer: During evolution, mutates the binary executable itself.
- Alternate Representation Transformer: During evolution, refers to a pseudo-code representation and mutates based on it. In 2000, Win32.Apparition became the first virus to use such a technique: it carried a copy of its own source code and would infect files on a machine whenever it found a suitable compiler.
Some of the more well known and “industry standard” metamorphic transformations include:
- Register Swapping: As discussed with the Win95/Regswap virus above. Although the x86 CPU registers were each designed with specific instructions (and resultant optimizations) in mind, the general-purpose ones can largely be used interchangeably.
- Code Substitution: Switching instructions for equivalent variants that result in different binary code but accomplish the same task (xor / sub and test / or instructions can be easily interchanged).
- Branch Condition Reversing: Inverting a branch condition and swapping the code paths it selects (for example, replacing jz with jnz and exchanging the two targets), which changes the binary code but not the program logic.
- Garbage Insertion: Also mentioned above, nop and clc instructions are commonly inserted to change the appearance of code but not its function.
- Subroutine Reordering: Shuffling the subroutines so that they appear in the binary in a random order. With n routines there are n! possible orderings, each of which looks different to a pattern matcher.
- Code Insertion: One of the most complex methods, the malware will actually weave itself into the binary code of its host. Discussed below.
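To get a feel for the n! figure quoted for subroutine reordering, the short Python sketch below enumerates the orderings of a few placeholder routines and computes the count for larger n. The routine names are hypothetical labels, not real code, and the snippet only illustrates the combinatorics rather than any actual engine.

```python
import math
from itertools import permutations

def layout_count(n: int) -> int:
    """Number of distinct orderings of n subroutines (n factorial)."""
    return math.factorial(n)

# Hypothetical placeholder names standing in for three subroutines.
routines = ["sub_a", "sub_b", "sub_c"]

# Every possible order in which the three routines could be laid out.
layouts = list(permutations(routines))

if __name__ == "__main__":
    for layout in layouts:
        print(" -> ".join(layout))
    for n in (3, 5, 10):
        print(n, "routines:", layout_count(n), "possible layouts")
```

Three routines give only six layouts, but ten already give 3,628,800, which is why even this simple transformation makes exact signatures impractical.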
Entry Point Obfuscation
Entry Point Obfuscation (EPO) is a technique used by malware authors to dissuade AV scanners from investigating the files that have been invaded. For a virus to activate and acquire control, it needs to place itself within the line of execution. Traditionally this was done by changing the entry point into the target executable to first point to the virus code, which will presumably, at some point, release control back to the host executable. EPO-enabled malware will patch the target executable somewhere in the middle of its execution train with jmp/call instructions and receive control that way. By doing this, EPO will fool the AV scanner that looks for a modified entry point as part of its heuristics engine.
Advanced Viral Alchemy
One of the most complex viruses to date, W95.Zmist, was released in late 2000 by Russian viral theorist, author and all around malware superstar Z0mbie. W95.Zmist was a highly metamorphic, EPO-enabled, code interleaving, junk inserting, (possibly) polymorphic decryptor having, all around amazing viral masterpiece (true story). What it did that was so groundbreaking was that its metamorphic Mistfall engine would actually decompile target executables into manageable objects, mutate using all of the above techniques, and insert (interleave) itself in between the objects and then reassemble the entire Frankenstein-like executable. The most amazing thing about it was that it worked very well in almost all cases.
Not to be outdone, in 2002 the Mental Driller let loose Simile. According to Peter Szor, 90% of its 14,000 lines of assembler was devoted to its extremely complex metamorphic engine, “Metamorphic Permutating High-Obfuscating Reassembler” (MetaPHOR). Simile was unique at the time as it was an alternate representation transformer, which enabled the virus to grow or shrink in size as it evolved. It was also a cross-platform infector, able to attack and infect Linux ELF executables. Simile was very worrisome for the AV crews because, while it had no harmful payload, it was very hard to reliably detect. If someone decided to write a destructive virus on top of the MetaPHOR engine, it would be a real problem.
When done properly, metamorphic malware leaves no matchable or predictable patterns from one generation to the next. This is to say that effective metamorphic malware can generate millions of functionally equivalent variants of itself, with no single signature able to detect them all. This means that AV scanners need to develop advanced heuristics and event-based detection methods to find effective metamorphic malware. Unfortunately, this is not an exact science and, at the time of this writing, is still a work in progress.
Packers are a throwback to days of yore when the Internet was still a research toy and computer storage space was at a premium. System RAM and disks were much smaller in the ’80s and early ’90s. To keep the size of binary executables to an absolute minimum, so-called packing tools were popularized that encrypted and compressed files. While packers still have legitimate uses today (bundling executables with component files and commercial software protection), this technique was adopted and extended by malware authors to add polymorphism, armoring, metamorphism, EPO, and a host of other techniques aimed at evading AV scanners.
Packers offer powerful benefits to malware authors. When creating a new strain of an existing malware, if the malware author modifies most of the code but leaves parts of it intact (or picks and chooses pieces from other existing malware), the resultant executable will share patterns with its relatives. This means that if a signature exists for any piece of the antecedent, an AV scanner can match this pattern. However, running the file through a packer means that just a tiny change in the source (for example, changing a register name) will result in a radically different binary executable. This effect is akin to how a single letter change in a lengthy document will result in a completely different cryptographic hash. There are literally thousands of discrete packing tools out there that are used to compress, encrypt and armor malware. Two notable outliers are mentioned below.
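The hash analogy is easy to see for yourself. The Python sketch below hashes two documents that differ by a single letter and counts how many hex digits of the SHA-256 digests differ. The sample text and helper names are made up for illustration, and SHA-256 is simply one convenient cryptographic hash. This demonstrates the analogy only; it says nothing about how any particular packer works internally.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Two hypothetical documents that differ by exactly one letter.
original = b"A brief history of malware obfuscation, part two. " * 50
modified = original.replace(b"A brief", b"A brief", 1)  # identical copy
modified = b"a" + modified[1:]                           # change one letter

digest_original = sha256_hex(original)
digest_modified = sha256_hex(modified)

if __name__ == "__main__":
    print(digest_original)
    print(digest_modified)
    print(differing_hex_digits(digest_original, digest_modified),
          "of 64 hex digits differ")
```

Although the inputs differ in one byte out of thousands, the two digests share no useful structure, which is the same property that makes a signature cut from one packed file useless against a slightly different one.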
In 2009, University of Michigan PhD student Jon Oberheide debuted Polypack, a web-based automated file packing service. Dubbed by pundits as a “Crimeware as a Service” (CaaS) site, Polypack was in reality a service that evaluated the effectiveness of AV scanners when detecting packed malware. While it did show the ease with which someone could launch such a CaaS site, Polypack itself is pro bono only and password protected from the masses. What makes Polypack notably notorious is that it offers (registered) users automated access to a multitude of packers and AV scanners. The submitted file is packed by each packer and then scanned by each of the AV engines, and the results are displayed. It offers users a quick way to determine an effective evasive packing solution. Malware authors can use this model for obvious obfuscatory gain.
The King Midas of packers, commercially available Themida, currently represents the pinnacle of packing technology. Themida’s target market is actually commercial software vendors looking to protect their wares from reverse engineering. But like any security tool, it can be used with hostile intent or good intent. In all of the extensive testing performed by Oberheide in his Polypack experiments, Themida consistently outperformed all of the competition and evaded most of the AV scanners. It offers expert-level deployment of all of the obfuscation techniques presented in this blog posting (and much more) in a simple and convenient GUI-based interface.
On Packer Detection and Identification
If white hats could come up with a way to not just reliably detect when a file is packed but also identify what it is packed with, it would make malware analysis and detection much easier. Unfortunately, this is a part of the arms race that the good guys are having a hard time with. Detection can be done with a reasonable degree of certainty using Shannon entropy-based file analysis (and others have proposed more complicated but reportedly more effective methods). Detection without identification, however, is not very useful since a file can’t be unpacked when its packer is unknown, and friends, identification is a much more complicated animal. Sure, there are tools to detect how a file has been packed (such as the ubiquitous PEiD and the elusive SigBuster), but they rely on pattern matching packed executables against their signature databases of known packers. As we have seen, the effectiveness of this type of science is a function of how complete its signature database is. And as packers evolve and change, even slightly, so do their resultant packed file signatures. Despite close scrutiny by many researchers in many projects, at the end of the day, a significant portion of packed malware remains unidentified by SigBuster and PEiD. According to Oberheide’s testing of 98,801 malware specimens, as many as 40% of the herd were packed but not identified. In my own (albeit more limited) testing of approximately 500 samples, I found this number of unidentified packers to be as high as 71%.
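The Shannon entropy check mentioned above can be sketched in a few lines of Python. Compressed or encrypted data looks close to random, so its byte-level entropy approaches the maximum of 8 bits per byte, while ordinary code and text score noticeably lower. The function names and the 7.0 bits-per-byte threshold below are assumptions chosen for illustration, not values taken from any particular tool, and a real detector would typically look at individual sections rather than the whole file.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Flag data whose entropy meets an (assumed) packed-file threshold."""
    return shannon_entropy(data) >= threshold

if __name__ == "__main__":
    plain = b"hello world " * 100          # repetitive, low entropy
    uniform = bytes(range(256)) * 16       # every byte value equally likely
    print("plain  :", round(shannon_entropy(plain), 3), looks_packed(plain))
    print("uniform:", round(shannon_entropy(uniform), 3), looks_packed(uniform))
```

Note that this only answers the "is it packed?" question. It says nothing about which packer was used, which is exactly the identification gap described above.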
How Will AV Cope?
Typically, today’s malware doesn’t follow the old-school virus paradigm of “infect and (eventually) trigger.” Instead it is laser-focused on maintaining persistent covert access at all costs. As we’ve seen, malware has advanced significantly over the last two-plus decades to become incredibly good at achieving this goal. Some modern malware packages are actually software suites capable of optionally downloading or including by default various plugin-style upgrades, such as distributed denial of service modules, password sniffing modules, or in many cases, tip-of-the-spear methods to mutate and obfuscate. What this means is that signature generation is a losing battle.
So far, in 2010, some AV vendors claim to be seeing upwards of 55,000 new malware samples a day! To put this in perspective, in order to obtain blanket signature coverage, a vendor would have 1.6 seconds to analyze a new malware strain, generate a new signature and push it out to their entire customer base’s scanner databases. And this would need to happen 24 hours a day, 7 days a week.
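The 1.6-second figure falls straight out of the arithmetic: divide the number of seconds in a day by the number of new samples per day. The tiny Python sketch below reproduces it; the function name is just an illustrative choice, and the 55,000 figure is the vendor claim quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def seconds_per_sample(samples_per_day: int) -> float:
    """Average time available per sample if every sample gets a signature."""
    return SECONDS_PER_DAY / samples_per_day

if __name__ == "__main__":
    budget = seconds_per_sample(55_000)
    print(f"{budget:.2f} seconds per sample")
```

At 55,000 samples a day the budget works out to about 1.57 seconds, which rounds to the 1.6 seconds quoted above.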
As is already the trend, to combat modern malware threats, AV vendors will employ an array of techniques ranging from simple pattern matching to advanced heuristic analysis. There will likely be less scanning of files looking for signatures (although this will still play an important role) and more event-driven algorithmic detection methods, such as those employed by HBGary’s REcon.