To Hide is to Thrive
Malware is just plain insidious. It can do very wicked things on a very large scale. To do its dirt, though, malware must fly under the radar of the good guys' defenses. When it comes to the art and science of detecting and concealing malware, an escalating war of complexity has raged for decades betwixt the benevolent and the malevolent. This article aims to be a 98% assembly-language-free (mov al, 61h) examination of that arms race, with a specific focus on a brief history of malware obfuscation.
Obfuscation of malware serves one ultimate purpose: survival.
Early on, malware authors learned that for their dark little creations to spread and prosper, they must be kept hidden from the sentinels of light. The longer a piece of malware can stay undetected, the longer it has to spread and evolve. If malware didn't take measures to conceal itself, it would be easy pickings for the front-line troops in the AV vendors' armies, the pattern matchers. Additionally, as long as malware stays enshrouded, it evades analysis by the experts, which further complicates efforts to scrutinize its internal yum-yumness (and subsequently come up with methods to detect and destroy it).
Viral Legerdemain is Born…
The first piece of malware that attempted to conceal its existence was also one of the earliest worldwide infectors. The Brain virus, written by the Farooq Alvi brothers in 1986, would intercept attempts to read the disk sectors it had infected and display the original, unmolested data instead. This redirection, known as "garden-pathing," where the protagonist is led down a seemingly innocent trail to cover up malfeasance, is an early example of some of the more complex techniques employed by modern-day malware (seen often in packers).
The first piece of malware to use encryption to scramble its contents was the Cascade virus, which first started showing up in late 1986. The program consisted of a stub encryption/decryption routine (hereafter referred to as a decryptor) followed by the actual body of the encrypted viral code, a technique that would eventually be adopted by nearly every encrypted malware. Cascade used a simple symmetrical XOR cipher keyed off of the size of the file. XOR can be a relatively weak cipher (its effectiveness at scrambling data depends entirely on how random its key is), but it was a perfect choice at the time for two reasons:
- Antivirus at the time, based exclusively on simple pattern matching, had a hard time with encrypted viruses. Since the virus body was a random-looking jumble of bytes (encrypted at infection time), the only pattern that could be fingerprinted was the decryptor that preceded the actual virus. The problem here was that AV programs couldn't distinguish between different strains of the same virus, nor could they identify disparate viruses that shared the same cryptography routines. Furthermore, as the byte-strings used to detect malicious code shrank in size, false positives increased as innocent files matching a suspicious byte-string were flagged.
- Since the XOR operation is symmetrical and reversible, it afforded virus writers the simplicity and brevity of only having a single function to do both encryption and decryption. When every byte counts, this is a huge win.
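The "one routine for both directions" property is easy to see in a few lines of Python. This is only a sketch on a harmless string: the single-byte key and the `key_from_length` derivation are assumptions made for illustration, not Cascade's actual scheme.

```python
def xor_transform(data: bytes, key: int) -> bytes:
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def key_from_length(data: bytes) -> int:
    """Hypothetical key derivation: fold the payload length into one non-zero byte."""
    return (len(data) % 255) + 1


body = b"harmless demo payload"
key = key_from_length(body)

scrambled = xor_transform(body, key)      # "encrypt"
restored = xor_transform(scrambled, key)  # the very same routine "decrypts"

print(scrambled != body)  # the stored bytes no longer match the original
print(restored == body)   # one function, both directions
```

Because XOR does not change the length of the data, a key derived from the length can be recomputed from the scrambled form, so nothing extra has to be stored alongside it.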
As viral science progressed, so did the means to fight back. AV vendors started wising up and were able to match most decryptor patterns with a growing legion of decryptor signatures. In order to flourish, the malware authors developed new ways to further obscure their creations.
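To make the signature idea concrete, here is a toy pattern matcher in Python. Everything in it is a placeholder: the signature names, the byte patterns, and the sample layout are invented for illustration and are not taken from any real scanner or virus. It shows why an encrypted body hides its own fingerprint while a constant decryptor stub still gives the game away.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern (placeholder bytes only).
SIGNATURES: Dict[str, bytes] = {
    "DemoVirus.body": b"DEMO-VIRUS-BODY",
    "DemoVirus.decryptor": bytes.fromhex("aa10bb20cc30"),
}


def scan(data: bytes, signatures: Dict[str, bytes]) -> List[str]:
    """Return the sorted names of all signatures found anywhere in the data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


def xor_bytes(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)


stub = SIGNATURES["DemoVirus.decryptor"]  # stays constant in every copy
body = b"...DEMO-VIRUS-BODY..."

plain_sample = stub + body
encrypted_sample = stub + xor_bytes(body, 0x5A)

print(scan(plain_sample, SIGNATURES))      # body and decryptor both match
print(scan(encrypted_sample, SIGNATURES))  # only the decryptor matches
```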
Oligomorphic: from the Greek oligo, meaning abnormally few or small, and morphe, meaning shape or form.
To combat the weakness in static decryptors, malware authors upped the ante with the creation of oligomorphic malware, which could change the decryptor. From one generation to the next, oligomorphic malware would mutate the decryptor used to encrypt and decrypt the malware body. The first example of oligomorphism in malware was the bloated file infector virus called Whale, which was first detected in late 1990. It carried with it a few dozen decryptors and would randomly choose one to encrypt itself as it spread to a new file. Although the signatures had to be more complex and more numerous, they could still be created to detect malware of this type. Other oligomorphic viruses would generate decryptors dynamically, making it much harder for the AV vendors to write comprehensive signatures to catch all variations. Historically, it has proven infeasible to catch every strain of malware as it evolved. Oligomorphic code is indeed a simple version of a polymorphic engine and was portentous of things to come…
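The Whale-style "pick one from a fixed pool" approach can be sketched as follows. The pool size, the placeholder stub bytes, and the helper names (`make_generation`, `detected`) are all assumptions for illustration. The point is that a finite pool means a finite, if larger, signature set still covers every generation.

```python
import random

# Hypothetical pool of four decryptor stubs (placeholder bytes, not real code).
DECRYPTOR_POOL = [bytes([0xD0 + i, 0x01, 0x02, i]) for i in range(4)]


def make_generation(rng: random.Random) -> bytes:
    """Build one toy 'generation': a randomly chosen stub plus a stand-in body."""
    stub = rng.choice(DECRYPTOR_POOL)
    # Body bytes stay below 0x80 so they can never collide with a stub pattern.
    body = bytes(rng.randrange(0x80) for _ in range(16))
    return stub + body


def detected(sample: bytes, signatures) -> bool:
    return any(sig in sample for sig in signatures)


rng = random.Random(1990)
samples = [make_generation(rng) for _ in range(200)]

one_signature = DECRYPTOR_POOL[:1]  # what a scanner knows after seeing one variant
all_signatures = DECRYPTOR_POOL     # one signature per decryptor in the pool

print(sum(detected(s, one_signature) for s in samples), "of", len(samples))
print(sum(detected(s, all_signatures) for s in samples), "of", len(samples))
```

A virus that generates its decryptors dynamically is what breaks this picture: the pool is no longer a short list that can be enumerated.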
Polymorphic: from the Greek polys, meaning many, and morphe, meaning shape or form.
While statically encrypted and oligomorphic malware were troublesome, they were reasonably containable in terms of how many generational variants the Good Guys had to deal with. In 1991, however, the game got more complex. Dr. Alan Solomon is credited with coining the term polymorphism, which is a method of radically changing how malware conceals itself, all the while remaining functionally equivalent. This took the malware arms race to the next level. As a polymorphic virus spread from file to file, it would drastically change how it encrypted itself. In a properly engineered polymorphic virus, there will be almost no consistency in decryptor bytes from generation to generation.
As such, there is no pattern to match, no signature to create, and no easy way to find these virulent tricksters. To combat polymorphism, AV vendors had to invent new methods of warfare, including algorithm-based detection and operating system execution emulators (see below).
The first polymorphic malware was 1260, AKA V2PX, a virulent .COM infector derived from the Vienna virus and written in 1990 by Mark Washburn (it would be the first in the Chameleon virus family). The virus was a research project of Washburn's, who claimed he wrote the code to show the AV vendors that signatures alone would not be enough to stop the viral horde. I'm sure they really appreciated that. True to form, as V2PX spread, its decryptor mutated endlessly. To accomplish this obfuscation, V2PX would randomly insert so-called "junk" instructions into its decryptor. Instructions like clc, nop, and unused register manipulations were all part of its sleight-of-hand subterfuge. These low-level assembler mnemonics would change the size and appearance of the code, but not its overall function. The end result was an effective decryptor mutation in every generation of the virus that evaded any sort of pattern matching.
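Junk insertion can be illustrated with a toy instruction list and a tiny interpreter. This is a sketch under heavy assumptions: the "machine" below has a single accumulator and no flags, so `nop` and `clc` are pure no-ops here (on real x86 hardware, clc clears the carry flag, which is only harmless when nothing afterwards depends on it). The `run` and `mutate` helpers are invented for this example.

```python
import random

JUNK_OPS = ["nop", "clc"]  # no effect on the result in this toy machine


def run(program, value: int) -> int:
    """Interpret a toy program on a one-byte accumulator that starts at `value`."""
    acc = value
    for op, arg in program:
        if op == "xor":
            acc ^= arg
        elif op == "add":
            acc = (acc + arg) & 0xFF
        elif op in JUNK_OPS:
            pass  # junk: changes the program's shape, not its behavior
        else:
            raise ValueError(f"unknown instruction: {op}")
    return acc


def mutate(program, rng: random.Random, junk_count: int = 3):
    """Return a copy of the program with junk instructions at random positions."""
    mutated = list(program)
    for _ in range(junk_count):
        position = rng.randrange(len(mutated) + 1)
        mutated.insert(position, (rng.choice(JUNK_OPS), 0))
    return mutated


base = [("xor", 0x5A), ("add", 0x10)]
variant = mutate(base, random.Random(1260))

print(variant != base)                                          # looks different
print(all(run(base, x) == run(variant, x) for x in range(256)))  # behaves the same
```

A byte-for-byte signature written against `base` would not match `variant`, even though the two compute the same thing for every input.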
The Mutation Engine
The first-ever polymorphic toolkit, The Mutation Engine (MtE), was released in 1992 by the infamous Dark Avenger (it would not be the only one, however; DAME, TPE, and many others followed). MtE enabled neophyte virus programmers to link their code to an MtE-generated polymorphic object and turn a normal, non-obfuscated virus into a highly polymorphic one. At the time, this was a real problem for the white hats: most AV vendors could not detect MtE-laden malware with 100% confidence. As this technique took off, literally hundreds of similar toolkits would be introduced. A polymorphic viral frenzy commenced.
Emulation to the Rescue
To combat the threat of polymorphic malware, AV vendors started including emulation code in their scanners to sandbox untrusted programs. The altruistic hope here was that the scanner would be able to execute the suspect program in a walled-off environment where, if it were malicious software, it could do no harm to the file system. During execution, the scanner would check the program's memory image against its signature database and also run fledgling heuristic analyses, which included flagging suspicious behavior such as attempts to modify other executables or writes to the hard disk boot sector.
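The idea of scanning the memory image after emulated execution can be sketched with a deliberately tiny model. The sample layout here (first byte is an XOR key, the rest is the encrypted body) and the function names are assumptions made for this example. A real emulator interprets machine instructions rather than a fixed layout.

```python
BODY_SIGNATURE = b"DEMO-BODY"  # placeholder pattern standing in for a real signature


def build_sample(key: int) -> bytes:
    """Toy sample: one key byte followed by the XOR-encrypted body."""
    body = b"xx" + BODY_SIGNATURE + b"yy"
    return bytes([key]) + bytes(b ^ key for b in body)


def static_scan(sample: bytes) -> bool:
    """Plain pattern match against the bytes as they sit on disk."""
    return BODY_SIGNATURE in sample


def emulated_scan(sample: bytes) -> bool:
    """'Run' the decryptor on a private copy, then scan the resulting memory image."""
    memory = bytearray(sample)  # sandbox: the original sample is never modified
    key = memory[0]
    for i in range(1, len(memory)):
        memory[i] ^= key
    return BODY_SIGNATURE in bytes(memory)


for key in (0x21, 0x5A, 0xC3):  # a different key for each "generation"
    sample = build_sample(key)
    print(hex(key), static_scan(sample), emulated_scan(sample))
```

However the on-disk bytes change between generations, the decrypted body is the same once the decryptor has run, which is what gives the scanner something stable to match.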
The problem with emulation wasn't just that its heuristics were prone to false positives (this has improved greatly as the technology has matured); it was also vulnerable to armoring (AKA anti-anti-virus), where the malware would take measures to prevent the emulator from unraveling its mysteries. Many techniques were employed, and a few notable ones are listed below:
- "Endless" Looping: To remain thrifty, early scanners would only execute the first few instructions of each program when looking for suspicious behavior; to combat this, virus authors would add huge do-nothing loops at the beginning of their code to tie up scanners until they had to move on to the next file.
- FPU Usage: Also a time/space trade-off. Floating point operations were deemed too expensive to emulate at the time, so emulators did not support them and would simply exit when they encountered one.
- Fringe Features: Undocumented or non-standard processor features, such as manually invoking interrupts or unusual register manipulation, were usually unsupported.
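The first two armoring tricks above can be mimicked with a toy emulator that has a step budget and a short list of supported instructions. The instruction names, the `(op, repeat)` encoding, and the verdict strings are all invented for this sketch.

```python
SUPPORTED_OPS = {"nop", "decrypt"}  # hypothetical: everything else makes the emulator bail


def emulate(program, budget: int) -> str:
    """Step through (op, repeat) pairs until the body is revealed or the emulator gives up."""
    steps = 0
    for op, repeat in program:
        if op not in SUPPORTED_OPS:
            return "unsupported instruction"
        for _ in range(repeat):
            steps += 1
            if steps > budget:
                return "budget exhausted"
        if op == "decrypt":
            return "body revealed"
    return "finished without decrypting"


plain = [("decrypt", 1)]
looped = [("nop", 1_000_000), ("decrypt", 1)]  # huge do-nothing loop up front
fpu_armored = [("fld", 1), ("decrypt", 1)]     # floating point op the toy emulator lacks

print(emulate(plain, budget=1_000))
print(emulate(looped, budget=1_000))
print(emulate(fpu_armored, budget=1_000))
print(emulate(looped, budget=2_000_000))  # a bigger budget gets past the loop
```

The last call hints at why these tricks aged poorly: as machines got faster, scanners could afford larger budgets and broader instruction support.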
As personal computers grew in power, scanners grew in complexity. Eventually, the AV vendors were able to deal with most of the pitfalls of emulation and were knocking out most polymorphic viruses, some before signatures were even developed. This forced the virus authors to press the arms race to an all-new level… which we will cover in the second half of this post.