Extracting EXE Drop Malware

July 27, 2011

In the last few years the vulnerability landscape has shifted from attacks on network-based server applications to attacks on client applications using malicious file formats. As a result, attackers have developed a variety of new techniques for more reliable control after exploitation.

One technique commonly used by attackers is the EXE drop. The technique revolves around embedding an executable file within the data file that triggers the vulnerability. After exploitation, the payload searches for the file descriptor associated with the data file, copies the embedded EXE from it to disk, and executes the EXE in a new process. Data formats commonly used in EXE drop exploits include Office documents, Shockwave Flash files, and image files. The EXE drop technique is useful to attackers for several reasons. It makes coding the payload easier, since the executable can be crafted quickly and compiled for a specific target. Also, because the executable is copied to disk (persistent storage), it’s fairly easy to maintain residency, for example by adding an entry to the autorun registry keys.

From a malware analyst’s perspective, understanding what a piece of malware using the EXE drop technique does requires extracting the executable file from the data file before it can be analyzed. There are many ways to accomplish this; in this blog post we will look at two.

The first method we will look at is to statically scan the data file to find the executable within, then parse the executable to determine its size and extract it to disk. The advantage of doing this statically is that we do not need to execute and exploit the vulnerable application in order to extract the executable for further analysis. We can implement this functionality in a Python script.

To extract the executable file from the data file we first need to find its starting location. An easy way to find the starting location is to use the YARA library. YARA was designed to make it easier for malware analysts to perform pattern matching on large files. It can be run as a standalone executable, used as a Python library (which is useful for automation), and more recently used through a Ruby port (yara-ruby).

YARA works by parsing a text-based rule and testing the file against the conditions it contains. This allows a malware analyst to specify multiple strings that may be contained within a file, and to test for the presence of any, all, or certain combinations of them. Here is a sample YARA rule taken from the YARA website.

<pre>rule silent_banker : banker
{
    meta:
        description = "This is just an example"
        thread_level = 3
        in_the_wild = true

    strings:
        $a = {6A 40 68 00 30 00 00 6A 14 8D 91}
        $b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
        $c = "UVODFRYSIHLNWPEJXQZAKCBGMT"

    condition:
        $a or $b or $c
}</pre>

To create a YARA rule to match an executable file we must first understand exactly what we need to match. The EXE drop technique is most common with malware written for Microsoft Windows. Because of this, we will focus on extracting executable files in the Portable Executable (PE) binary format, the object format used on this platform.

The PE file format dictates that the first data in the executable file is the MS-DOS stub (MZ image), beginning with the MZ header. This mandatory stub exists only for backwards compatibility. The MZ header specifies the size of the MS-DOS stub and where to find the PE header that follows it. When the Windows loader parses the file, it simply pulls the offset to the PE header from the MZ header and begins loading. However, if the same file were executed on an MS-DOS system, which doesn’t understand the PE file format, the MZ image would be loaded and executed instead. Microsoft’s modern linker uses this functionality by inserting a standard MZ image, which does nothing but output the string “This program cannot be run in DOS mode” and exit. A large portion of malware authors use Microsoft’s linker to generate their file drop payloads, so we can use this string to find a large percentage of the cases.
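The relationship between the MZ header and the PE header can be checked with a few lines of Python. The following is a minimal sketch, not part of the original script: it is written for Python 3, the function name `pe_header_offset` is hypothetical, and it relies on the documented layout in which the offset to the PE header (`e_lfanew`) is stored as a little-endian DWORD at offset 0x3C of the MZ header.

```python
import struct

def pe_header_offset(data, start=0):
    """Return the PE header offset (relative to `start`), or None.

    `data` is any bytes-like buffer and `start` is where the MZ header
    is believed to begin inside it.
    """
    # The MZ header is 64 bytes long and begins with the "MZ" magic.
    if data[start:start + 2] != b"MZ" or len(data) < start + 0x40:
        return None
    # e_lfanew: little-endian DWORD at offset 0x3C of the MZ header.
    (e_lfanew,) = struct.unpack_from("<I", data, start + 0x3C)
    # A valid image has the 4-byte signature "PE\0\0" at that offset.
    signature = data[start + e_lfanew:start + e_lfanew + 4]
    return e_lfanew if signature == b"PE\x00\x00" else None
```

A check like this is a cheap way to confirm that a candidate offset really is the start of a PE file before handing the data to a full parser.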

To search for this string in our input files from within our Python script we can craft the following YARA rule:

<pre>srch = """
rule exe_drop
{
    strings:
        $a  = "This program cannot be run in DOS mode"
    condition:
        all of them
}"""</pre>

Using the YARA library from within Python is very straightforward. You simply import the yara module, specify a rule as a Python string, and pass it to the module’s compile function as the source keyword argument. The code shown below accomplishes this with our newly created rule “srch”:

<pre>import yara
rules = yara.compile(source=srch)</pre>

Now that we have a compiled version of the rule, we need to run it against our data file. This is simply a case of invoking the match method like so:

matches = rules.match(argv[1])

This method returns a list of match objects. One of the attributes of each match object is the strings table. The table is a multi-dimensional array with an entry for each of the matches in the file. Each item in the array contains the offset of the match, followed by the variable name from the rule that matched, and then the string content of the match itself. An example of this is:

[(78, '$a', 'This program cannot be run in DOS mode')]

By dereferencing this array we can get the first piece of information that we require. This is the offset in the data file to the string that is inside the MZ portion of the PE file. Since this is a specific string taken straight from Microsoft’s linker, and the MZ portion of the file has remained the same for years, we can take a static offset from this string to the start of the file. This value is 78. The following code will result in locating the start of the PE file in the data file.

MZSIZE = 78  # distance from the start of the MZ header to the stub string
offset = matches[0].strings[0][0] - MZSIZE # offset in file to start of MZ header
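The fixed offset of 78 only holds when the stub really is the standard one, so it can be worth confirming that the computed start lands on the MZ magic. Below is a minimal sketch of that check written in Python 3 without YARA, using a plain byte search instead. The names `DOS_STUB_TEXT` and `find_pe_candidates` are hypothetical and do not appear in the original script.

```python
DOS_STUB_TEXT = b"This program cannot be run in DOS mode"
MZSIZE = 78  # distance from the MZ magic to the stub text, as in the post

def find_pe_candidates(data):
    """Return every offset where the stub text is preceded by "MZ".

    `data` can be bytes or an mmap object; both provide find().
    """
    offsets = []
    pos = data.find(DOS_STUB_TEXT)
    while pos != -1:
        start = pos - MZSIZE
        # Keep the hit only if the fixed offset really lands on "MZ".
        if start >= 0 and data[start:start + 2] == b"MZ":
            offsets.append(start)
        pos = data.find(DOS_STUB_TEXT, pos + 1)
    return offsets
```

Unlike taking only the first match, this returns every validated candidate, which may help with data files that carry more than one embedded executable.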

Now that we’ve located the beginning of the PE file within the data file we need to calculate its length in order to extract it. To do this we need to understand a little about the layout of a PE file. The diagram below explains the structure of a PE file.

In this diagram we can see the structural overview of a PE file. The file begins with the MZ header, as mentioned earlier. Following this are the actual instructions and data tables of the MS-DOS (MZ) program itself. Next come the PE headers (COFF header, Optional header), which contain metadata about the PE contents such as the architecture type, the number of sections in the section table, and time stamp information. Following the PE headers we find the section table. This is a table of structures that describes the remainder of the file. Each entry in the table describes a portion of the file (a section) and tells the loader how to map it into memory. Since this table describes the entire remainder of the file, it also allows us to determine the total size of the file, and ultimately the end offset we need to extract it.
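To make the layout concrete, here is a minimal sketch that walks the same structures by hand with Python 3’s struct module instead of pefile. It assumes the standard PE/COFF layout (a 4-byte signature, a 20-byte COFF header, then the optional header and 40-byte section headers), performs no validation, and uses hypothetical function names `section_table` and `embedded_pe_size`.

```python
import struct

def section_table(data, start=0):
    """Yield (name, PointerToRawData, SizeOfRawData) for each section."""
    (e_lfanew,) = struct.unpack_from("<I", data, start + 0x3C)
    coff = start + e_lfanew + 4          # skip the "PE\0\0" signature
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + opt_size         # the COFF header is 20 bytes
    for i in range(num_sections):
        entry = table + i * 40           # each section header is 40 bytes
        name = data[entry:entry + 8].rstrip(b"\x00")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20.
        raw_size, raw_ptr = struct.unpack_from("<II", data, entry + 16)
        yield name, raw_ptr, raw_size

def embedded_pe_size(data, start=0):
    """Return the highest file offset covered by any section."""
    ends = (ptr + size for _, ptr, size in section_table(data, start))
    return max(ends, default=0)
```

For real samples a library such as pefile is the safer choice, since it copes with malformed headers that this sketch does not check for.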

To parse our PE file we can utilize the pefile Python library. This library provides a cross-platform way to dissect the PE file and nicely organize the information. Prior to invoking the pefile library we can create a mapping of our executable file. This will save us having to open the file multiple times and perform reads into a temporary buffer. To do this we can use the cross-platform mmap library.

To use the mmap library, we open the file in binary read/update mode (‘r+b’), because mmap’s default mapping is read-write. We then call the mmap function with the file descriptor number associated with our Python file object. The second argument to mmap indicates how large a mapping to create; the value 0 indicates that the entire contents of the file should be mapped. The code below shows this.

import mmap

fp = open(argv[1],'r+b')
map = mmap.mmap(fp.fileno(),0)
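If you would rather not open a malware sample with write access, mmap can also create a read-only mapping. The following is a minimal Python 3 sketch of that variation; the helper name `read_slice` is hypothetical and not part of the original script.

```python
import mmap

def read_slice(path, start, length):
    """Return `length` bytes at offset `start` using a read-only mapping."""
    with open(path, "rb") as fp:
        # ACCESS_READ allows the file to be opened with 'rb' instead of 'r+b'.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[start:start + length]
```

Note that mapping an empty file raises an error, so a caller may want to check the file size first.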

Now that we have created the mapping we can use pefile to begin parsing. This is a simple process that consists of instantiating the PE class of the pefile module and passing in the portion of the map starting at the offset we determined earlier. Unfortunately we don’t yet know the end offset of the executable, so we leave it blank and read to the end of the data. The following code implements this.

import pefile

# parse with pe
pe = pefile.PE(data=map[offset:])

Once the pefile library has finished parsing our embedded executable file, we can begin walking the section table to find the end of the file. Each entry in the section table is of type IMAGE_SECTION_HEADER. This structure is shown below.

<pre>typedef struct _IMAGE_SECTION_HEADER {
  BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
  union {
    DWORD PhysicalAddress;
    DWORD VirtualSize;
  } Misc;
  DWORD VirtualAddress;
  DWORD SizeOfRawData;
  DWORD PointerToRawData;
  DWORD PointerToRelocations;
  DWORD PointerToLinenumbers;
  WORD  NumberOfRelocations;
  WORD  NumberOfLinenumbers;
  DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;</pre>

The two attributes of this structure that we are concerned with are PointerToRawData, which is the file offset of the start of the section, and SizeOfRawData, which is the size of the section on disk. Adding these two values together gives the file offset at which the section ends. With this in mind, finding the file size is simply a case of walking through the table and keeping track of the highest end offset that we see. Once we’ve finished traversing the table, this should leave us with the size of the file. The function below implements this.

<pre>def get_filesize(pe):
    largest = 0
    for section in pe.sections:
        addr = section.PointerToRawData + section.SizeOfRawData
        if(addr &gt; largest):
            largest = addr
        # end if
    # end for
    return largest
# end get_filesize</pre>

Now that we have the starting offset of the file and the file size, we can extract the executable file from the data. Since we already have the file mapped, this is simply a case of slicing the mapping with those offsets. First, however, we need to come up with a name for the file. The most useful method that I can see for this is to hash the contents so that we have a unique identifier. The easiest way to hash the contents is to use Python’s built-in md5 library. The code below imports the library and instantiates it. The code then calls the update method to add the executable file’s data to be hashed. Finally the hexdigest method is called to generate an MD5 hash in ASCII-readable format.

<pre>    import md5
    # md5 the exe for storage purposes
    m = md5.new()
    m.update(map[offset:offset + get_filesize(pe)])
    exefilename = m.hexdigest() + ".exe"</pre>
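The md5 module used here exists only in Python 2. On Python 3 the same naming scheme can be built with hashlib, as in this minimal sketch (the function name `exe_filename` is hypothetical):

```python
import hashlib

def exe_filename(data):
    """Name an extracted executable after the MD5 of its contents."""
    return hashlib.md5(data).hexdigest() + ".exe"

print(exe_filename(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72.exe
```

MD5 is used purely as a content identifier in this context, so that the same dropped executable always gets the same filename.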

The final step in our Python script is to write the file to disk, which takes only a few trivial lines of Python. All we need to do is open a file with the filename that we created and write the relevant slice of the mapping into it.

<pre>    ofp = open(exefilename, "wb+")
    ofp.write(map[offset:offset + get_filesize(pe)])
    ofp.close()</pre>

With this completed, all that’s left is to test the script. The output below shows a trial run of the script against some Microsoft Office EXE drop malware. As you can see, an executable file was extracted using its MD5 checksum as the filename.

<pre>usage: C:\Users\neil\Desktop\pyexedump\pyexedump.py

C:\Users\neil\Desktop\pyexedump&gt;pyexedump.py malware.doc
[+] Matching data file: malware.doc
[+] Found file drop EXE at offset: 0x5de
[+] Mapping PE file.
[+] Size of PE file: 0x1c1e00 bytes.
[+] Writing out exe file to: 9399f501214bec3808eb87a5c6780d30.exe.

C:\Users\neil\Desktop\pyexedump&gt;dir 9399f501214bec3808eb87a5c6780d30.exe
 Directory of C:\Users\neil\Desktop\pyexedump

06/26/2011  02:14 PM         1,842,688 9399f501214bec3808eb87a5c6780d30.exe
               1 File(s)      1,842,688 bytes
               0 Dir(s)  207,376,134,144 bytes free</pre>

As I mentioned earlier in the post, this tool will only work against malware linked with Microsoft’s linker. Malware written with Borland Delphi, for example, has a completely different MZ stub. Also, attackers will sometimes deliberately attempt to evade detection by packing the executable and then unpacking it during shellcode execution. In the future I will attempt to add some ability to investigate shellcode with libemu and mitigate this before the tool begins the extraction process; however, for now, we will look at an easy way to extract the executable file in these cases.

Rather than approaching this problem statically, it can make sense to actually load the data file and trigger the exploit. By allowing exploitation to occur, the executable file is dropped to disk where it can be retrieved. Clearly this should be done in a virtual machine, so as not to infect your actual machine. To help collect the dropped executable file it can be useful to use a sandbox program such as Sandboxie. Sandboxie is similar to a UNIX chroot environment. You select a program to run inside the sandbox and it will be restricted to a certain set of resources, instead of having full access to the system. By default Sandboxie will use the path “C:\Sandbox\Administrator\DefaultBox\drive\C”, and any file access that the application running inside the sandbox makes will be re-routed to this directory. This means that you can clearly see any files dropped by the exploit.
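Once the exploit has run, the sandbox directory can be searched for dropped executables automatically. The following is a minimal Python 3 sketch, not part of the original tool: it walks a directory tree and reports every file that begins with the MZ magic. The function name `find_dropped_executables` is hypothetical, and the root path you pass in depends on your own Sandboxie configuration.

```python
import os

def find_dropped_executables(root):
    """Return sorted paths under `root` whose first two bytes are "MZ"."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fp:
                    if fp.read(2) == b"MZ":
                        hits.append(path)
            except OSError:
                continue  # unreadable or locked file; skip it
    return sorted(hits)
```

Checking the file contents rather than the extension helps here, since a dropped executable will not necessarily be given an .exe name.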

With either of the approaches described in this post the objective is the same: to capture the executable file for further analysis. Hopefully after reading this post you understand the difference between the two methods, and the advantages and disadvantages of each. I have included a full source code listing of the application described in this post (pyexedump.py) below.


<pre># [ pyexedump.py ]
# By Neil Archibald

import sys
import yara
import pefile
import md5
import mmap

class exedump:
    __srch = """
    rule exe_drop
    {
        strings:
            $a  = "This program cannot be run in DOS mode"
        condition:
            all of them
    }
    """

    MZSIZE = 78  # distance from the start of the MZ header to the stub string

    def __init__(self, search_file):
        self.__offset = None
        self.__pe = None
        self.__pe_size = None
        self.__map = None
        self.__rules = yara.compile(source=exedump.__srch)
        self.__search_file = search_file
        self.__matches = self.__rules.match(self.__search_file)
    # end __init__

    def __set_pe_size(self):
        # the file size is the highest end offset of any section
        largest = 0
        for section in self.__pe.sections:
            addr = section.PointerToRawData + section.SizeOfRawData
            if(addr &gt; largest):
                largest = addr
            # end if
        # end for
        self.__pe_size = largest

    def has_pe(self):
        return (self.__matches and len(self.__matches) != 0)

    def find_pe(self):
        if not self.has_pe():
            return None

        self.__offset = self.__matches[0].strings[0][0] - exedump.MZSIZE # offset in file to start of MZ header
        return self.__offset

    def parse_pe(self):
        if self.__offset is None and self.find_pe() is None:
            return None

        fp = open(self.__search_file,'r+b')
        self.__map = mmap.mmap(fp.fileno(),0)

        self.__pe = pefile.PE(data=self.__map[self.__offset:])
        self.__set_pe_size()    # must run before the size is used below
        self.__map = self.__map[self.__offset:self.__offset + self.__pe_size]   #truncate extra bits
        return self.__pe_size

    def write_pe(self, filename=None):
        if not self.__map:
            return None

        if not filename:
            filename = self.gen_filename()
        fp = open(filename, "wb+")
        fp.write(self.__map)
        fp.close()

        return filename

    def gen_filename(self):
        # name the file after the MD5 of the extracted executable
        m = md5.new()
        m.update(self.__map)
        filename = m.hexdigest() + ".exe"
        return filename

    def get_filesize(self):
        if self.__pe_size is None:
            self.__set_pe_size()

        return self.__pe_size

# end exedump

def main(argv):

    if(len(argv) != 2):
        print "usage: %s \n" % argv[0]
        return 1
    # end if

    ed = exedump(argv[1])
    if not ed.has_pe():
        print "[!] error: no embedded executable file detected"
        return 1
    # end if

    print "[+] Searching for embedded EXE file in: %s" % argv[1]

    offset = ed.find_pe()
    print "[+] Found file embedded EXE at offset: 0x%x" % offset

    file_size = ed.parse_pe()
    print "[+] Size of PE file: 0x%x bytes." % file_size

    exefilename = ed.gen_filename()
    print "[+] Writing out exe file to: %s." % exefilename
    ed.write_pe(exefilename)

    return 0
# end main

if __name__ == "__main__":
    sys.exit(main(sys.argv))
# end if</pre>


