

Security

Recently I was reverse engineering a 16-bit MS-DOS binary to better understand a network transport protocol used for modem communication in some software I was examining. I was using IDA Pro for this.

However, to my dismay, after browsing the string table and finding a string that seemed relevant to the code I was interested in, I noticed that none of the strings in the table had any cross reference information. As a result, I could not simply jump to the instructions that used them.

On further analysis, I determined that the cross references for these strings were missing because the strings resided in the data segment and were referenced through the ds segment register.

When one of these strings is used, its address is typically passed to a function by placing it on the stack with a “push” instruction just before the call. When IDA sees such a push, all it can tell is that the instruction pushes a 2-byte (16-bit) immediate value, such as 0xabcd, onto the stack; it has no way of knowing that the value is really a ds:string reference.
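
To make the ambiguity concrete, the sketch below models real-mode address translation (linear address = segment × 16 + offset) in plain Python. The segment values are hypothetical, not taken from the binary. The point is that the same pushed 16-bit value lands on different bytes depending on which segment register it is paired with, and the push instruction itself does not record that choice.

```python
def linear_address(segment, offset):
    """Translate a real-mode segment:offset pair into a linear address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both be 16-bit values")
    # The segment value is shifted left by 4 bits (multiplied by 16)
    # before the offset is added.
    return (segment << 4) + offset

# Hypothetical segment values, for illustration only.
CS_SEGMENT = 0x1000
DS_SEGMENT = 0x1234

pushed = 0xABCD  # all IDA sees: a 16-bit immediate

as_code_ref = linear_address(CS_SEGMENT, pushed)
as_data_ref = linear_address(DS_SEGMENT, pushed)
print(hex(as_code_ref), hex(as_data_ref))  # prints 0x1abcd 0x1cf0d
```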

To continue analyzing the binary, I needed to remedy this. I figured the easiest way was an IDAPython script that changes the type of the first operand of each such push to a ds offset. To test the idea manually, I selected the operand, pressed the O key, and watched the x-refs get created.

To automate this for all of the relevant push instructions, however, I needed a way to tell a push of a literal immediate value apart from a push of one of my string addresses. I accomplished this by building a Python dictionary containing every string in the table. Using each string's address as the dictionary key, I could quickly look up whether a pushed value existed in the table.

From IDAPython, the easiest way I know to enumerate the strings is the Strings() class from the idautils module. The only problem is that the effective addresses it returns are full addresses with the segment selector value already applied. Comparing them directly to the operand of each push would therefore never produce a match, since each push holds only the 2-byte offset relative to the selector. To convert each string address to this form, I subtracted the value of the segment selector shifted 4 bits to the left (that is, multiplied by 16). I retrieved the segment selector using the SegByName() function from the idc module.

The code to populate the string table dictionary ended up looking as follows:

from idc import *
from idautils import *

strtable = {}
csseg = SegByName("seg000") << 4 # linear base address of the code segment (cs)
dsseg = SegByName("dseg") << 4 # linear base address of the data segment (ds)

def fill_str_table():
        print("[+] Populating strings table")
        # populate our strings table dictionary
        s = Strings()

        for i in s:
                # key each string by its 16-bit offset within dseg
                strtable[i.ea - dsseg] = str(i)
        # end for
# end fill_str_table
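
Outside IDA, the same keying scheme can be sketched in plain Python. The (ea, text) tuples below stand in for what iterating Strings() yields, and both the addresses and the DSEG_SELECTOR value are invented for illustration. The sketch also adds a guard that the script above does not have: a string whose ds-relative offset falls outside 0x0000 to 0xFFFF can never be the operand of a 16-bit push, so it is skipped rather than stored under a meaningless key.

```python
DSEG_SELECTOR = 0x1234          # hypothetical value returned for "dseg"
DSEG_BASE = DSEG_SELECTOR << 4  # linear base of the data segment: 0x12340

# Hypothetical (linear address, text) pairs standing in for Strings() results.
found_strings = [
    (0x12350, "CONNECT"),
    (0x12358, "NO CARRIER"),
    (0x05000, "string in another segment"),
]

def build_string_table(strings, seg_base):
    """Map each string's 16-bit offset within the segment to its text."""
    table = {}
    for ea, text in strings:
        offset = ea - seg_base
        # Only offsets that fit in 16 bits can appear as a push immediate.
        if 0 <= offset <= 0xFFFF:
            table[offset] = text
    return table

strtable = build_string_table(found_strings, DSEG_BASE)
for offset, text in sorted(strtable.items()):
    print("ds:0x%04x -> %s" % (offset, text))
```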

Now that I had a table of all the string addresses, I needed to enumerate the push instructions in the code segment to begin repairing the x-refs. The first step was to locate the base address of each segment, for which the idautils module has a handy Segments() function. For each segment it returns, I then walked the list of items defined within that segment using the Heads() function. Finally, to check that an item was an instruction before processing it, I passed its flags (obtained with GetFlags()) to the isCode() function. The following code implements this:

        # For each of the segments
        for seg_ea in Segments():
                # For each of the defined elements
                for head in Heads(seg_ea, SegEnd(seg_ea)):
                        # If it's an instruction
                        if isCode(GetFlags(head)):
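
The loop structure above can be mirrored without IDA to show what it keeps and what it skips. In this sketch, a hypothetical segments dictionary maps each segment start address to its defined items, and a boolean is_code field stands in for isCode(GetFlags(head)); none of these names come from the IDA API.

```python
from collections import namedtuple

# Hypothetical stand-in for a defined item ("head") in the database.
Head = namedtuple("Head", "ea is_code text")

# Hypothetical database: segment start -> list of defined items.
segments = {
    0x10000: [
        Head(0x10000, True, "push 10h"),
        Head(0x10003, True, "call sub_10100"),
    ],
    0x12340: [
        Head(0x12350, False, "db 'CONNECT',0"),
    ],
}

def code_heads(segs):
    """Yield every code item, segment by segment, in address order."""
    for seg_start in sorted(segs):
        for head in sorted(segs[seg_start], key=lambda h: h.ea):
            if head.is_code:  # plays the role of isCode(GetFlags(head))
                yield head

for head in code_heads(segments):
    print("0x%x: %s" % (head.ea, head.text))
```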

The next step was to test whether the instruction was a “push,” since those were the only instructions I cared to investigate. The GetMnem() function returns the text of the instruction mnemonic, so I simply compared it with the string “push” to make sure I had the right instruction.

          mnem = GetMnem(head)
          if(mnem == "push"):
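
As a minimal sketch of this filter, the snippet below works on hypothetical (address, mnemonic) pairs in place of GetMnem(head) results. Lower-casing before comparing is a defensive extra in this sketch, not part of the original script. The comparison must be exact: pushf and pusha also start with "push" but never carry a string offset.

```python
# Hypothetical (address, mnemonic) pairs standing in for GetMnem(head) results.
instructions = [
    (0x10000, "push"),
    (0x10003, "call"),
    (0x10006, "PUSH"),
    (0x10009, "pushf"),
]

def push_addresses(insns):
    """Return the addresses whose mnemonic is exactly 'push'."""
    return [ea for ea, mnem in insns if mnem.lower() == "push"]

print([hex(ea) for ea in push_addresses(instructions)])
```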

Having found the push instructions, I needed to narrow them down to the ones whose single operand was an immediate value. To do this, I used the GetOpType() function to get the type of the first operand and compared it against the constant o_imm. Once I had isolated this particular kind of instruction, I simply had to extract the value of the first operand (using GetOpnd(), which returns the operand as display text such as 1234h) and convert it to the same format used for the keys in my string table. I also computed the instruction's offset within the code segment (csea) for logging. I accomplished this using the following code:

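               op1 = GetOpnd(head,0)
               csea = head - csseg   # offset of the instruction within the code segment
               intop1 = 0
               try:
                       # strip IDA's trailing "h" hex suffix, e.g. "1234h" -> 0x1234
                       intop1 = int(op1[0:-1],16)
               except ValueError:
                       # operand text is not a hex number
                       continue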
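
GetOpnd() returns the operand as display text, which is why the snippet strips the last character before calling int(..., 16). A slightly more defensive parser is sketched below, assuming IDA's default hex style (a trailing h, plus a leading 0 when the first digit is a letter, e.g. 0ABCDh) and assuming small values may be shown without a suffix. Unlike the slice above, it accepts text with or without the suffix; the sample strings are illustrative.

```python
def parse_imm(text):
    """Convert IDA-style hex operand text such as '0ABCDh' to an int.

    Returns None for text that is not a plain hexadecimal number,
    e.g. a symbolic operand like 'offset aConnect'.
    """
    text = text.strip()
    if text.lower().endswith("h"):
        text = text[:-1]
    try:
        return int(text, 16)
    except ValueError:
        return None

for sample in ["10h", "0ABCDh", "5", "offset aConnect"]:
    print(sample, "->", parse_imm(sample))
```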

With the operand value of each push extracted and converted to the right format, checking it against the dictionary I created earlier was trivial, using the expression

 intop1 in strtable
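
The membership test and the lookup that follows it can be sketched as below. The table contents and pushed values are hypothetical. Using in is the idiomatic spelling of has_key(), and it is the only one of the two that still works in Python 3.

```python
# Hypothetical string table keyed by ds-relative offset.
strtable = {0x10: "CONNECT", 0x18: "NO CARRIER"}

# Hypothetical (instruction address, pushed immediate) pairs.
pushes = [(0x10000, 0x10), (0x10008, 0x2A), (0x10010, 0x18)]

def match_pushes(pushes, table):
    """Return (address, offset, text) for pushes whose value is a known string offset."""
    matches = []
    for ea, value in pushes:
        if value in table:  # same test as table.has_key(value) in Python 2
            matches.append((ea, value, table[value]))
    return matches

for ea, value, text in match_pushes(pushes, strtable):
    print("0x%x: push 0x%x ; %s" % (ea, value, text))
```

A pushed constant that happens to equal a string offset will also match, so the results are best treated as candidates worth spot-checking rather than guaranteed string references.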

Once a match was confirmed, I converted the push operand into a data segment offset, which the OpOff() function made easy. Finally, for readability, I also added a comment containing the string to that instruction in the IDB using MakeComm(). This way I could immediately see the purpose of the instruction during further analysis.
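
In the script, the third argument passed to OpOff() is dsseg, the data segment's base as a linear address, so IDA resolves the operand to dsseg plus the pushed value. The sketch below, using made-up numbers and a hypothetical offset_target() helper, checks that this round trip lands back on the address Strings() originally reported, which is what makes the new cross reference point at the right string.

```python
DSEG_BASE = 0x1234 << 4   # hypothetical linear base of dseg (0x12340)
string_ea = 0x12350       # hypothetical linear address reported for a string

# Step 1 (as in fill_str_table): store the string under its ds-relative offset.
key = string_ea - DSEG_BASE

# Step 2 (as in OpOff with base=DSEG_BASE): the operand resolves to base + value.
def offset_target(base, value):
    """Linear address an offset operand refers to, given its base."""
    return base + value

target = offset_target(DSEG_BASE, key)
print(hex(key), hex(target))
```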

I have included the full code listing below for easy usage.

from idc import *
from idautils import *

strtable = {}
csseg = SegByName("seg000") << 4 # linear base address of the code segment (cs)
dsseg = SegByName("dseg") << 4 # linear base address of the data segment (ds)

def fill_str_table():
	print("[+] Populating strings table")
	# populate our strings table dictionary
	s = Strings()

	for i in s:
		# key each string by its 16-bit offset within dseg
		strtable[i.ea - dsseg] = str(i)
	# end for
# end fill_str_table

def find_pushes():
	print("[+] Finding the pushes")
	# find all the pushes.

	# For each of the segments
	for seg_ea in Segments():
		# For each of the defined elements
		for head in Heads(seg_ea, SegEnd(seg_ea)):
			# If it's an instruction
			if isCode(GetFlags(head)):
				mnem = GetMnem(head)
				if mnem == "push":
					# only pushes of an immediate value can be string offsets
					if GetOpType(head, 0) == o_imm:
						op1 = GetOpnd(head, 0)
						csea = head - csseg  # offset within the code segment
						try:
							# strip IDA's trailing "h" suffix, e.g. "1234h"
							intop1 = int(op1[0:-1], 16)
						except ValueError:
							# operand text is not a hex number
							continue
						if intop1 in strtable:
							print("[+] got: push ds:%s @ cs:0x%x" % (op1, csea))
							# change the operand to an offset from the ds base
							OpOff(head, 0, dsseg)
							# add the string as a comment
							MakeComm(head, strtable[intop1])
						# end if
					# end if
				# end if
			# end if
		# end for
	# end for
# end find_pushes

print("[+] ds @ 0x%x" % dsseg)

fill_str_table()
find_pushes()
