Friday, 1 April 2016

Solving kao's toy project with symbolic execution and angr

Kao's toy project is a nifty and small crackme and quite ideal for demonstrating the power of symbolic execution. Running the crackme provides us with an installation id. We need to enter an unlock code which shows the goodboy message.

The main window
Fig 1: The main window
The installation id is calculated from the hard disk serial number. We will not focus on the algorithm that generates the installation id but rather on developing the keygen which calculates an unlock code when given an install id.

Before discussing about the function which checks the validity of the entered unlock code, it is important to mention that the installation id is 32 bytes (8 DWORDS) long and is displayed on the crackme screen in the form
D1D0-D3D2-D5D4-D7D6

i.e. within each QWORD the two DWORDS are stored in little endian order. We need to take this into account in our keygen program and convert the entered installation id to the proper form.

Previous Work

This crackme has previously been solved by Rolf Rolles who used a similar technique mentioned here in this blog post. While the method involving SMT solver is similar, Rolf used semi-automatic techniques, which translated the assembly code to IR and finally generated the constraints from the IR.

Before Rolf Rolles, this was solved by andrewl & Dcoder who used cryptanalysis techniques to reduce the keyspace. More recently, this was solved by Cr4sh who used the openreil framework.

The heart of the crackme

At the heart of the crackme lies this small function which checks whether a given unlock code is valid or not.
Control flow graph of the checking function
Fig 2: The checking function


.text:004010EC ; int __stdcall check(int part1, int part2)
.text:004010EC check proc near ; CODE XREF: DialogFunc+104p
.text:004010EC
.text:004010EC output = byte ptr -21h
.text:004010EC part1 = dword ptr 8
.text:004010EC part2 = dword ptr 0Ch
.text:004010EC
.text:004010EC push ebp
.text:004010ED mov ebp, esp
.text:004010EF add esp, 0FFFFFFDCh
.text:004010F2 mov ecx, 32
.text:004010F7 mov esi, offset installId
.text:004010FC lea edi, [ebp+output]
.text:004010FF mov edx, [ebp+part1]
.text:00401102 mov ebx, [ebp+part2]
.text:00401105
.text:00401105 encode: ; CODE XREF: check+23j
.text:00401105 lodsb
.text:00401106 sub al, bl
.text:00401108 xor al, dl
.text:0040110A stosb
.text:0040110B rol edx, 1
.text:0040110D rol ebx, 1
.text:0040110F loop encode
.text:00401111 mov byte ptr [edi], 0
.text:00401114 push offset String2 ; "0how4zdy81jpe5xfu92kar6cgiq3lst7"
.text:00401119 lea eax, [ebp+output]
.text:0040111C push eax ; lpString1
.text:0040111D call lstrcmpA
.text:00401122 leave
.text:00401123 retn 8
.text:00401123 check endp
.text:00401123
.text:00401126
view raw check.asm hosted with ❤ by GitHub
The function takes two dwords (from the unlock code) as arguments which are then used to encode/encrypt the installation id (plaintext) to a given output buffer(ciphertext). For our entered unlock code to be valid, the encoded output must match the hardcoded string 0how4zdy81jpe5xfu92kar6cgiq3lst7.

Solving with Z3

At first we will try to model the system in Z3. Specifically, we will represent the encoding loop in Z3. Then we will use Z3 to solve the system and find the two dwords (unlock code) which encodes the installation id to the hardcoded string.

from z3 import *
import binascii
import sys
# Calculates the installation id from the entered string
# This function just reverses the order of dwords in each quadword
def getInstallIdFromString(iid_string):
qword1, qword2, qword3, qword4 = iid_string.split('-')
dword1 = list(binascii.unhexlify(qword1))[3::-1]
dword2 = list(binascii.unhexlify(qword1))[7:3:-1]
dword3 = list(binascii.unhexlify(qword2))[3::-1]
dword4 = list(binascii.unhexlify(qword2))[7:3:-1]
dword5 = list(binascii.unhexlify(qword3))[3::-1]
dword6 = list(binascii.unhexlify(qword3))[7:3:-1]
dword7 = list(binascii.unhexlify(qword4))[3::-1]
dword8 = list(binascii.unhexlify(qword4))[7:3:-1]
return map(ord, dword1 + dword2 + dword3 + dword4 + dword5 + dword6 + dword7 + dword8)
def main():
if len(sys.argv) < 2:
print 'Please provide the installation id as an argument'
return
# Sanity Check
assert len(sys.argv[1]) == 16*4+3
install_id = getInstallIdFromString(sys.argv[1])
# The install id must encode to this hardcoded string
target = map(ord, list('0how4zdy81jpe5xfu92kar6cgiq3lst7'))
s = Solver()
# The two parts of the unlock code
part1 = edx = BitVec('part1', 32)
part2 = ebx = BitVec('part2', 32)
for i in xrange(32):
# text:00401105 lodsb
byte = install_id[i]
# text:00401106 sub al, bl
byte -= Extract(7, 0, ebx)
# text:00401108 xor al, dl
byte ^= Extract(7, 0, edx)
# text:0040110B rol edx, 1
edx = RotateLeft(edx, 1)
# text:0040110D rol ebx, 1
ebx = RotateLeft(ebx, 1)
# Add constraint
s.add(byte == target[i])
# Solve the system
if s.check() == sat:
m = s.model()
print 'Unlock Code: ',
print '%08X-%08X' %(m[part1].as_long(), m[part1].as_long () ^ m[part2].as_long())
if __name__ == '__main__':
main()
view raw z3solver.py hosted with ❤ by GitHub
The script takes in the installation id as a command line argument. Lets' walk through the code step by step.
install_id = getInstallIdFromString(sys.argv[1])
Here we convert the install id into its proper form i.e the order of the two DWORDs within each QWORDs is reversed and returned as a list of integers.
target = map(ord, list('0how4zdy81jpe5xfu92kar6cgiq3lst7'))
After encoding the installation id it must match with the hardcoded string. Here we are converting the that string to a list of characters where each character is represented by its ASCII value.
part1 = edx = BitVec('part1', 32) 
part2 = ebx = BitVec('part2', 32)
We declare two bit-vectors of with a size of 32 bits each. These two bit vectors represents the two DWORDS of the unlock code.

for i in xrange(32):
    # text:00401105 lodsb
    byte = install_id[i]
        
    # text:00401106 sub al, bl
    byte -= Extract(7, 0, ebx)
    
    # text:00401108 xor al, dl
    byte ^= Extract(7, 0, edx)

    # text:0040110B rol edx, 1
    edx = RotateLeft(edx, 1)
        
    # text:0040110D rol ebx, 1
    ebx = RotateLeft(ebx, 1)
        
    # Add constraint
    s.add(byte == target[i])

The above loop describes the encoding process. Each character of the install_id is processed. This value must match the corresponding character in the target list. For this we use constraints.

# Solve the system
if s.check() == sat:
    m = s.model()
    print 'Unlock Code: ',
    print '%08X-%08X' %(m[part1].as_long(), m[part1].as_long () ^ m[part2].as_long())

Finally, we ask z3 to solve the system. and print the solutions.

Solving with angr

Okay we have already solved the crackme, so why another method? This is because I wanted to see if we can use angr for the same purpose, besides it would be a good learning experience.

Lets look at the cfg once again
Calling the check function
Fig 3: We want to execute the green basic block and avoid the red one

At 40122A function check is called. If our entered unlock code is correct check would return 1 and we would go to the green color basic block at 401234 which displays the good boy message.

Now to the cfg of the check function.
Control flow graph of the check function with hooks
Fig 4: Inserting hooks inside the check function

We are going to execute the above function symbolically. The unlock code which is comprised of two parts are passed as arguments to the function. Since we are executing this function in isolation we need provide the inputs ourselves, and this can be done by setting a hook at 4010FF (set_ebx_edx). Within the hook, we would store symbolic values representing the two parts of the unlock code into the ebx and edx registers.

Lastly, at 40111D there is a call to lstrcmpA. This function is imported from kernel32.dll.  Now, within our execution environment this dll is not loaded, we must emulate the behaviour of lstrcmpA and this can be done with SimProcedures.

Imported lstrcmpA function
Fig 5: lstrcmpA function

lstrcmpA is located at 40130E. We would set a hook at this location to call a SimProcedure which emulates the behaviour of lstrcmpA.

Now lets see the code which implements all of these.
#!/usr/bin/env python
import angr
import simuvex
import binascii
import sys
part1 = None
part2 = None
# Calculates the installation id from the entered string
# This function just reverses the order of dwords in each quadword
def getInstallIdFromString(iid_string):
qword1, qword2, qword3, qword4 = iid_string.split('-')
dword1 = list(binascii.unhexlify(qword1))[3::-1]
dword2 = list(binascii.unhexlify(qword1))[7:3:-1]
dword3 = list(binascii.unhexlify(qword2))[3::-1]
dword4 = list(binascii.unhexlify(qword2))[7:3:-1]
dword5 = list(binascii.unhexlify(qword3))[3::-1]
dword6 = list(binascii.unhexlify(qword3))[7:3:-1]
dword7 = list(binascii.unhexlify(qword4))[3::-1]
dword8 = list(binascii.unhexlify(qword4))[7:3:-1]
return ''.join(dword1 + dword2 + dword3 + dword4 + dword5 + dword6 + dword7 + dword8)
def set_ebx_edx(state):
global part1, part2
state.regs.edx = part1
state.regs.ebx = part2
def main(iid_string):
global part1, part2
angr.path_group.l.setLevel('DEBUG')
# Calculate the install id from the string
install_id = getInstallIdFromString(iid_string)
# Load the binary
proj = angr.Project('toyproject.exe', load_options={'auto_load_libs': False})
# Hook strcmp
proj.hook(0x40130E, simuvex.SimProcedures['libc.so.6']['strcmp'], length=5)
# Create a blank state at 0x40122A i.e where check function is called
initial_state = proj.factory.blank_state(addr=0x40122A)
# The two parts of the serial
part1 = initial_state.se.BVS('part1', 32)
part2 = initial_state.se.BVS('part2', 32)
# Store the install id in memory
initial_state.memory.store(0x4093A8, install_id)
# Hook to set ebx and edx registers
proj.hook(0x4010ff, func=set_ebx_edx, length=6)
pg = proj.factory.path_group(initial_state)
# Go, go
pg.explore(find=0x401234, avoid=0x401249)
found_state = pg.found[0].state
p1 = found_state.se.any_int(part1)
p2 = found_state.se.any_int(part2)
print '%08X-%08X' %(p1, p1^p2)
if __name__ == '__main__':
if len(sys.argv) < 2:
print 'Please provide the installation id as an arguement'
else:
# Sanity check
assert len(sys.argv[1]) == 16*4+3
main(sys.argv[1])
view raw angrsolver.py hosted with ❤ by GitHub
Finally to wrap things up, here is an asciicast showing the solver in action.

No comments:

Post a Comment