Friday, 1 April 2016

Solving kao's toy project with symbolic execution and angr

Kao's toy project is a nifty little crackme and is ideal for demonstrating the power of symbolic execution. Running the crackme shows us an installation id. We need to enter the unlock code that makes it display the goodboy message.

Fig 1: The main window
The installation id is calculated from the hard disk serial number. We will not focus on the algorithm that generates the installation id but rather on developing the keygen which calculates an unlock code when given an install id.

Before discussing the function that checks the validity of the entered unlock code, it is important to mention that the installation id is 32 bytes (8 DWORDs) long and is displayed on the crackme screen in the form
D1D0-D3D2-D5D4-D7D6

i.e. within each displayed QWORD the two DWORDs appear in swapped order (D1 before D0). We need to take this into account in our keygen and convert the entered installation id to the proper form.
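To make the conversion concrete, here is a minimal sketch of it in Python 3. The function name parse_install_id is hypothetical (it is not the keygen's own helper), and the sketch assumes that each DWORD is shown as eight hex digits and is laid out in memory least significant byte first, as is usual on x86. The text above only fixes the order of the DWORDs, so treat the byte order as an assumption.

```python
import struct


def parse_install_id(text):
    """Turn 'D1D0-D3D2-D5D4-D7D6' into a list of 32 byte values."""
    data = []
    for qword in text.strip().split('-'):
        if len(qword) != 16:
            raise ValueError('expected 16 hex digits per group')
        first, second = qword[:8], qword[8:]
        # The DWORD displayed second comes first in memory (D0 before D1).
        for dword in (second, first):
            # Assumption: little-endian byte order inside each DWORD.
            data.extend(struct.pack('<I', int(dword, 16)))
    if len(data) != 32:
        raise ValueError('expected four groups (32 bytes in total)')
    return data
```

The result is a plain list of integers, so it can be indexed byte by byte in the same way the encoding loop reads the installation id.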

Previous Work

This crackme has previously been solved by Rolf Rolles, who used a similar technique mentioned here in this blog post. While the method involving an SMT solver is similar, Rolf used semi-automatic techniques that translated the assembly code to IR and then generated the constraints from the IR.

Before Rolf Rolles, this was solved by andrewl & Dcoder, who used cryptanalysis techniques to reduce the keyspace. More recently, it was solved by Cr4sh, who used the OpenREIL framework.

The heart of the crackme

At the heart of the crackme lies this small function which checks whether a given unlock code is valid or not.
Control flow graph of the checking function
Fig 2: The checking function


The function takes two dwords (from the unlock code) as arguments and uses them to encode/encrypt the installation id (plaintext) into a given output buffer (ciphertext). For our entered unlock code to be valid, the encoded output must match the hardcoded string 0how4zdy81jpe5xfu92kar6cgiq3lst7.
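As a reference point, the encoding can be written out with concrete integers. The sketch below is Python 3 and follows the instruction sequence listed in the Z3 section of this post (sub al, bl; xor al, dl; rol edx, 1; rol ebx, 1). The helper names rol32, encode and is_valid are hypothetical, and the sketch assumes each encoded byte is written to the output buffer before the registers are rotated.

```python
TARGET = b'0how4zdy81jpe5xfu92kar6cgiq3lst7'


def rol32(value, count=1):
    """Rotate a 32-bit value left by count bits."""
    value &= 0xFFFFFFFF
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def encode(install_id, edx, ebx):
    """Encode the installation id bytes with the two unlock-code dwords."""
    out = []
    for byte in install_id:
        byte = (byte - (ebx & 0xFF)) & 0xFF   # sub al, bl
        byte ^= edx & 0xFF                    # xor al, dl
        out.append(byte)
        edx = rol32(edx)                      # rol edx, 1
        ebx = rol32(ebx)                      # rol ebx, 1
    return bytes(out)


def is_valid(install_id, edx, ebx):
    """True when the encoded installation id equals the hardcoded string."""
    return encode(install_id, edx, ebx) == TARGET
```

A checker like this is handy for confirming whatever a solver returns: feed the two dwords back in and compare against the hardcoded string.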

Solving with Z3

First, we will model the system in Z3, specifically the encoding loop. Then we will use Z3 to solve the system and find the two dwords (the unlock code) that encode the installation id to the hardcoded string.

The script takes the installation id as a command line argument. Let's walk through the code step by step.
install_id = getInstallIdFromString(sys.argv[1])
Here we convert the install id into its proper form, i.e. the order of the two DWORDs within each QWORD is reversed and the result is returned as a list of integers.
target = map(ord, list('0how4zdy81jpe5xfu92kar6cgiq3lst7'))
After encoding, the installation id must match the hardcoded string. Here we convert that string to a list of characters, each represented by its ASCII value.
part1 = edx = BitVec('part1', 32) 
part2 = ebx = BitVec('part2', 32)
We declare two bit-vectors of 32 bits each. These two bit-vectors represent the two DWORDs of the unlock code.

# Solver that collects one constraint per encoded byte
s = Solver()

for i in xrange(32):
    # text:00401105 lodsb
    byte = install_id[i]
        
    # text:00401106 sub al, bl
    byte -= Extract(7, 0, ebx)
    
    # text:00401108 xor al, dl
    byte ^= Extract(7, 0, edx)

    # text:0040110B rol edx, 1
    edx = RotateLeft(edx, 1)
        
    # text:0040110D rol ebx, 1
    ebx = RotateLeft(ebx, 1)
        
    # Add constraint
    s.add(byte == target[i])

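The above loop describes the encoding process. Each byte of the install_id is processed in turn: the low byte of ebx is subtracted from it, the result is XORed with the low byte of edx, and both registers are then rotated left by one bit. The encoded byte must match the corresponding character in the target list, which we express as a constraint.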

# Solve the system
if s.check() == sat:
    m = s.model()
    print 'Unlock Code: ',
    print '%08X-%08X' % (m[part1].as_long(), m[part1].as_long() ^ m[part2].as_long())

Finally, we ask Z3 to solve the system and print the solution.

Solving with angr

Okay, we have already solved the crackme, so why another method? I wanted to see whether angr could be used for the same purpose; besides, it would be a good learning experience.

Let's look at the CFG once again.
Calling the check function
Fig 3: We want to execute the green basic block and avoid the red one

At 40122A the function check is called. If the entered unlock code is correct, check returns 1 and we go to the green basic block at 401234, which displays the goodboy message.

Now to the CFG of the check function.
Control flow graph of the check function with hooks
Fig 4: Inserting hooks inside the check function

We are going to execute the above function symbolically. The unlock code, which consists of two parts, is passed to the function as arguments. Since we are executing this function in isolation, we need to provide the inputs ourselves, and this can be done by setting a hook at 4010FF (set_ebx_edx). Within the hook, we store symbolic values representing the two parts of the unlock code in the ebx and edx registers.

Lastly, at 40111D there is a call to lstrcmpA. This function is imported from kernel32.dll. Since this DLL is not loaded within our execution environment, we must emulate the behaviour of lstrcmpA, and this can be done with SimProcedures.

Imported lstrcmpA function
Fig 5: lstrcmpA function

lstrcmpA is located at 40130E. We would set a hook at this location to call a SimProcedure which emulates the behaviour of lstrcmpA.

Now let's see the code which implements all of this.
Finally, to wrap things up, here is an asciicast showing the solver in action.
