Some time ago, I posted a CTF style challenge on tuts4you.
The goal of the crackme was to find the flag which prints the goodboy message. Although the crackme was solved pretty quickly and all the provided solutions were of good quality, I wanted to explore if it can be entirely solved by automation, requiring as few manual interventions as possible. Today's post is thus an attempt in that regard.
For the purpose of this post, I have somewhat simplified the challenge. The original crackme sported self modifying code (SMC), with pairs of encryption/decryption loops. The challenge for demonstration in this post contains no SMC. Further it contains only 30 equations instead of the original 100.
To make it easier to follow, I am providing the source used in this challenge.
This can be easily compiled to an ELF.
ec@ubuntu:~/Desktop/crackme$ gcc findtheflag.cpp -o findtheflag
----------Find the flag: By ExtremeCoders, Nov 2015----------
Enter your flag:
Okay, that's not guesswork, we already know our flag, but let's see how we can find the flag.
Using IDA, we can see the structure of the challenge. It accepts the flag via fgets. This is stored in the global variable f. The function check, checks whether the provided flag is correct or not.
Moving onto the function check, we can see that it's quite big, comprising of a large number of basic blocks. Our target of this post would be to see if we can find out the flag without inspecting the code.
Enter symbolic execution
Quoting Wikipedia, "symbolic execution (also symbolic evaluation) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would, a case of abstract interpretation. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch."
Thus symbolic execution is a way of running a program using symbolic values instead of providing concrete values. For instance, consider the following snippet of x86 assembly.
add eax, 0xDEADBEEF
xor eax, 0x1337
shr eax, 0x3
Let's say register eax initially contained the value 0x41414141. After executing these three lines it would contain 0x03FDE260.
Now suppose we are given that eax contains 0x31337 after executing the snippet. Can we find out what was the initial value of eax ? Of course we can, and we need to invert the algorithm for this.
shl eax, 0x3
xor eax, 0x1337
sub eax, 0xDEADBEEF
By running the snippet we can find that eax must hold the initial value 0x216ACBA0 for the given target value. The example above consisted of only 3 instructions. Inverting a huge algorithm consisting of a few hundred instructions would certainly not be a pleasant experience. It is then when symbolic execution proves fruitful. If we represent the entire system symbolically, we can see how a given initial value affects the execution and we can certainly find out the initial value when given a target value.
Using Z3 SMT Solver
For represent the above 3 instruction system symbolically we can use Z3. We will use the python bindings.
#!/usr/bin/python from z3 import * # Declare a 32 bit vector representing register eax reg_eax = BitVec('eax', 32) # add eax, 0xDEADBEEF eax = reg_eax + 0xDEADBEEF # xor eax, 0x1337 eax = eax ^ 0x1337 # shr eax, 0x3 eax = eax >> 0x3 # Create a solver instance s = Solver() # Add the constraint s.add(eax == 0x31337) # Check if satisfied if s.check() == sat: # Print value print hex(s.model()[reg_eax].as_long())
You can download the above code from here. Running the code you can easily see that eax must hold the original value of 0x216ACBA0 for the given target value of 0x31337.
Stay tuned for the next part...