Wednesday 9 March 2016

Revisiting find the flag crackme (Part-2)

This is the 2nd and final part of the series find the flag crackme.

Back to the challenge

Coming back to the challenge, we will see how we can use symbolic execution for solving the challenge. We can use Z3 for representing the system symbolically but that is too much of a work as we need to convert each instruction to its Z3 equivalent. Hence we need to look at alternatives which will do this work automatically.

Enter angr

The angr project is the next-generation binary analysis framework created by the computer security lab at UC Santa Barbara. Among its myriads of capabilities, it can lift a raw binary to an intermediate language and perform symbolic execution on it, just the thing we are looking for. Additionally, it can be instructed to search for paths that lead to the execution of a particular instruction. It can also find out what initial values (in registers / memory etc) will lead to the execution of a particular path.

For this particular challenge, we need to find the path that will print the good boy message while avoiding the path printing the bad boy message. angr can then automatically found out the flag which will lead to the execution of this path. This is no magic but done through the power of symbolic execution and constraint solving.

Installing angr is pretty straightforward and well documented. Hence I will dive right into the actual problem.

Solving with angr

First we need to keep a note of some information about the binary.
The check function starts at VA 0x804846D.
Start of Check Function
Start of Check function

The basic block we want to execute is 0x8049A52 and the one we want to avoid is 0x8049A60.
The basic block on the left is the good boy
The basic block on the left is the good boy

The check function is called from 0x8049A44. Hence after returning execution resumes at 0x8049A49.
Execution resumes at 0x8049A49 after return
Execution resumes at 0x8049A49 after return
The flag is stored in an array at VA 0x804B060.
VA of the flag
VA of the flag

With the above information we can develop the solver in angr.

UPDATE 19-July-2017: The script below is out of date and will not work in current version of angr. Get the updated script here:

Running on CPython the script takes about a couple of minutes to find the flag. This is quite an impressive feat considering that we did not even analyzed the check function itself. Furthermore, we could have reduced the execution time by running on PyPy instead of CPython.

And finally, to end things with, here is an asciicast showing the solver in action. You can download the files that were used in this post from google drive.

Revisiting find the flag crackme (Part-1)

Some time ago, I posted a CTF style challenge on tuts4you.

The goal of the crackme was to find the flag which prints the goodboy message. Although the crackme was solved pretty quickly and all the provided solutions were of good quality, I wanted to explore if it can be entirely solved by automation, requiring as few manual interventions as possible. Today's post is thus an attempt in that regard.

The challenge

For the purpose of this post, I have somewhat simplified the challenge. The original crackme sported self modifying code (SMC), with pairs of encryption/decryption loops. The challenge for demonstration in this post contains no SMC. Further it contains only 30 equations instead of the original 100.

To make it easier to follow, I am providing the source used in this challenge.

This can be easily compiled to an ELF.

ec@ubuntu:~/Desktop/crackme$ gcc findtheflag.cpp -o findtheflag
ec@ubuntu:~/Desktop/crackme$ ./findtheflag
----------Find the flag: By ExtremeCoders, Nov 2015----------
Enter your flag: 
That's it!

Okay, that's not guesswork, we already know our flag, but let's see how we can find the flag.

The disassembly

Using IDA, we can see the structure of the challenge. It accepts the flag via fgets. This is stored in the global variable f. The function check, checks whether the provided flag is correct or not.

Moving onto the function check, we can see that it's quite big, comprising of a large number of basic blocks. Our target of this post would be to see if we can find out the flag without inspecting the code.

Enter symbolic execution

Quoting Wikipedia, "symbolic execution (also symbolic evaluation) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would, a case of abstract interpretation. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch." 

Thus symbolic execution is a way of running a program using symbolic values instead of providing concrete values. For instance, consider the following snippet of x86 assembly.

add eax, 0xDEADBEEF
xor eax, 0x1337
shr eax, 0x3

Let's say register eax initially contained the value 0x41414141. After executing these three lines it would contain 0x03FDE260.
Now suppose we are given that eax contains 0x31337 after executing the snippet. Can we find out what was the initial value of eax ? Of course we can, and we need to invert the algorithm for this.

shl eax, 0x3
xor eax, 0x1337
sub eax, 0xDEADBEEF

By running the snippet we can find that eax must hold the initial value 0x216ACBA0 for the given target value. The example above consisted of only 3 instructions. Inverting a huge algorithm consisting of a few hundred instructions would certainly not be a pleasant experience. It is then when symbolic execution proves fruitful. If we represent the entire system symbolically, we can see how a given initial value affects the execution and we can certainly find out the initial value when given a target value.

Using Z3 SMT Solver

For represent the above 3 instruction system symbolically we can use Z3. We will use the python bindings.
from z3 import *

# Declare a 32 bit vector representing register eax
reg_eax = BitVec('eax', 32)

# add eax, 0xDEADBEEF
eax = reg_eax + 0xDEADBEEF

# xor eax, 0x1337
eax = eax ^ 0x1337

# shr eax, 0x3
eax = eax >> 0x3

# Create a solver instance
s = Solver()

# Add the constraint
s.add(eax == 0x31337)

# Check if satisfied
if s.check() == sat:
 # Print value
 print hex(s.model()[reg_eax].as_long())

You can download the above code from here. Running the code you can easily see that eax must hold the original value of 0x216ACBA0 for the given target value of 0x31337.

Stay tuned for the next part...

Monday 7 March 2016

Hello World

This will be my first attempt at serious blogging and it begins with the obligatory Hello World.

To start things off here is a piece javascript code shamelessly taken from here which prints this obligatory message.

and here is the output...