Wednesday, 11 May 2016

PjOrion Deobfuscator Open Sourced

Update (11-July-2017)


The PjOrion Deobfuscator project has been discontinued. It is superseded by bytecode_simplifier.

PjOrion is a Python bytecode protector. While originally developed for obfuscating World of Tanks mods, it can be used for pretty much any Python code. What makes this protector special is that it works on the Python bytecode itself. It tampers with the bytecode so that standard tools can neither decompile nor disassemble it. For a scripting language like Python, where code protection has long been considered a myth, this is quite a significant improvement.

An example

Some time in 2015 bomblader posted a crackme on tuts4you. I will be using the same crackme to demonstrate the protection offered by PjOrion.

PjOrion breaks existing disassemblers by tampering with the bytecode. For example, running the standard dis module on the obfuscated pyc file produces the following output.

>>> import marshal, dis
>>> f = open('1.pyc', 'rb')
>>> f.seek(8)
>>> co = marshal.load(f)
>>> dis.disassemble(co)
  1     >>    0 SETUP_EXCEPT            99 (to 102)
              3 <144>                  387
              6 STOP_CODE
              7 JUMP_FORWARD           217 (to 227)
             10 <157>                44944
             13 LOAD_NAME            28929
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\dis.py", line 97, in disassemble
    print '(' + co.co_names[oparg] + ')',
IndexError: tuple index out of range

Not only does the disassembler fail, the listing also contains several invalid opcodes. Opcode 144, for example, is not a defined opcode in Python 2.7. When CPython tries to execute an invalid opcode, it raises an exception, and without an exception handler installed the program would crash. This is precisely why the very first instruction is SETUP_EXCEPT: it sets up an exception handler at bytecode offset 102, which is invoked when the exception is raised.
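The offsets in the listing can be checked by hand. The following is a minimal sketch, assuming the CPython 2.7 instruction layout (a one-byte opcode followed, for opcodes 90 and above, by a two-byte little-endian argument). The helper name relative_target is hypothetical and not part of any tool mentioned here. For a relative instruction such as SETUP_EXCEPT or JUMP_FORWARD, the target is the offset of the following instruction plus the argument.

```python
import struct

def relative_target(code, offset):
    """Absolute target of the relative-argument instruction at `offset`."""
    # The argument is a 2-byte little-endian integer right after the opcode.
    (arg,) = struct.unpack_from('<H', code, offset + 1)
    # It is counted from the next instruction, which starts 3 bytes later.
    return offset + 3 + arg

# SETUP_EXCEPT (opcode 121 in CPython 2.7) with argument 99, at offset 0.
first = bytearray([121, 99, 0])
print(relative_target(first, 0))  # 102, matching "(to 102)" in the listing
```

The same arithmetic gives 7 + 3 + 217 = 227 for the JUMP_FORWARD at offset 7.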

It is clear that we need to follow the exception handler to understand the program flow. For this, I wrote a small program that traces the program flow.

0 SETUP_EXCEPT 99
3 <INVALID>
102 POP_TOP
103 POP_TOP
104 POP_TOP
105 LOAD_CONST 1
108 JUMP_FORWARD 14
125 MAKE_FUNCTION 0
128 JUMP_ABSOLUTE 205
205 STORE_FAST 0
208 JUMP_ABSOLUTE 145
145 SETUP_FINALLY 6
148 JUMP_ABSOLUTE 160
160 JUMP_ABSOLUTE 55
55 LOAD_CONST 2
58 JUMP_FORWARD 16
77 LOAD_CONST 0
80 JUMP_ABSOLUTE 25
25 IMPORT_NAME 0
28 JUMP_FORWARD 34
65 STORE_FAST 1
<....snip...>
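A tracer of this kind might look roughly like the sketch below. This is not the original tracing program, only an illustration of the idea under stated assumptions: it uses the CPython 2.7 instruction layout, a deliberately partial opcode table, and treats every opcode missing from that table as invalid. The names trace, OPNAMES and SAMPLE are made up for the example, and the sample bytes are hand-built, not taken from the crackme.

```python
HAVE_ARGUMENT = 90  # opcodes >= 90 carry a 2-byte argument in CPython 2.7

# Partial CPython 2.7 opcode table, just enough for the demo below.
# Assumption of this sketch: any opcode missing here is treated as invalid.
OPNAMES = {
    1: 'POP_TOP', 83: 'RETURN_VALUE', 100: 'LOAD_CONST', 108: 'IMPORT_NAME',
    110: 'JUMP_FORWARD', 113: 'JUMP_ABSOLUTE', 121: 'SETUP_EXCEPT',
    122: 'SETUP_FINALLY', 125: 'STORE_FAST', 132: 'MAKE_FUNCTION',
}

def trace(code):
    """Follow straight-line flow, unconditional jumps and except handlers."""
    code = bytearray(code)
    handlers, visited, listing = [], set(), []
    pc = 0
    while 0 <= pc < len(code) and pc not in visited:
        visited.add(pc)  # stop if the flow loops back on itself
        op = code[pc]
        name = OPNAMES.get(op)
        if name is None:
            # Invalid opcode: control goes to the innermost handler, if any.
            listing.append((pc, '<INVALID>', None))
            if not handlers:
                break
            pc = handlers.pop()
            continue
        arg, nxt = None, pc + 1
        if op >= HAVE_ARGUMENT:
            if pc + 2 >= len(code):
                break  # truncated instruction
            arg = code[pc + 1] | (code[pc + 2] << 8)
            nxt = pc + 3
        listing.append((pc, name, arg))
        if name == 'SETUP_EXCEPT':
            handlers.append(nxt + arg)  # handler offset is relative
        elif name == 'JUMP_FORWARD':
            nxt += arg                  # relative jump
        elif name == 'JUMP_ABSOLUTE':
            nxt = arg                   # absolute jump
        elif name == 'RETURN_VALUE':
            break
        pc = nxt
    return listing

# Hand-built sample: handler setup, an invalid opcode, junk bytes, a jump.
SAMPLE = bytearray([121, 6, 0,  144, 131, 1,  0, 0, 0,  1, 1, 1,
                    100, 1, 0,  110, 2, 0,  255, 255,  83])
for offset, name, arg in trace(SAMPLE):
    print(offset, name, '' if arg is None else arg)
```

The sketch ignores conditional jumps, nested block bookkeeping and EXTENDED_ARG, so it only follows a single path through the code.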

From the above listing, it is clear that, in addition to the invalid instruction at the beginning, the code is littered with unconditional jumps. The result is a spaghetti control flow as shown in Fig 1, or an even more extreme example in Fig 2.

Fig 1: Too many jumps!
Fig 2: Devilish CFG

Deobfuscating and beyond 

To deobfuscate such files, I developed a tool, PjOrion Deobfuscator (on GitHub). It is currently in a pre-alpha stage and may not even work. There are many moving pieces that need refactoring to make it workable. With time I aim to improve this tool.

The tool removes redundant jumps between the basic blocks, as in Fig 3. However, this is not as simple as it sounds: the code stream has to be disassembled recursively. I also incorporated some ideas borrowed from the LLVM project to optimize the CFG.

Fig 3: Redundant Jumps removal
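One simple form of redundant jump removal is merging a block into its predecessor when that predecessor is the only way to reach it. The sketch below illustrates the idea on a simplified CFG in which every block has at most one successor, reached by an unconditional jump. It is an assumption-laden illustration, not the tool's actual algorithm, and the function name merge_linear_blocks and the data layout are hypothetical.

```python
from collections import Counter

def merge_linear_blocks(blocks, entry):
    """blocks maps label -> (instructions, successor label or None)."""
    blocks = {k: (list(ins), nxt) for k, (ins, nxt) in blocks.items()}
    changed = True
    while changed:
        changed = False
        # Count how many blocks jump to each label.
        preds = Counter(nxt for _, nxt in blocks.values() if nxt is not None)
        for label in list(blocks):
            ins, nxt = blocks[label]
            if nxt is None or nxt == label or nxt == entry:
                continue
            if preds[nxt] == 1:
                # The successor is reachable only from here: absorb it,
                # which makes the jump between the two blocks unnecessary.
                succ_ins, succ_next = blocks.pop(nxt)
                blocks[label] = (ins + succ_ins, succ_next)
                changed = True
                break  # the block set changed, so start a fresh pass
    return blocks

# Three blocks taken from the trace above, truncated after offset 205.
cfg = {
    102: (['POP_TOP', 'POP_TOP', 'POP_TOP', 'LOAD_CONST 1'], 125),
    125: (['MAKE_FUNCTION 0'], 205),
    205: (['STORE_FAST 0'], None),
}
print(merge_linear_blocks(cfg, 102))
```

Blocks with several predecessors, conditional branches and exception edges need more care than this sketch shows.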
After removing the redundant jumps, we need to reassemble the modified CFG. This is also quite a task, as we need to recompute all instruction offsets and the positions of the basic blocks within the reassembled instruction stream.
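The offset recomputation can be pictured as a two-pass assembler: the first pass lays the blocks out and records where each one starts, the second emits bytes and patches jump arguments. The sketch below assumes the CPython 2.7 encoding, a small hand-picked opcode subset, and jump arguments given as block labels. The name assemble and the block format are hypothetical, and EXTENDED_ARG (arguments above 0xFFFF) is not handled.

```python
HAVE_ARGUMENT = 90  # CPython 2.7: opcodes >= 90 take a 2-byte argument

# Small subset of CPython 2.7 opcodes, enough for the example.
OPCODES = {'POP_TOP': 1, 'RETURN_VALUE': 83, 'LOAD_CONST': 100,
           'JUMP_FORWARD': 110, 'JUMP_ABSOLUTE': 113, 'STORE_FAST': 125}

def assemble(blocks):
    """blocks: ordered list of (label, [(opname, arg), ...])."""
    # Pass 1: compute the start offset of every block.
    start, offset = {}, 0
    for label, instrs in blocks:
        start[label] = offset
        for name, _ in instrs:
            offset += 3 if OPCODES[name] >= HAVE_ARGUMENT else 1
    # Pass 2: emit bytes, resolving labels to offsets.
    out = bytearray()
    for label, instrs in blocks:
        for name, arg in instrs:
            op = OPCODES[name]
            if op < HAVE_ARGUMENT:
                out.append(op)
                continue
            if name == 'JUMP_ABSOLUTE':
                arg = start[arg]
            elif name == 'JUMP_FORWARD':
                # Relative to the instruction after this 3-byte jump.
                arg = start[arg] - (len(out) + 3)
            if not 0 <= arg <= 0xFFFF:
                raise ValueError('argument out of range for %s' % name)
            out += bytearray([op, arg & 0xFF, arg >> 8])
    return bytes(out), start

code, starts = assemble([
    ('entry', [('LOAD_CONST', 1), ('JUMP_FORWARD', 'tail')]),
    ('skipped', [('POP_TOP', None)]),
    ('tail', [('STORE_FAST', 0), ('RETURN_VALUE', None)]),
])
print(starts)
```

Moving or deleting a block changes every offset after it, which is why targets are kept as labels until the final layout is known.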

For now, you can use the tool to generate a CFG which should help to better understand the bytecode. Be sure to have pydotplus and graphviz installed before using it.
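To give an idea of what such a CFG looks like in Graphviz terms, here is a small sketch that writes DOT text by hand, without pydotplus. It is not how the tool itself builds its graphs. The function name cfg_to_dot and the block format (label mapped to a list of instruction strings and a list of successor labels) are assumptions made for the example.

```python
def cfg_to_dot(blocks):
    """blocks maps an integer label -> (instruction strings, successors)."""
    lines = ['digraph cfg {', '    node [shape=box, fontname="monospace"];']
    for label in sorted(blocks):
        instrs, succs = blocks[label]
        # Escape backslashes and quotes so they are safe inside a DOT string.
        safe = [i.replace('\\', '\\\\').replace('"', '\\"') for i in instrs]
        # "\l" is the DOT escape for a left-justified line break.
        text = ''.join(s + '\\l' for s in safe)
        lines.append('    b%d [label="%s"];' % (label, text))
        for succ in succs:
            lines.append('    b%d -> b%d;' % (label, succ))
    lines.append('}')
    return '\n'.join(lines)

demo = {
    102: (['POP_TOP', 'LOAD_CONST 1'], [125]),
    125: (['MAKE_FUNCTION 0'], []),
}
print(cfg_to_dot(demo))
```

Saved to a file such as cfg.dot, the output can be rendered with the Graphviz command line, for example dot -Tpng cfg.dot -o cfg.png.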

I would like to reiterate that the tool is in a pre-alpha stage and may not work for your files. However, I definitely aim to improve it with time.

PjOrion Deobfuscator: https://github.com/extremecoders-re/PjOrion-Deobfuscator