Wednesday 11 May 2016

PjOrion Deobfuscator Open Sourced

Update (11-July-2017)


The PjOrion Deobfuscator project has been discontinued. It has been superseded by bytecode_simplifier.

PjOrion is a Python bytecode protector. While originally developed for obfuscating World of Tanks mods, it can be used for pretty much any Python code. What makes this protector special is that it works on the Python bytecode itself. It tampers with the bytecode, making it impossible to decompile or disassemble with standard tools. For a scripting language like Python, this is quite a significant improvement, considering that code protection in Python was just a myth.

An example

Some time in 2015 bomblader posted a crackme on tuts4you. I will be using the same crackme to demonstrate the protection offered by PjOrion.

PjOrion breaks existing disassemblers by tampering with the bytecode. For example, using the standard dis module on the obfuscated pyc file results in the following output.


Not only did the disassembler fail, but there are also several invalid opcodes in the listing. For example, opcode 144 is invalid: it does not exist. When CPython tries to execute an invalid opcode, it throws an exception. Without an exception handler installed, the program would crash. This is precisely why the very first instruction is SETUP_EXCEPT. Its purpose is to set up an exception handler at bytecode offset 102, which is called when an exception is thrown.

It is clear that we need to follow the exception handler to understand the program flow. For this, I developed a trivial program to trace the program flow.


From the above listing, it is clear that, in addition to the invalid instruction at the beginning, the code is littered with unconditional jumps. The result is a spaghetti control flow as shown in Fig 1, or an even more extreme example in Fig 2.

Fig 1: Too many jumps!
Fig 2: Devilish CFG
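
The tracing idea described above can be sketched roughly as follows. This is not the code of the original tracer, only a minimal illustration under some assumptions: the input is raw Python 2.7 bytecode (1-byte opcodes, with a 2-byte little-endian argument for opcodes >= 90), only a tiny hand-picked opcode table is used, and any opcode missing from that table is treated as invalid. The names trace, KNOWN and sample are hypothetical.

```python
# Python 2.7 opcode numbers; only a tiny, hand-picked subset of the real table.
HAVE_ARGUMENT = 90  # opcodes >= 90 carry a 2-byte little-endian argument
KNOWN = {
    1: "POP_TOP",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    110: "JUMP_FORWARD",
    113: "JUMP_ABSOLUTE",
    121: "SETUP_EXCEPT",
}


def trace(code):
    """Return the (offset, name, arg) triples visited while following the flow."""
    code = bytearray(code)
    handlers = []    # handler offsets registered by SETUP_EXCEPT, innermost last
    visited = set()  # guards against looping forever on backward jumps
    path = []
    offset = 0
    while offset < len(code) and offset not in visited:
        visited.add(offset)
        op = code[offset]
        if op not in KNOWN:
            # Assumption: anything outside KNOWN is an invalid opcode. The
            # interpreter would raise here, so continue at the innermost handler.
            path.append((offset, "<invalid %d>" % op, None))
            if not handlers:
                break
            offset = handlers.pop()
            continue
        arg, size = None, 1
        if op >= HAVE_ARGUMENT:
            arg = code[offset + 1] | (code[offset + 2] << 8)
            size = 3
        name = KNOWN[op]
        path.append((offset, name, arg))
        next_offset = offset + size
        if name == "SETUP_EXCEPT":
            handlers.append(next_offset + arg)  # target is relative to next insn
        elif name == "JUMP_FORWARD":
            next_offset += arg                  # relative jump
        elif name == "JUMP_ABSOLUTE":
            next_offset = arg                   # absolute jump
        elif name == "RETURN_VALUE":
            break
        offset = next_offset
    return path


# Hand-made sample that mimics the pattern: handler setup, invalid opcode, jumps.
sample = bytes([
    121, 3, 0,   # 0:  SETUP_EXCEPT, handler at 0 + 3 + 3 = 6
    144, 0, 0,   # 3:  invalid opcode 144, triggers the handler
    110, 3, 0,   # 6:  JUMP_FORWARD +3, lands on 12
    1, 1, 1,     # 9:  junk that is never executed
    100, 0, 0,   # 12: LOAD_CONST 0
    83,          # 15: RETURN_VALUE
])

for entry in trace(sample):
    print(entry)
```

On this sample the tracer visits offsets 0, 3, 6, 12 and 15 and skips the junk bytes at offset 9. A real tracer would need the full Python 2.7 opcode table and would also have to follow both sides of conditional jumps.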

Deobfuscating and beyond 

To deobfuscate such files, I developed a tool, PjOrion Deobfuscator (on GitHub). It is currently in a pre-alpha stage and may not even work. There are many moving pieces involved which need refactoring to make this workable. With time I aim to improve this tool.

The tool removes redundant jumps between the basic blocks, as in Fig 3. However, this is not as simple as it sounds, and it requires recursively disassembling the code stream. I also incorporated some ideas borrowed from the LLVM project to optimize the CFG.

Fig 3: Redundant Jumps removal
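
One common way to remove such jumps is to merge a block into its predecessor when that predecessor is its only way in and the block is the predecessor's only way out. This is similar in spirit to the block merging done by LLVM's SimplifyCFG pass. The sketch below illustrates that idea only, and is not necessarily how the tool does it. The CFG representation (a dict of blocks with "body" and "succ" keys) and the name merge_jump_chains are assumptions made for the example.

```python
def merge_jump_chains(blocks, entry):
    """Merge every block into its predecessor when the edge between them is
    the predecessor's only exit and the block's only entry.

    blocks maps a block name to {"body": [instruction strings], "succ": [names]}.
    The unconditional jump itself is not stored in "body"; it is implied by
    "succ" having exactly one element.
    """
    # Work on a copy so the caller's CFG is left untouched.
    blocks = {n: {"body": list(b["body"]), "succ": list(b["succ"])}
              for n, b in blocks.items()}
    changed = True
    while changed:
        changed = False
        # Recompute predecessors after every merge.
        preds = {n: [] for n in blocks}
        for name, block in blocks.items():
            for succ in block["succ"]:
                preds[succ].append(name)
        for name in list(blocks):
            block = blocks[name]
            if len(block["succ"]) != 1:
                continue  # conditional branch or terminating block
            target = block["succ"][0]
            if target == name or target == entry or preds[target] != [name]:
                continue  # self-loop, entry block, or shared target
            # Splice the target into this block and drop the jump between them.
            block["body"].extend(blocks[target]["body"])
            block["succ"] = blocks[target]["succ"]
            del blocks[target]
            changed = True
            break
    return blocks


# Three blocks chained by unconditional jumps collapse into one.
cfg = {
    "b0": {"body": ["LOAD_CONST 0"], "succ": ["b7"]},
    "b7": {"body": ["STORE_NAME 0"], "succ": ["b3"]},
    "b3": {"body": ["LOAD_NAME 0", "RETURN_VALUE"], "succ": []},
}
simplified = merge_jump_chains(cfg, "b0")
print(simplified)
```

Blocks ending in a conditional branch have two successors and are left alone, as are blocks that several predecessors jump to.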
After removing the redundant jumps, we need to reassemble the modified CFG. This is also quite a task, as we need to recompute all instruction offsets and the position of the basic blocks within the reassembled instruction stream.
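
Recomputing the offsets is typically done in two passes: first lay out the blocks to learn where each one starts, then emit the bytes and patch every jump with the new target. The sketch below shows that idea for the Python 2.7 instruction format (1 byte, or 3 bytes for opcodes >= 90). The layout structure, the use of block labels as symbolic jump targets, and the name assemble are assumptions for illustration, and EXTENDED_ARG (arguments above 0xFFFF) is not handled.

```python
import struct

HAVE_ARGUMENT = 90
# Python 2.7 opcodes whose argument is relative to the next instruction:
# JUMP_FORWARD, SETUP_LOOP, SETUP_EXCEPT, SETUP_FINALLY.
RELATIVE = {110, 120, 121, 122}


def insn_size(op):
    """Size in bytes of one Python 2.7 instruction."""
    return 3 if op >= HAVE_ARGUMENT else 1


def assemble(layout):
    """layout is a list of (label, [(opcode, arg), ...]) in emission order.

    arg is None, an int, or a block label (str) for jump-like instructions.
    Returns the assembled bytes and the start offset of every block.
    """
    # Pass 1: compute where each block starts.
    start, offset = {}, 0
    for label, insns in layout:
        start[label] = offset
        offset += sum(insn_size(op) for op, _ in insns)

    # Pass 2: emit bytes, resolving labels to concrete offsets.
    out = bytearray()
    for label, insns in layout:
        for op, arg in insns:
            out.append(op)
            if op < HAVE_ARGUMENT:
                continue
            if isinstance(arg, str):
                target = start[arg]
                if op in RELATIVE:
                    # len(out) + 2 is the offset of the next instruction.
                    arg = target - (len(out) + 2)
                    if arg < 0:
                        raise ValueError("relative target lies behind the jump")
                else:
                    arg = target
            out += struct.pack("<H", arg)
    return bytes(out), start


layout = [
    ("entry", [(121, "handler"), (113, "body")]),  # SETUP_EXCEPT, JUMP_ABSOLUTE
    ("handler", [(1, None), (83, None)]),          # POP_TOP, RETURN_VALUE
    ("body", [(100, 0), (83, None)]),              # LOAD_CONST 0, RETURN_VALUE
]
code, offsets = assemble(layout)
print(list(code), offsets)
```

Because relative opcodes can only point forward, the block order matters: a handler placed before its SETUP_EXCEPT cannot be encoded, which is why the sketch raises an error in that case.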

For now, you can use the tool to generate a CFG which should help to better understand the bytecode. Be sure to have pydotplus and graphviz installed before using it.
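
To give an idea of what a CFG dump looks like, here is a small sketch that writes a CFG in Graphviz DOT syntax by hand, without pydotplus. It does not reproduce the tool's output. The CFG structure (a dict of blocks with "body" and "succ" keys) and the name cfg_to_dot are assumptions for the example.

```python
def cfg_to_dot(blocks):
    """Render {name: {"body": [str], "succ": [names]}} as Graphviz DOT text."""
    lines = ["digraph cfg {", "    node [shape=box, fontname=monospace];"]
    for name, block in blocks.items():
        # "\l" is DOT's left-justified line break; escape quotes in the text.
        body = [insn.replace('"', '\\"') for insn in block["body"]]
        label = "\\l".join(body) + "\\l"
        lines.append('    "%s" [label="%s"];' % (name, label))
        for succ in block["succ"]:
            lines.append('    "%s" -> "%s";' % (name, succ))
    lines.append("}")
    return "\n".join(lines)


example_cfg = {
    "b0": {"body": ["LOAD_CONST 0"], "succ": ["b7"]},
    "b7": {"body": ["LOAD_NAME 0", "RETURN_VALUE"], "succ": []},
}
dot_text = cfg_to_dot(example_cfg)
print(dot_text)
```

The resulting text can be saved to a file such as cfg.dot and rendered with Graphviz, for example with dot -Tpng cfg.dot -o cfg.png.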

I would like to reiterate that the tool is in a pre-alpha stage and may not work for your files. However, I definitely aim to improve it with time.

PjOrion Deobfuscator: https://github.com/extremecoders-re/PjOrion-Deobfuscator