Update (11-July-2017)
The PjOrion Deobfuscator project has been discontinued. It is superseded by bytecode_simplifier.
An example
Some time in 2015 bomblader posted a crackme on tuts4you. I will be using the same crackme to demonstrate the protection offered by PjOrion.
PjOrion breaks existing disassemblers by tampering with the bytecode. For example, using the standard dis module on the obfuscated pyc file results in the following output.
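For readers who want to reproduce this step, here is a minimal sketch of feeding a pyc file to dis. It assumes the file is loaded by the same interpreter version that produced it (marshal data is not portable across versions) and that the header is 8 bytes long, as in CPython 2.7; newer versions use 12 or 16 bytes. The helper name load_code and the in-memory stand-in for the file are made up for illustration and are not part of the crackme.

```python
import dis
import marshal


def load_code(pyc_bytes, header_size=8):
    """Return the top-level code object stored in a pyc image.

    header_size is an assumption: 8 bytes (magic + mtime) for CPython 2.7.
    """
    return marshal.loads(pyc_bytes[header_size:])


# Stand-in for open("some.pyc", "rb").read(): a fake 8-byte header followed
# by the marshalled code object of a one-line module.
demo_pyc = b"\0" * 8 + marshal.dumps(compile("x = 1", "<demo>", "exec"))

code = load_code(demo_pyc)
dis.dis(code)  # prints a normal listing; on an obfuscated file this step breaks
```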
Not only did the disassembler fail, but the listing also contains several invalid opcodes. For example, opcode 144 is invalid: no such instruction exists. When CPython tries to execute an invalid opcode, it throws an exception, and without an exception handler installed the program would crash. This is precisely why the very first instruction is SETUP_EXCEPT. It sets up an exception handler at bytecode offset 102, which is called when an exception is thrown.
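The handler offset follows from how SETUP_EXCEPT encodes its operand. The sketch below assumes the CPython 2.7 encoding: a one-byte opcode (121 for SETUP_EXCEPT) followed by a two-byte little-endian operand, which is a delta measured from the start of the next instruction. The helper name handler_target is hypothetical, and the operand 99 is inferred from the target 102 rather than copied from the crackme.

```python
import struct

SETUP_EXCEPT = 121  # opcode value in CPython 2.7 (assumed interpreter version)


def handler_target(code, offset):
    """Return the absolute handler offset set up by SETUP_EXCEPT at `offset`."""
    if code[offset] != SETUP_EXCEPT:
        raise ValueError("not a SETUP_EXCEPT instruction")
    (delta,) = struct.unpack_from("<H", code, offset + 1)
    # The operand is relative to the instruction that follows (3 bytes later).
    return offset + 3 + delta


first_instruction = bytearray([SETUP_EXCEPT, 99, 0])
target = handler_target(first_instruction, 0)
print(target)  # 102
```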
It is clear that we need to follow the exception handler to understand the program flow. For this I developed a trivial program which could trace the program flow.
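The idea behind such a tracer can be sketched as follows. This is a simplified illustration, not the program used for this post: it assumes the CPython 2.7 instruction encoding, knows only a handful of opcodes, tracks a single exception handler instead of a block stack, and ignores conditional branches. The function name trace and the sample bytecode are made up for the example.

```python
HAVE_ARGUMENT = 90  # CPython 2.7: opcodes >= 90 carry a 2-byte operand

# Small subset of the CPython 2.7 opcode table, enough for this sketch.
OPNAMES = {
    1: "POP_TOP",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    110: "JUMP_FORWARD",
    113: "JUMP_ABSOLUTE",
    121: "SETUP_EXCEPT",
}


def trace(code):
    """Follow execution order and return a list of (offset, name, arg)."""
    code = bytearray(code)
    pc, handler, path, visited = 0, None, [], set()
    while pc < len(code) and pc not in visited:
        visited.add(pc)
        op = code[pc]
        name = OPNAMES.get(op)
        if name is None:
            # Unknown opcode: the interpreter would raise here, so control
            # continues at the handler registered by SETUP_EXCEPT, if any.
            path.append((pc, "<%d>" % op, None))
            if handler is None:
                break
            pc, handler = handler, None
            continue
        size = 3 if op >= HAVE_ARGUMENT else 1
        if pc + size > len(code):
            break  # truncated instruction
        arg = code[pc + 1] | (code[pc + 2] << 8) if size == 3 else None
        path.append((pc, name, arg))
        if name == "SETUP_EXCEPT":
            handler = pc + size + arg
            pc += size
        elif name == "JUMP_FORWARD":
            pc += size + arg
        elif name == "JUMP_ABSOLUTE":
            pc = arg
        elif name == "RETURN_VALUE":
            break
        else:
            pc += size
    return path


# Hand-made sample: SETUP_EXCEPT, an invalid opcode, junk bytes, then the
# handler, which jumps over more junk to the real code.
sample = bytearray([121, 5, 0, 144, 255, 255, 255, 255,
                    1, 110, 3, 0, 255, 255, 255, 100, 0, 0, 83])
path = trace(sample)
for offset, name, arg in path:
    print(offset, name, "" if arg is None else arg)
```

A linear disassembler would choke on the junk bytes, whereas following the control flow visits only the instructions that can actually execute.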
From the trace output, it is clear that in addition to the invalid instruction at the beginning, the code is littered with unconditional jumps. The result is a spaghetti control flow, as shown in Fig 1, or in an even more extreme example in Fig 2.
Fig 1: Too many jumps!
Fig 2: Devilish CFG
Deobfuscating and beyond
To deobfuscate such files, I developed a tool, PjOrion Deobfuscator (on GitHub). It is currently in a pre-alpha stage and may not work at all. There are many moving pieces involved, and they need refactoring to make the tool workable. I aim to improve it over time.
The tool removes redundant jumps between the basic blocks, as in Fig 3. However, this is not as simple as it sounds, since it requires recursively disassembling the code stream. I also incorporated some ideas borrowed from the LLVM project to optimize the CFG.
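One way to picture redundant jump removal is as block merging: if a block ends in an unconditional jump and its target has no other predecessor, the two blocks can be fused and the jump dropped. The sketch below is a toy model under strong assumptions (every block has at most one successor, so conditional branches are not modeled), and the data layout and the name merge_jump_chains are invented for illustration. It is not the tool's actual implementation.

```python
def merge_jump_chains(blocks, entry):
    """Fuse each block with its jump target when that target has exactly
    one predecessor.

    blocks maps a label to {"body": [instructions], "jump": label or None}.
    Returns a new mapping; the input is left untouched.
    """
    blocks = {n: {"body": list(b["body"]), "jump": b["jump"]}
              for n, b in blocks.items()}
    changed = True
    while changed:
        changed = False
        # Count predecessors of every block.
        preds = {n: 0 for n in blocks}
        for b in blocks.values():
            if b["jump"] is not None:
                preds[b["jump"]] += 1
        for name in list(blocks):
            target = blocks[name]["jump"]
            # Never merge a self-loop or the entry block (it has an
            # implicit predecessor: the start of execution).
            if target is None or target == name or target == entry:
                continue
            if preds[target] == 1:
                succ = blocks.pop(target)
                blocks[name]["body"].extend(succ["body"])
                blocks[name]["jump"] = succ["jump"]
                changed = True
                break  # predecessor counts are stale, recompute
    return blocks


cfg = {
    "A": {"body": ["LOAD_CONST 0"], "jump": "C"},
    "B": {"body": ["RETURN_VALUE"], "jump": None},
    "C": {"body": ["STORE_NAME 0"], "jump": "B"},
}
merged = merge_jump_chains(cfg, entry="A")
print(merged)
```

In this example the three blocks collapse into a single block A with no jumps left.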
Fig 3: Redundant Jumps removal
After removing the redundant jumps, we need to reassemble the modified CFG. This is also quite a task, as we need to recompute all instruction offsets and the positions of the basic blocks within the reassembled instruction stream.
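The offset bookkeeping can be sketched as a two-pass assembler: the first pass lays out the blocks and records where each one starts, and the second pass emits bytes and resolves jump labels to offsets. This is a minimal sketch assuming the CPython 2.7 encoding and a tiny opcode subset; the function name assemble and the layout format are hypothetical, and operands larger than 0xFFFF (which would need EXTENDED_ARG) are not handled.

```python
HAVE_ARGUMENT = 90  # CPython 2.7: opcodes >= 90 take a 2-byte operand

# Small subset of the CPython 2.7 opcode table, enough for this sketch.
OPCODES = {
    "POP_TOP": 1,
    "RETURN_VALUE": 83,
    "LOAD_CONST": 100,
    "JUMP_FORWARD": 110,
    "JUMP_ABSOLUTE": 113,
}


def assemble(layout):
    """layout is a list of (label, [(opname, arg), ...]) in emission order.
    Jump instructions take a block label as their arg."""
    # Pass 1: compute the start offset of every block.
    starts, offset = {}, 0
    for label, instrs in layout:
        starts[label] = offset
        for name, _ in instrs:
            offset += 3 if OPCODES[name] >= HAVE_ARGUMENT else 1

    # Pass 2: emit bytes, turning labels into absolute or relative operands.
    out = bytearray()
    for _, instrs in layout:
        for name, arg in instrs:
            op = OPCODES[name]
            if op < HAVE_ARGUMENT:
                out.append(op)
                continue
            if name == "JUMP_ABSOLUTE":
                arg = starts[arg]
            elif name == "JUMP_FORWARD":
                # Relative to the instruction after this 3-byte jump.
                arg = starts[arg] - (len(out) + 3)
                if arg < 0:
                    raise ValueError("JUMP_FORWARD cannot jump backwards")
            out += bytearray([op, arg & 0xFF, (arg >> 8) & 0xFF])
    return starts, out


layout = [
    ("entry", [("LOAD_CONST", 0), ("JUMP_FORWARD", "exit")]),
    ("skipped", [("POP_TOP", None)]),
    ("exit", [("RETURN_VALUE", None)]),
]
starts, code = assemble(layout)
print(starts)
print(list(code))
```

Because jump operands depend on the final position of every block, any change to the layout means both passes have to be run again.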
I would like to reiterate once again that the tool is in a pre-alpha stage and may not work for your files. However, I definitely aim to improve this tool with time.
PjOrion Deobfuscator: https://github.com/extremecoders-re/PjOrion-Deobfuscator