Tuesday, 11 July 2017

Deobfuscating PjOrion using bytecode simplifier

Bytecode simplifier is a tool to deobfuscate PjOrion-protected Python scripts. This post is a short tutorial showing how to use it on a protected script.

I have used the sample code below to demonstrate its usage. This is a small program to calculate the factorial of a number.

# Python program to find the factorial of a number using recursion
 
def recur_factorial(n):
   """Function to return the factorial
   of a number using recursion"""
   if n == 1:
       return n
   else:
       return n*recur_factorial(n-1)
 
 
# take input from the user
num = int(input("Enter a number: "))
 
# check if the number is negative
if num < 0:
   print("Sorry, factorial does not exist for negative numbers")
elif num == 0:
   print("The factorial of 0 is 1")
else:
   print("The factorial of",num,"is",recur_factorial(num))

I first compiled the script and then protected it, as shown in Figures 1 and 2.

Figure 1: Compiling the script
Figure 2: Protecting the generated pyc file
After protection, we get a large file example.pyc about 27 KiB in size. This is the file we will be working on.

The stock Python interpreter has no built-in bytecode tracing facility, so we have to use a modified build of Python that supports it. I have provided a precompiled version of Python 2.7.13 with bytecode tracing support on GitHub. Copy its python27.dll to C:\Windows\System32\. Make sure to back up the existing DLL first so that you can revert when finished.

Step - 1: Unwrapping the layers


The first step is to unwrap the protection layers to get hold of the actual obfuscated code object. For this, we will be using the pjunwrapper module as shown below.

C:\pj-dump>python pjunwrapper.py --ifile=example.pyc
XXX lineno: 1, opcode: 156
[*] Dumped 1 code object
XXX lineno: 1, opcode: 213
XXX lineno: 1, opcode: 184
XXX lineno: 1, opcode: 240
XXX lineno: 1, opcode: 240
XXX lineno: 1, opcode: 240
XXX lineno: 1, opcode: 240
[*] Dumped 1 code object
XXX lineno: 1, opcode: 7
XXX lineno: 1, opcode: 45
[*] Dumped 1 code object
XXX lineno: 1, opcode: 161
Enter a number: ^D
Error in module '__main__': unexpected EOF while parsing (<string>, line 1)

PjUnwrapper requires the pystack extension module, so make sure the extension is on the Python path. Running the command dumps several files whose names start with wrapper_. These are the wrapper layers around the actual obfuscated code. In our case, the obfuscated code is in wrapper_3.pyc, as shown in Figure 3. In general, the highest-numbered file contains the final obfuscated code.
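Picking the highest-numbered file by hand is easy with three files, but a small helper avoids the classic sorting mistake where wrapper_10 sorts before wrapper_9. This is only a sketch: it assumes the dumped files follow the pattern wrapper_<N>.pyc (inferred from wrapper_3.pyc above), and the function name highest_wrapper is made up for illustration.

```python
import re


def highest_wrapper(filenames):
    """Return the wrapper_<N>.pyc name with the largest N, or None."""
    # Assumed naming scheme: wrapper_<N>.pyc, based on wrapper_3.pyc.
    pattern = re.compile(r"^wrapper_(\d+)\.pyc$")
    numbered = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Tuples compare on the integer first, so wrapper_10 beats wrapper_9.
    return max(numbered)[1] if numbered else None


# In practice the list would come from os.listdir() on the dump directory.
print(highest_wrapper(["example.pyc", "wrapper_1.pyc",
                       "wrapper_2.pyc", "wrapper_3.pyc"]))
```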


Figure 3: Unwrapping the protection layers

Step - 2: Deobfuscating


The final step is to run bytecode_simplifier over wrapper_3.pyc as shown below.

C:\bytecode_simplifier\main.py --ifile=wrapper_3.pyc --ofile=wrapper_deobf.pyc
INFO:__main__:Opening file wrapper_3.pyc
INFO:__main__:Input pyc file header matched
DEBUG:__main__:Unmarshalling file
INFO:__main__:Processing code object \x0f\x1d\n\x00\x07\x0f\x0f
DEBUG:deobfuscator:Code entrypoint matched PjOrion signature v1
INFO:deobfuscator:Original code entrypoint at 269
INFO:deobfuscator:Starting control flow analysis...
DEBUG:disassembler:Finding leaders...
DEBUG:disassembler:Start leader at 269
DEBUG:disassembler:End leader at 272
DEBUG:disassembler:Start leader at 272
DEBUG:disassembler:End leader at 117
DEBUG:disassembler:Start leader at 117
DEBUG:disassembler:End leader at 82
DEBUG:disassembler:Start leader at 82
DEBUG:disassembler:End leader at 28
DEBUG:disassembler:Start leader at 28
DEBUG:disassembler:End leader at 177
DEBUG:disassembler:Start leader at 177
DEBUG:disassembler:End leader at 125
DEBUG:disassembler:Start leader at 125
DEBUG:disassembler:End leader at 155
DEBUG:disassembler:Start leader at 155
DEBUG:disassembler:End leader at 60
DEBUG:disassembler:Start leader at 60
DEBUG:disassembler:End leader at 165
DEBUG:disassembler:Start leader at 165
DEBUG:disassembler:End leader at 353
DEBUG:disassembler:Start leader at 353
DEBUG:disassembler:End leader at 303
DEBUG:disassembler:Start leader at 303
DEBUG:disassembler:End leader at 190
DEBUG:disassembler:Start leader at 190
DEBUG:disassembler:End leader at 235
DEBUG:disassembler:Start leader at 235
DEBUG:disassembler:Start leader at 235
DEBUG:disassembler:End leader at 51
DEBUG:disassembler:Start leader at 51
DEBUG:disassembler:End leader at 238
DEBUG:disassembler:Start leader at 238
DEBUG:disassembler:End leader at 313
DEBUG:disassembler:Start leader at 313
DEBUG:disassembler:End leader at 105
DEBUG:disassembler:Start leader at 105
DEBUG:disassembler:End leader at 246
DEBUG:disassembler:Start leader at 246
DEBUG:disassembler:End leader at 142
DEBUG:disassembler:Start leader at 142
DEBUG:disassembler:End leader at 71
DEBUG:disassembler:Start leader at 71
DEBUG:disassembler:End leader at 229
DEBUG:disassembler:Start leader at 229
DEBUG:disassembler:End leader at 33
DEBUG:disassembler:Start leader at 33
DEBUG:disassembler:Start leader at 33
DEBUG:disassembler:End leader at 44
DEBUG:disassembler:Start leader at 44
DEBUG:disassembler:End leader at 342
DEBUG:disassembler:Start leader at 342
DEBUG:disassembler:End leader at 36
DEBUG:disassembler:Start leader at 36
DEBUG:disassembler:End leader at 94
DEBUG:disassembler:Start leader at 94
DEBUG:disassembler:End leader at 17
DEBUG:disassembler:Start leader at 17
DEBUG:disassembler:End leader at 285
DEBUG:disassembler:Start leader at 285
DEBUG:disassembler:End leader at 295
DEBUG:disassembler:Start leader at 295
DEBUG:disassembler:End leader at 257
DEBUG:disassembler:Start leader at 257
DEBUG:disassembler:End leader at 197
DEBUG:disassembler:Start leader at 197
DEBUG:disassembler:End leader at 349
DEBUG:disassembler:End leader at 207
DEBUG:disassembler:Start leader at 207
DEBUG:disassembler:End leader at 361
DEBUG:disassembler:Start leader at 361
DEBUG:disassembler:End leader at 221
DEBUG:disassembler:Start leader at 221
DEBUG:disassembler:End leader at 332
DEBUG:disassembler:Start leader at 332
DEBUG:disassembler:End leader at 324
DEBUG:disassembler:Start leader at 324
DEBUG:disassembler:End leader at 134
DEBUG:disassembler:Start leader at 134
DEBUG:disassembler:End leader at 369
DEBUG:disassembler:Start leader at 369
DEBUG:disassembler:End leader at 6
DEBUG:disassembler:Start leader at 6
DEBUG:disassembler:End leader at 94
DEBUG:disassembler:Start leader at 94
DEBUG:disassembler:Found 81 leaders
DEBUG:disassembler:Constructing basic blocks...
DEBUG:disassembler:Creating basic block 0x24bd800 spanning from 5 to 6, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca558 spanning from 14 to 17, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca5d0 spanning from 25 to 28, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca648 spanning from 33 to 33, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca698 spanning from 36 to 36, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca6e8 spanning from 44 to 44, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca738 spanning from 51 to 51, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca788 spanning from 57 to 60, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca800 spanning from 68 to 71, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca878 spanning from 79 to 82, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca8f0 spanning from 93 to 94, end exclusive
DEBUG:disassembler:Creating basic block 0x24ca940 spanning from 94 to 94, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca990 spanning from 102 to 105, both inclusive
DEBUG:disassembler:Creating basic block 0x24caa30 spanning from 114 to 117, both inclusive
DEBUG:disassembler:Creating basic block 0x24caaf8 spanning from 122 to 125, both inclusive
DEBUG:disassembler:Creating basic block 0x24cabc0 spanning from 131 to 134, both inclusive
DEBUG:disassembler:Creating basic block 0x24cac88 spanning from 141 to 142, both inclusive
DEBUG:disassembler:Creating basic block 0x24cad50 spanning from 152 to 155, both inclusive
DEBUG:disassembler:Creating basic block 0x24cae18 spanning from 162 to 165, both inclusive
DEBUG:disassembler:Creating basic block 0x24caee0 spanning from 174 to 177, both inclusive
DEBUG:disassembler:Creating basic block 0x24cafa8 spanning from 187 to 190, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce0a8 spanning from 196 to 197, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce170 spanning from 204 to 207, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce238 spanning from 218 to 221, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce300 spanning from 228 to 229, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce3c8 spanning from 235 to 235, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce468 spanning from 238 to 238, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce508 spanning from 243 to 246, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce5d0 spanning from 254 to 257, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce698 spanning from 269 to 272, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce760 spanning from 282 to 285, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce828 spanning from 292 to 295, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce8f0 spanning from 300 to 303, both inclusive
DEBUG:disassembler:Creating basic block 0x24ce9b8 spanning from 310 to 313, both inclusive
DEBUG:disassembler:Creating basic block 0x24cea80 spanning from 321 to 324, both inclusive
DEBUG:disassembler:Creating basic block 0x24ceb48 spanning from 332 to 332, both inclusive
DEBUG:disassembler:Creating basic block 0x24cebe8 spanning from 342 to 342, both inclusive
DEBUG:disassembler:Creating basic block 0x24cec88 spanning from 349 to 349, both inclusive
DEBUG:disassembler:Creating basic block 0x24ced28 spanning from 350 to 353, both inclusive
DEBUG:disassembler:Creating basic block 0x24cedf0 spanning from 360 to 361, both inclusive
DEBUG:disassembler:Creating basic block 0x24ceeb8 spanning from 366 to 369, both inclusive
DEBUG:disassembler:41 basic blocks created
DEBUG:disassembler:Constructing edges between basic blocks...
DEBUG:disassembler:Adding explicit edge from block 0x24bd800 to 0x24ca8f0
DEBUG:disassembler:Adding explicit edge from block 0x24ca800 to 0x24ca648
DEBUG:disassembler:Adding explicit edge from block 0x24ce828 to 0x24cec88
DEBUG:disassembler:Adding explicit edge from block 0x24ca878 to 0x24ca5d0
DEBUG:disassembler:Adding explicit edge from block 0x24ce0a8 to 0x24cedf0
DEBUG:disassembler:Adding explicit edge from block 0x24ca940 to 0x24ce828
DEBUG:disassembler:Adding explicit edge from block 0x24ca6e8 to 0x24ca940
DEBUG:disassembler:Adding explicit edge from block 0x24ce170 to 0x24ce238
DEBUG:disassembler:Adding explicit edge from block 0x24ca990 to 0x24cac88
DEBUG:disassembler:Adding explicit edge from block 0x24ce9b8 to 0x24ce508
DEBUG:disassembler:Adding explicit edge from block 0x24caa30 to 0x24ca878
DEBUG:disassembler:Adding explicit edge from block 0x24cea80 to 0x24cabc0
DEBUG:disassembler:Adding explicit edge from block 0x24caaf8 to 0x24cad50
DEBUG:disassembler:Adding explicit edge from block 0x24ce300 to 0x24ca6e8
DEBUG:disassembler:Adding explicit edge from block 0x24ceb48 to 0x24ca940
DEBUG:disassembler:Adding implicit edge from block 0x24ce3c8 to 0x24ce468
DEBUG:disassembler:Adding explicit edge from block 0x24ce3c8 to 0x24ca738
DEBUG:disassembler:Adding explicit edge from block 0x24cabc0 to 0x24ceeb8
DEBUG:disassembler:Adding explicit edge from block 0x24cebe8 to 0x24ca558
DEBUG:disassembler:Adding explicit edge from block 0x24ce468 to 0x24ca990
DEBUG:disassembler:Adding explicit edge from block 0x24cac88 to 0x24ce300
DEBUG:disassembler:Adding explicit edge from block 0x24ce508 to 0x24ca800
DEBUG:disassembler:Adding explicit edge from block 0x24ced28 to 0x24ce8f0
DEBUG:disassembler:Adding explicit edge from block 0x24ce238 to 0x24cea80
DEBUG:disassembler:Adding explicit edge from block 0x24ca558 to 0x24ce5d0
DEBUG:disassembler:Adding explicit edge from block 0x24ce8f0 to 0x24cafa8
DEBUG:disassembler:Adding explicit edge from block 0x24ca5d0 to 0x24caee0
DEBUG:disassembler:Adding explicit edge from block 0x24ce5d0 to 0x24ce170
DEBUG:disassembler:Adding explicit edge from block 0x24cedf0 to 0x24ceb48
DEBUG:disassembler:Adding explicit edge from block 0x24cae18 to 0x24ced28
DEBUG:disassembler:Adding implicit edge from block 0x24ca648 to 0x24ca698
DEBUG:disassembler:Adding explicit edge from block 0x24ca648 to 0x24cebe8
DEBUG:disassembler:Adding explicit edge from block 0x24ca698 to 0x24ce760
DEBUG:disassembler:Adding explicit edge from block 0x24ceeb8 to 0x24bd800
DEBUG:disassembler:Adding explicit edge from block 0x24caee0 to 0x24caaf8
DEBUG:disassembler:Adding explicit edge from block 0x24ca738 to 0x24ce9b8
DEBUG:disassembler:Adding explicit edge from block 0x24ce760 to 0x24ce0a8
DEBUG:disassembler:Adding explicit edge from block 0x24ce698 to 0x24caa30
DEBUG:disassembler:Adding explicit edge from block 0x24ca788 to 0x24cae18
DEBUG:disassembler:Adding explicit edge from block 0x24cafa8 to 0x24ce3c8
DEBUG:disassembler:Adding explicit edge from block 0x24cad50 to 0x24ca788
INFO:deobfuscator:Control flow analysis completed.
INFO:deobfuscator:Starting simplication of basic blocks...
DEBUG:simplifier:Eliminating forwarders...
INFO:simplifier:Adding explicit edge from block 0x24ceb48 to 0x24ce828
INFO:simplifier:Adding explicit edge from block 0x24ca6e8 to 0x24ce828
INFO:simplifier:Adding implicit edge from block 0x24ca8f0 to 0x24ce828
DEBUG:simplifier:Forwarder basic block 0x24ca940 eliminated
INFO:simplifier:Adding explicit edge from block 0x24ce300 to 0x24ce828
DEBUG:simplifier:Forwarder basic block 0x24ca6e8 eliminated
INFO:simplifier:Adding explicit edge from block 0x24cedf0 to 0x24ce828
DEBUG:simplifier:Forwarder basic block 0x24ceb48 eliminated
INFO:simplifier:Adding explicit edge from block 0x24ca648 to 0x24ca558
DEBUG:simplifier:Forwarder basic block 0x24cebe8 eliminated
INFO:simplifier:Adding implicit edge from block 0x24ce3c8 to 0x24ca990
DEBUG:simplifier:Forwarder basic block 0x24ce468 eliminated
INFO:simplifier:Adding implicit edge from block 0x24ca648 to 0x24ce760
DEBUG:simplifier:Forwarder basic block 0x24ca698 eliminated
INFO:simplifier:Adding explicit edge from block 0x24ce3c8 to 0x24ce9b8
DEBUG:simplifier:Forwarder basic block 0x24ca738 eliminated
INFO:simplifier:7 basic blocks eliminated
DEBUG:simplifier:Merging basic blocks...
INFO:simplifier:Adding explicit edge from block 0x24ceeb8 to 0x24ca8f0
DEBUG:simplifier:Basic block 0x24bd800 merged with block 0x24ceeb8
INFO:simplifier:Adding explicit edge from block 0x24ce508 to 0x24ca648
DEBUG:simplifier:Basic block 0x24ca800 merged with block 0x24ce508
INFO:simplifier:Adding explicit edge from block 0x24caa30 to 0x24ca5d0
DEBUG:simplifier:Basic block 0x24ca878 merged with block 0x24caa30
INFO:simplifier:Adding explicit edge from block 0x24ce760 to 0x24cedf0
DEBUG:simplifier:Basic block 0x24ce0a8 merged with block 0x24ce760
INFO:simplifier:Adding implicit edge from block 0x24ceeb8 to 0x24ce828
DEBUG:simplifier:Basic block 0x24ca8f0 merged with block 0x24ceeb8
INFO:simplifier:Adding explicit edge from block 0x24ce5d0 to 0x24ce238
DEBUG:simplifier:Basic block 0x24ce170 merged with block 0x24ce5d0
INFO:simplifier:Adding explicit edge from block 0x24ce698 to 0x24ca5d0
DEBUG:simplifier:Basic block 0x24caa30 merged with block 0x24ce698
INFO:simplifier:Adding explicit edge from block 0x24ce238 to 0x24cabc0
DEBUG:simplifier:Basic block 0x24cea80 merged with block 0x24ce238
INFO:simplifier:Adding explicit edge from block 0x24caee0 to 0x24cad50
DEBUG:simplifier:Basic block 0x24caaf8 merged with block 0x24caee0
INFO:simplifier:Adding explicit edge from block 0x24cac88 to 0x24ce828
DEBUG:simplifier:Basic block 0x24ce300 merged with block 0x24cac88
DEBUG:simplifier:Basic block 0x24cec88 merged with block 0x24ce828
INFO:simplifier:Adding implicit edge from block 0x24cafa8 to 0x24ca990
INFO:simplifier:Adding explicit edge from block 0x24cafa8 to 0x24ce9b8
DEBUG:simplifier:Basic block 0x24ce3c8 merged with block 0x24cafa8
INFO:simplifier:Adding explicit edge from block 0x24ce238 to 0x24ceeb8
DEBUG:simplifier:Basic block 0x24cabc0 merged with block 0x24ce238
INFO:simplifier:Adding explicit edge from block 0x24ca990 to 0x24ce828
DEBUG:simplifier:Basic block 0x24cac88 merged with block 0x24ca990
INFO:simplifier:Adding explicit edge from block 0x24ce9b8 to 0x24ca648
DEBUG:simplifier:Basic block 0x24ce508 merged with block 0x24ce9b8
INFO:simplifier:Adding explicit edge from block 0x24cae18 to 0x24ce8f0
DEBUG:simplifier:Basic block 0x24ced28 merged with block 0x24cae18
INFO:simplifier:Adding explicit edge from block 0x24ce5d0 to 0x24ceeb8
DEBUG:simplifier:Basic block 0x24ce238 merged with block 0x24ce5d0
INFO:simplifier:Adding explicit edge from block 0x24cae18 to 0x24cafa8
DEBUG:simplifier:Basic block 0x24ce8f0 merged with block 0x24cae18
INFO:simplifier:Adding explicit edge from block 0x24ce698 to 0x24caee0
DEBUG:simplifier:Basic block 0x24ca5d0 merged with block 0x24ce698
INFO:simplifier:Adding explicit edge from block 0x24ca558 to 0x24ceeb8
DEBUG:simplifier:Basic block 0x24ce5d0 merged with block 0x24ca558
INFO:simplifier:Adding explicit edge from block 0x24ce760 to 0x24ce828
DEBUG:simplifier:Basic block 0x24cedf0 merged with block 0x24ce760
INFO:simplifier:Adding explicit edge from block 0x24ca788 to 0x24cafa8
DEBUG:simplifier:Basic block 0x24cae18 merged with block 0x24ca788
INFO:simplifier:Adding explicit edge from block 0x24ce9b8 to 0x24ca558
INFO:simplifier:Adding implicit edge from block 0x24ce9b8 to 0x24ce760
DEBUG:simplifier:Basic block 0x24ca648 merged with block 0x24ce9b8
INFO:simplifier:Adding implicit edge from block 0x24ca558 to 0x24ce828
DEBUG:simplifier:Basic block 0x24ceeb8 merged with block 0x24ca558
INFO:simplifier:Adding explicit edge from block 0x24ce698 to 0x24cad50
DEBUG:simplifier:Basic block 0x24caee0 merged with block 0x24ce698
INFO:simplifier:Adding explicit edge from block 0x24cad50 to 0x24cafa8
DEBUG:simplifier:Basic block 0x24ca788 merged with block 0x24cad50
INFO:simplifier:Adding implicit edge from block 0x24cad50 to 0x24ca990
INFO:simplifier:Adding explicit edge from block 0x24cad50 to 0x24ce9b8
DEBUG:simplifier:Basic block 0x24cafa8 merged with block 0x24cad50
INFO:simplifier:Adding implicit edge from block 0x24ce698 to 0x24ca990
INFO:simplifier:Adding explicit edge from block 0x24ce698 to 0x24ce9b8
DEBUG:simplifier:Basic block 0x24cad50 merged with block 0x24ce698
INFO:simplifier:28 basic blocks merged.
INFO:deobfuscator:Simplication of basic blocks completed.
INFO:deobfuscator:Beginning verification of simplified basic block graph...
INFO:deobfuscator:Verification succeeded.
INFO:deobfuscator:Assembling basic blocks...
DEBUG:assembler:Performing a DFS on the graph to generate the layout of the blocks.
DEBUG:assembler:Morphing some JUMP_ABSOLUTE instructions to make file decompilable.
DEBUG:assembler:Verifying generated layout...
DEBUG:assembler:Successfully verified layout.
DEBUG:assembler:Calculating addresses of basic blocks.
DEBUG:assembler:Calculating instruction operands.
DEBUG:assembler:Generating code...
INFO:deobfuscator:Successfully assembled. 
INFO:__main__:Successfully deobfuscated code object \x0f\x1d\n\x00\x07\x0f\x0f
INFO:__main__:Collecting constants for code object \x0f\x1d\n\x00\x07\x0f\x0f
INFO:__main__:Code object \x0f\x1d\n\x00\x07\x0f\x0f contains embedded code object recur_factorial
INFO:__main__:Processing code object recur_factorial
DEBUG:deobfuscator:Code entrypoint matched PjOrion signature v2
INFO:deobfuscator:Original code entrypoint at 161
INFO:deobfuscator:Starting control flow analysis...
DEBUG:disassembler:Finding leaders...
DEBUG:disassembler:Start leader at 161
DEBUG:disassembler:End leader at 164
DEBUG:disassembler:Start leader at 164
DEBUG:disassembler:End leader at 46
DEBUG:disassembler:Start leader at 46
DEBUG:disassembler:End leader at 141
DEBUG:disassembler:Start leader at 141
DEBUG:disassembler:End leader at 19
DEBUG:disassembler:Start leader at 19
DEBUG:disassembler:Start leader at 19
DEBUG:disassembler:End leader at 127
DEBUG:disassembler:Start leader at 127
DEBUG:disassembler:End leader at 22
DEBUG:disassembler:Start leader at 22
DEBUG:disassembler:End leader at 66
DEBUG:disassembler:Start leader at 66
DEBUG:disassembler:End leader at 105
DEBUG:disassembler:Start leader at 105
DEBUG:disassembler:End leader at 87
DEBUG:disassembler:Start leader at 87
DEBUG:disassembler:End leader at 126
DEBUG:disassembler:End leader at 154
DEBUG:disassembler:Start leader at 154
DEBUG:disassembler:End leader at 113
DEBUG:disassembler:Start leader at 113
DEBUG:disassembler:End leader at 93
DEBUG:disassembler:Start leader at 93
DEBUG:disassembler:End leader at 34
DEBUG:disassembler:Start leader at 34
DEBUG:disassembler:End leader at 11
DEBUG:disassembler:Start leader at 11
DEBUG:disassembler:End leader at 53
DEBUG:disassembler:Found 32 leaders
DEBUG:disassembler:Constructing basic blocks...
DEBUG:disassembler:Creating basic block 0x24ce3a0 spanning from 10 to 11, both inclusive
DEBUG:disassembler:Creating basic block 0x24bddc8 spanning from 19 to 19, both inclusive
DEBUG:disassembler:Creating basic block 0x24bda30 spanning from 22 to 22, both inclusive
DEBUG:disassembler:Creating basic block 0x24bdad0 spanning from 31 to 34, both inclusive
DEBUG:disassembler:Creating basic block 0x24bd9b8 spanning from 43 to 46, both inclusive
DEBUG:disassembler:Creating basic block 0x24bde68 spanning from 53 to 53, both inclusive
DEBUG:disassembler:Creating basic block 0x24bdc10 spanning from 63 to 66, both inclusive
DEBUG:disassembler:Creating basic block 0x24bdcb0 spanning from 84 to 87, both inclusive
DEBUG:disassembler:Creating basic block 0x24bd7d8 spanning from 92 to 93, both inclusive
DEBUG:disassembler:Creating basic block 0x24bdb98 spanning from 102 to 105, both inclusive
DEBUG:disassembler:Creating basic block 0x24bdf58 spanning from 110 to 113, both inclusive
DEBUG:disassembler:Creating basic block 0x24bdb48 spanning from 126 to 126, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca8a0 spanning from 127 to 127, both inclusive
DEBUG:disassembler:Creating basic block 0x24caf58 spanning from 138 to 141, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca7b0 spanning from 151 to 154, both inclusive
DEBUG:disassembler:Creating basic block 0x24ca670 spanning from 161 to 164, both inclusive
DEBUG:disassembler:16 basic blocks created
DEBUG:disassembler:Constructing edges between basic blocks...
DEBUG:disassembler:Adding explicit edge from block 0x24bdc10 to 0x24bdcb0
DEBUG:disassembler:Adding explicit edge from block 0x24ca7b0 to 0x24bdf58
DEBUG:disassembler:Adding explicit edge from block 0x24bda30 to 0x24bdb98
DEBUG:disassembler:Adding explicit edge from block 0x24ca670 to 0x24bd9b8
DEBUG:disassembler:Adding explicit edge from block 0x24ca8a0 to 0x24bdc10
DEBUG:disassembler:Adding explicit edge from block 0x24bdcb0 to 0x24ca7b0
DEBUG:disassembler:Adding explicit edge from block 0x24bdad0 to 0x24ce3a0
DEBUG:disassembler:Adding explicit edge from block 0x24bdf58 to 0x24bd7d8
DEBUG:disassembler:Adding explicit edge from block 0x24bdb98 to 0x24bdb48
DEBUG:disassembler:Adding explicit edge from block 0x24ce3a0 to 0x24bde68
DEBUG:disassembler:Adding explicit edge from block 0x24bd9b8 to 0x24caf58
DEBUG:disassembler:Adding implicit edge from block 0x24bddc8 to 0x24bda30
DEBUG:disassembler:Adding explicit edge from block 0x24bddc8 to 0x24ca8a0
DEBUG:disassembler:Adding explicit edge from block 0x24bd7d8 to 0x24bdad0
DEBUG:disassembler:Adding explicit edge from block 0x24caf58 to 0x24bddc8
INFO:deobfuscator:Control flow analysis completed.
INFO:deobfuscator:Starting simplication of basic blocks...
DEBUG:simplifier:Eliminating forwarders...
INFO:simplifier:Adding implicit edge from block 0x24bddc8 to 0x24bdb98
DEBUG:simplifier:Forwarder basic block 0x24bda30 eliminated
INFO:simplifier:Adding explicit edge from block 0x24bddc8 to 0x24bdc10
DEBUG:simplifier:Forwarder basic block 0x24ca8a0 eliminated
INFO:simplifier:2 basic blocks eliminated
DEBUG:simplifier:Merging basic blocks...
INFO:simplifier:Adding explicit edge from block 0x24bdcb0 to 0x24bdf58
DEBUG:simplifier:Basic block 0x24ca7b0 merged with block 0x24bdcb0
DEBUG:simplifier:Basic block 0x24bde68 merged with block 0x24ce3a0
INFO:simplifier:Adding explicit edge from block 0x24bdc10 to 0x24bdf58
DEBUG:simplifier:Basic block 0x24bdcb0 merged with block 0x24bdc10
INFO:simplifier:Adding explicit edge from block 0x24bd7d8 to 0x24ce3a0
DEBUG:simplifier:Basic block 0x24bdad0 merged with block 0x24bd7d8
DEBUG:simplifier:Basic block 0x24bdb48 merged with block 0x24bdb98
INFO:simplifier:Adding explicit edge from block 0x24bdc10 to 0x24bd7d8
DEBUG:simplifier:Basic block 0x24bdf58 merged with block 0x24bdc10
DEBUG:simplifier:Basic block 0x24ce3a0 merged with block 0x24bd7d8
INFO:simplifier:Adding explicit edge from block 0x24ca670 to 0x24caf58
DEBUG:simplifier:Basic block 0x24bd9b8 merged with block 0x24ca670
INFO:simplifier:Adding implicit edge from block 0x24caf58 to 0x24bdb98
INFO:simplifier:Adding explicit edge from block 0x24caf58 to 0x24bdc10
DEBUG:simplifier:Basic block 0x24bddc8 merged with block 0x24caf58
DEBUG:simplifier:Basic block 0x24bd7d8 merged with block 0x24bdc10
INFO:simplifier:Adding implicit edge from block 0x24ca670 to 0x24bdb98
INFO:simplifier:Adding explicit edge from block 0x24ca670 to 0x24bdc10
DEBUG:simplifier:Basic block 0x24caf58 merged with block 0x24ca670
INFO:simplifier:11 basic blocks merged.
INFO:deobfuscator:Simplication of basic blocks completed.
INFO:deobfuscator:Beginning verification of simplified basic block graph...
INFO:deobfuscator:Verification succeeded.
INFO:deobfuscator:Assembling basic blocks...
DEBUG:assembler:Performing a DFS on the graph to generate the layout of the blocks.
DEBUG:assembler:Morphing some JUMP_ABSOLUTE instructions to make file decompilable.
DEBUG:assembler:Verifying generated layout...
DEBUG:assembler:Successfully verified layout.
DEBUG:assembler:Calculating addresses of basic blocks.
DEBUG:assembler:Calculating instruction operands.
DEBUG:assembler:Generating code...
INFO:deobfuscator:Successfully assembled. 
INFO:__main__:Successfully deobfuscated code object recur_factorial
INFO:__main__:Collecting constants for code object recur_factorial
INFO:__main__:Generating new code object for recur_factorial
INFO:__main__:Generating new code object for \x0f\x1d\n\x00\x07\x0f\x0f
INFO:__main__:Writing deobfuscated code object to disk
INFO:__main__:Success

Running this writes the deobfuscated bytecode to wrapper_deobf.pyc. We can now run a Python decompiler on this file to get back the source code, as shown in Figure 4.

Figure 4: Decompiling the deobfuscated code
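
The log in Step 2 reports two simplification passes: eliminating forwarder blocks and merging basic blocks. The following is a minimal sketch of those two ideas on a toy control flow graph. It is not the tool's actual implementation: the block names, the instruction strings, and the plain-dict graph are made up for illustration, and "forwarder" is assumed here to mean a block that holds nothing but an unconditional jump.

```python
def eliminate_forwarders(blocks, succ, entry):
    """Remove blocks that contain nothing but an unconditional jump."""
    for name in list(blocks):
        if name == entry or blocks[name] != ["JUMP"] or len(succ[name]) != 1:
            continue
        target = succ[name][0]
        if target == name:
            continue  # a jump to itself cannot be bypassed
        # Point every predecessor of the forwarder straight at its target.
        for other in succ:
            succ[other] = [target if s == name else s for s in succ[other]]
        del blocks[name], succ[name]


def merge_blocks(blocks, succ, entry):
    """Fold B into A when A leads only to B and B is reached only from A."""
    changed = True
    while changed:
        changed = False
        for a in list(blocks):
            if len(succ[a]) != 1:
                continue
            b = succ[a][0]
            preds = [p for p in succ if b in succ[p]]
            if b == a or b == entry or preds != [a]:
                continue
            # Drop A's trailing jump, then append B's instructions and edges.
            body = blocks[a]
            if body and body[-1] == "JUMP":
                body = body[:-1]
            blocks[a] = body + blocks[b]
            succ[a] = succ[b]
            del blocks[b], succ[b]
            changed = True
            break


# Toy graph: entry -> fwd -> body -> exit, scattered by pointless jumps.
blocks = {
    "entry": ["LOAD n", "JUMP"],
    "fwd": ["JUMP"],
    "body": ["LOAD 1", "COMPARE", "JUMP"],
    "exit": ["RETURN"],
}
succ = {"entry": ["fwd"], "fwd": ["body"], "body": ["exit"], "exit": []}

eliminate_forwarders(blocks, succ, "entry")
merge_blocks(blocks, succ, "entry")
print(blocks)
```

On this toy graph the four scattered blocks collapse into a single straight-line block, which is the same effect the "basic blocks eliminated" and "basic blocks merged" lines in the log describe on the real code object.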

Monday, 10 July 2017

Introducing bytecode simplifier

Bytecode simplifier is a tool to deobfuscate PjOrion-protected Python scripts. It is a complete rewrite of my earlier tool, PjOrion Deobfuscator. I have reimplemented the deobfuscation functionality from scratch, using networkx for this purpose. Using networkx made reasoning about the code much simpler.
The PjOrion version used is 1.3.2 (Filename: PjOrion_Uncompyle6_01.10.2016.zip)

The code is at https://github.com/extremecoders-re/bytecode_simplifier

A short tutorial can be found here: https://0xec.blogspot.com/2017/07/deobfuscating-pjorion-using-bytecode.html

Saturday, 1 April 2017

Remote debugging in IDA Pro by http tunnelling

IDA Pro provides a remote debugging capability that allows us to debug a target binary residing on a different machine over the network. This is useful when, for example, we want to debug an executable on an ARM device where IDA itself cannot be installed. IDA can debug a remote binary in two ways: through a gdbserver, or through the debugger servers it ships with (located in the dbgsrv directory).

These debugging servers transport the debugger commands, messages and relevant data over a TCP/IP network through BSD sockets. So far so good, but what if the debugging server resided on a virtual host hosting multiple domain names? We cannot use sockets anymore.

A socket connection between two endpoints is characterized by a pair of socket addresses, one for each node. A socket address, in turn, comprises an IP address and a port number. For an incoming socket connection, a server hosting multiple domains on the same IP address cannot decide which domain to forward the request to based on the socket address alone. Thus remote debugging using sockets is not possible. This is not entirely true, however, as techniques such as port forwarding (aka virtual server) can reroute incoming traffic to various private IPs based on a predefined table. Port forwarding is not available everywhere, so we can ignore it for now. It would be much better if sockets supported connections based on domain names, as described in the paper Name-based Virtual Hosting in TCP.

HTTP, an application layer protocol, solves the virtual host problem by including the Host header in its messages. So if we can wrap the transport layer socket traffic in plain old HTTP messages, our problem should be solved. The rest of the blog post describes this process in detail.
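
To make the role of the Host header concrete, here is a minimal sketch of name-based routing: a front end receives a raw HTTP request, reads the Host header, and picks a backend from a table. The domain names, the backend labels, and the function names are all hypothetical, and the parsing is deliberately simplistic (it ignores IPv6 literals, for instance).

```python
def host_from_request(raw_request):
    """Extract the host name from the Host header of a raw HTTP request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host = value.strip().decode("ascii").lower()
            return host.split(":", 1)[0]  # drop an optional :port suffix
    return None


def route(raw_request, backends):
    """Pick a backend by host name; the IP address and port play no part."""
    return backends.get(host_from_request(raw_request))


# Hypothetical table: two domains that share one IP address.
backends = {
    "alice.example.com": "container-a",
    "bob.example.com": "container-b",
}
request = b"GET / HTTP/1.1\r\nHost: bob.example.com:8081\r\n\r\n"
print(route(request, backends))
```

A bare TCP connection carries no equivalent of this header, which is why the front end has nothing to route on when a debugger connects with a plain socket.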

The problem

A few days ago, I was trying a CTF challenge involving an ARM binary. The binary was loaded in IDA within a Windows XP VM. Debugging it would require at least a Linux box with qemu-arm installed. Rather than powering up my Ubuntu VM, I decided to debug it remotely on cloud9. Cloud9 is a sort of VPS that provides Ubuntu Docker containers, called workspaces, where we can run whatever we want. The ARM binary can be debugged using qemu as follows:

$ qemu-arm-static -g 8081 ./challenge

We are using the user mode emulation capability of qemu to run non-native ELF binaries. The -g flag specifies the port on which qemu listens for incoming gdb connections, 8081 in this case. We chose port 8081 because it is one of the few ports on which cloud9 allows incoming connections. Now, if we try to attach to the process in IDA with the remote gdb debugger selected as the debugger type, configured as shown in Figure 1, IDA fails.
Figure 1: Remote debugger configuration (ignore the paths)
This is expected, as the container running the debuggee sits on a virtual host where multiple containers share the same IP address under different domain names. A socket connects by IP address, not by domain name, so it is not possible to reach our container using sockets alone. We can get a clearer picture using netcat.

Let us create a netcat server listening on port 8081 as shown in Figure 2.
Figure 2: Netcat server listening on port 8081
We can try to connect to this server from our Windows XP VM as shown in Figure 3.

Figure 3: Trying to connect to our netcat server

Unsurprisingly, this fails too for the same reason.
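
The netcat client test can also be expressed as a few lines of Python. This is a rough equivalent of my own, not what the figures show; the function name can_connect is made up, and the host name in the usage comment is a placeholder. Note that a True result only means that something accepted the TCP connection at that address, not that the bytes would reach our particular container.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or the name did not resolve
        return False


# Usage (placeholder host name):
# can_connect("my-workspace.example.com", 8081)
```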

The workaround

We have seen that a socket connection is made using IP addresses. However, if we connect using HTTP, we can use domain names. This is possible because of the Host header mentioned earlier. Let's test this concept.

We create a netcat server listening on port 8081 which replies with an HTTP "HELLO WORLD" message. This is done as shown in Figure 4.

Figure 4: Netcat server replying with HTTP message
For the client part on the Windows XP box, we use curl instead of netcat, as shown in Figure 5. We choose curl over netcat because we are performing an HTTP transaction rather than sending raw bytes over a socket.

Figure 5: Using curl to connect to the netcat server

The connection succeeds and we get the HELLO WORLD response. The netcat server running on cloud9 also displays the success status as in Figure 6.
Figure 6: Netcat server replied to the request
From the above experiments, it is clear that we must use HTTP in order to establish a connection to the remote container running on a virtual host. Similarly, if we intend to remotely debug an app using IDA, we must also use HTTP instead of plain sockets.
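The role of the Host header can be reproduced locally. The following is a minimal sketch and not the cloud9 setup itself: a tiny HTTP server on 127.0.0.1 stands in for the shared virtual host, and the names alpha.example and beta.example are made up for illustration. Both requests go to the same IP address and port; only the Host header differs, and that is all the server needs to tell them apart.

```python
import threading
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer


class HostEchoHandler(BaseHTTPRequestHandler):
    """Replies with the Host header it received (stand-in for the routing host)."""

    def do_GET(self):
        host = self.headers.get("Host", "?")
        body = ("HELLO WORLD from " + host).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch(port, host_header):
    """Connect to the same IP and port every time; only the Host header changes."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/", headers={"Host": host_header})
        return conn.getresponse().read().decode("ascii")
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), HostEchoHandler)  # port 0: any free port
    port = server.server_address[1]
    worker = threading.Thread(target=server.serve_forever)
    worker.daemon = True
    worker.start()
    try:
        # Hypothetical domain names sharing one IP address.
        return [fetch(port, name) for name in ("alpha.example", "beta.example")]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

A raw socket has no equivalent of this header, which is why the plain netcat connection could not be routed to our container.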

Using HTTP Tunnelling

We have seen that only HTTP connections are possible. If we want to use sockets, the traffic must be wrapped in HTTP. This technique of encapsulating another protocol inside HTTP is called HTTP tunnelling. Wikipedia explains this best. Primarily, HTTP tunnels are used to get around restrictive network environments, such as firewalls that only permit connections to well-known ports. We can reuse the tunnelling technique for debugging in IDA as well.

An HTTP tunnel application consists of two parts, a server and a client, which communicate with each other over HTTP. Before using an HTTP tunnel, the situation was as in Figure 7.

Figure 7: Socket connection

After using Http tunnel, the situation would look like Figure 8.

Figure 8: Http tunnelling
The debugger and the tunnel client reside on the same machine, though they are depicted as separate computers. Similarly, the tunnel server and the debuggee reside on the same cloud9 container. The tunnel client-server pair encapsulates the socket traffic in an HTTP connection. Using this mechanism we can debug remotely using IDA.

Searching for an HTTP tunnelling application, I came across Chisel. It is open-source and written in Go. Compiling it from source is simple:

$ git clone https://github.com/jpillora/chisel.git
$ cd chisel
$ go build -o chisel # for compiling native linux binaries
$ GOOS=windows GOARCH=386 go build -o chisel.exe # cross compiling for windows x86

Remote configuration


We run the chisel server on cloud9 listening on port 8081 on all network interfaces:

$ ./chisel server --port=8081
2017/03/31 19:57:03 server: Fingerprint 07:4e:00:e4:82:9b:76:3a:3a:70:55:30:2e:1d:c2:82
2017/03/31 19:57:03 server: Listening on 8081...

qemu runs with the gdbserver listening on port 23946 for incoming gdb connections from IDA.

$ qemu-arm-static -g 23946 ./challenge

The connection between chisel server and qemu is through sockets. The debugger traffic wrapped in Http will be passed to chisel server at port 8081, chisel will extract the payload of the message and pass it to qemu at port 23946 over a socket.

Local configuration

In our Windows XP box we run chisel in client mode with the following command line:

C:\>chisel client qemu-extremecoders-re.c9users.io:8081 1234:23946
2017/04/01 01:28:36 client: Connecting to ws://qemu-extremecoders-re.c9users.io:8081
2017/04/01 01:28:38 client: Fingerprint 07:4e:00:e4:82:9b:76:3a:3a:70:55:30:2e:1d:c2:82
2017/04/01 01:28:39 client: Connected (Latency 203.125ms)

The remote URL on which the chisel server listens (qemu-extremecoders-re.c9users.io:8081) is specified along with the port.

The pair of ports that follows (1234:23946), separated by a colon, specifies the port mapping from local to remote. It means that incoming traffic to the chisel client at local port 1234 will be forwarded to the chisel server which will, in turn, relay the traffic over a socket to port 23946 where qemu is listening.
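The net effect of the mapping can be pictured as a plain TCP relay. The sketch below is only a model of that effect: it forwards bytes between two local sockets and does not wrap anything in HTTP the way chisel does. The helper names (pipe, forward_once, demo) are hypothetical, and the "target" is a stand-in for qemu's gdb stub that simply upper-cases what it receives.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src reports end of stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def forward_once(listener, remote_addr):
    """Accept one client on the local port and relay it to remote_addr."""
    client, _ = listener.accept()
    remote = socket.create_connection(remote_addr, timeout=5)
    upstream = threading.Thread(target=pipe, args=(client, remote))
    upstream.daemon = True
    upstream.start()
    pipe(remote, client)
    upstream.join(5)
    client.close()
    remote.close()


def demo():
    # Stand-in for the debuggee side (port 23946 in the post).
    target = socket.socket()
    target.bind(("127.0.0.1", 0))
    target.listen(1)

    def serve_target():
        conn, _ = target.accept()
        conn.sendall(conn.recv(4096).upper())
        conn.close()

    # Stand-in for the local tunnel endpoint (port 1234 in the post).
    local = socket.socket()
    local.bind(("127.0.0.1", 0))
    local.listen(1)

    workers = [
        threading.Thread(target=serve_target),
        threading.Thread(target=forward_once, args=(local, target.getsockname())),
    ]
    for worker in workers:
        worker.daemon = True
        worker.start()

    # Stand-in for the debugger connecting to the local endpoint.
    debugger = socket.create_connection(local.getsockname(), timeout=5)
    debugger.sendall(b"hello")
    reply = debugger.recv(4096)
    debugger.close()

    for worker in workers:
        worker.join(5)
    local.close()
    target.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

The debugger only ever talks to the local port; it never needs to know where the bytes end up.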

Finally, we need to configure IDA to use the local chisel client as the remote host. This is done as per Figure 9.
Figure 9: IDA remote debugger configuration
The hostname is specified as 127.0.0.1 and the port as 1234. This is the address where the chisel client is accepting socket connections.

At this point, if we try to attach to the remote process, it succeeds with the following message as in Figure 10.
Figure 10: Attach successful

Mission accomplished!

Final words

Http tunnelling is a very effective technique in scenarios where only Http connections are allowed or possible. In this case of remote debugging, we used http tunnelling since normal socket connections cannot be established. With this we come to the end of this post. Hope you find this useful. Ciao!

Sunday, 26 March 2017

67,000 cuts with python-pefile

EasyCTF featured an interesting reverse engineering challenge. The problem statement is shown in Figure 1.
Figure 1: Problem statement
A file 67k.zip was provided containing 67,085 PE files numbered from 00000.exe to 1060c.exe as shown in Figure 2.

Figure 2: 67k files to reverse!

The task was to reverse engineer each of them and combine their solutions to get the flag. All of the files were exactly 2048 bytes in size as shown in Figure 3.

Figure 3: 2048 all the way
Let's analyze one of the files, say the first one 00000.exe in IDA. The graph view is simple as in Figure 4.
Figure 4: Graph view of 00000.exe
The program accepts one integer input through scanf. This is compared with another number generated by a simple operation, such as sub, on two hard-coded integers stored in the registers eax and ecx. If they match, we go to the green basic block on the left. It performs another calculation (sar, Shift Arithmetic Right, at 402042) and finally prints the calculated value along with the success message at 40204F. This general pattern is followed by all of the 67,085 files with minor changes as enumerated below:

  • The imagebase and the entrypoint of the PE vary with each file.
  • The operation on the two hardcoded integers can be any of addition, subtraction or xor.
  • The address of the function (op_sub in the example) performing the operation varies.
  • The address of the hard coded integer (dword_403000 in the example) varies.
  • The amount of shift stored in byte_403007 also varies.

Obviously, reversing 67k files by hand is not feasible and requires automation. For this task, I chose the pefile module by Ero Carrera. First, we need to get the offsets of the individual instructions from the entry point. We can do this from OllyDbg as in the following listing. The offsets are in the leftmost column.

<Modul>/$  68 5E304000       push 0040305E                            ; /s = "Launch codes?"
$+5   >|.  FF15 44104000     call dword ptr [<&msvcrt.puts>]          ; \puts
$+B   >|.  58                pop eax
$+C   >|.  68 6C304000       push 0040306C
$+11  >|.  68 04304000       push 00403004                            ; /format = "%d"
$+16  >|.  FF15 48104000     call dword ptr [<&msvcrt.scanf>]         ; \scanf
$+1C  >|.  83C4 08           add esp,8
$+1F  >|.  A1 00304000       mov eax,dword ptr [403000]
$+24  >|.  B9 EDA7A8A1       mov ecx,A1A8A7ED
$+29  >|.  E8 CFFFFFFF       call <op_sub>
$+2E  >|.  3B05 6C304000     cmp eax,dword ptr [40306C]
$+34  >|.  75 1E             jnz short 0040205A
$+36  >|.  8A0D 07304000     mov cl,byte ptr [403007]
$+3C  >|.  D3F8              sar eax,cl
$+3E  >|.  25 FF000000       and eax,0FF
$+43  >|.  50                push eax                                 ; /<%c>
$+44  >|.  68 34304000       push 00403034                            ; |format = "Wow you got it. Here is the result: (%c)"
$+49  >|.  FF15 4C104000     call dword ptr [<&msvcrt.printf>]        ; \printf
$+4F  >|.  83C4 08           add esp,8
$+52  >|.  EB 0C             jmp short 00402066
$+54  >|>  68 08304000       push 00403008                            ; /s = "I think my dog figured this out before you."
$+59  >|.  FF15 44104000     call dword ptr [<&msvcrt.puts>]          ; \puts
$+5F  >|.  58                pop eax
$+60  >\>  C3                ret
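As a worked example of reading this listing, the call at $+29 is encoded as E8 followed by a signed 32-bit displacement, which is relative to the end of the 5-byte instruction. The helper below is a small sketch (the name call_target_offset is made up here) that turns those bytes into an offset from the entry point.

```python
import struct


def call_target_offset(call_offset, insn_bytes):
    """Offset of the call target relative to the entry point.

    call_offset is the offset of the E8 instruction from the entry point,
    insn_bytes are its 5 raw bytes (E8 + little-endian rel32).
    """
    if len(insn_bytes) != 5 or insn_bytes[0:1] != b"\xe8":
        raise ValueError("expected a 5-byte E8 rel32 call")
    (rel32,) = struct.unpack("<i", insn_bytes[1:5])
    # The displacement is counted from the address of the next instruction.
    return call_offset + 5 + rel32


# Bytes taken from the listing: $+29  E8 CFFFFFFF  call <op_sub>
raw = bytes(bytearray([0xE8, 0xCF, 0xFF, 0xFF, 0xFF]))
print(call_target_offset(0x29, raw))  # -3
```

The result is -3, meaning op_sub sits three bytes before the entry point in 00000.exe, which is exactly the size of a two-byte register operation followed by ret.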

The complete script is provided below.
import zipfile
import struct
import pefile
import cStringIO


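def rshift(val, n):
    """
    Logical right shift of val treated as an unsigned 32-bit integer.
    Gives the same low byte as sar followed by the 0xFF mask
    as long as the shift amount is at most 24.
    """
    return (val % 0x100000000) >> n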

def process(buf):
    # Load the Pe file
    pe = pefile.PE(data=buf, fast_load=True)

    # RVA of Entry Point
    ep = pe.OPTIONAL_HEADER.AddressOfEntryPoint

    imagebase = pe.OPTIONAL_HEADER.ImageBase

    # $+1F  >|.  A1 00304000       mov eax,dword ptr [403000]
    # $+24  >|.  B9 EDA7A8A1       mov ecx,A1A8A7ED
    eax = pe.get_dword_at_rva(pe.get_dword_at_rva(ep + 0x1f + 1) - imagebase)
    ecx = pe.get_dword_at_rva(ep + 0x24 + 1)

    # $+29  >|.  E8 CFFFFFFF       call <op_sub>
    fn_offs = struct.unpack('<i', pe.get_data(ep + 0x29 + 1, length = 4))[0]

    # offset of the function from the entry point =
    # offset of the call + length of the call (5) + rel32 displacement
    fn_rva = 0x29 + 5 + fn_offs

    # Get the first byte of the function (op_sub)
    func_byte = ord(pe.get_data(rva = ep+fn_rva, length=1))

    # Perform the operation based on the function byte

    # op_xor
    # 31C8            xor eax,ecx
    # C3              ret
    if func_byte == 0x31:
        eax ^= ecx

    # op_add
    # 01C8            add eax,ecx
    # C3              ret
    elif func_byte == 0x1:
        eax += ecx

    # op_sub
    # 29C8            sub eax,ecx
    # C3              ret
    elif func_byte == 0x29:
        eax -= ecx

    else:
        raise ValueError('Unknown operation byte: 0x%02x' % func_byte)

    # $+36  >|.  8A0D 07304000     mov cl,byte ptr [403007]
    # $+3C  >|.  D3F8              sar eax,cl
    # $+3E  >|.  25 FF000000       and eax,0FF
    cl = ord(pe.get_data(pe.get_dword_at_rva(ep+0x36+2)-imagebase, 1))

    return chr(rshift(eax, cl) & 0xFF)

if __name__ == '__main__':
    output = cStringIO.StringIO()
    with zipfile.ZipFile('67k.zip') as f:
        for idx in xrange(67085):
            fname = format(idx, 'x').zfill(5) + '.exe'
            buf = f.read(fname)
            output.write(process(buf))
            # Fast divisibility check by 1024, 2^10 (last 10 bits must be zero)
            if idx & 0x3FF == 0: 
                print 'Completed', idx

    with open('output.txt', 'w') as out:
        out.write(output.getvalue())
    print 'Done!!'
Instead of unpacking 67,085 files to the hard drive and fragmenting it in the process, I have used the zipfile module to access the files within the archive. However, zipfile throws an error on opening the archive and must be modified slightly as described in this Stack Overflow answer.
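The idea of reading members straight out of the archive can be tried on a tiny in-memory zip. This is a sketch with made-up file contents, not the real 67k.zip; member_name mirrors the naming scheme used by the script (five hexadecimal digits followed by .exe).

```python
import io
import zipfile


def member_name(idx):
    """Archive member name for index idx, e.g. 0 -> '00000.exe'."""
    return format(idx, "x").zfill(5) + ".exe"


def read_members(archive, count):
    """Yield the raw bytes of the first `count` members without extracting them."""
    with zipfile.ZipFile(archive) as zf:
        for idx in range(count):
            yield zf.read(member_name(idx))


def demo():
    # Build a small stand-in archive in memory (contents are dummy bytes).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for idx in range(3):
            zf.writestr(member_name(idx), b"MZ" + bytes(bytearray([idx])))
    buf.seek(0)
    return list(read_members(buf, 3))


if __name__ == "__main__":
    print(member_name(67084))  # 1060c.exe
    print(demo())
```

The last index, 67084, maps to 1060c.exe, matching the file range in the challenge.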

We access the instructions by using the offsets from the entry point. The address of the operation function and the values of the hard-coded integers, shift amount are also obtained similarly. We discern the type of operation performed by examining its first byte. With this information, we can find the correct output.
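The arithmetic each binary performs can also be modelled on its own, independent of pefile. The sketch below is an illustration rather than part of the original script: it uses a true arithmetic shift (sar32) and returns the masked byte as an integer. The input values in the usage lines are invented for the example.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def sar32(value, count):
    """32-bit arithmetic right shift; x86 uses only the low 5 bits of the count."""
    return (to_signed32(value) >> (count & 31)) & 0xFFFFFFFF


# First byte of the helper function selects the operation.
OPERATIONS = {
    0x31: lambda a, b: a ^ b,  # xor eax, ecx
    0x01: lambda a, b: a + b,  # add eax, ecx
    0x29: lambda a, b: a - b,  # sub eax, ecx
}


def solve(op_byte, eax, ecx, shift):
    """Return the byte printed by the binary for the given hard-coded values."""
    result = OPERATIONS[op_byte](eax, ecx) & 0xFFFFFFFF
    return sar32(result, shift) & 0xFF


print(solve(0x01, 0x4000, 0x100, 8))  # 65, i.e. 'A'
print(solve(0x29, 0, 1, 28))          # 255: the sign bit is replicated
```

The second call shows the one case where an arithmetic and a logical shift disagree after masking, namely a negative value shifted by more than 24 bits.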

Running the script on stock Python 2.7 takes close to 15 minutes. With PyPy, this is reduced to 2 minutes. We get a 66 KB output consisting of obfuscated JavaScript as shown in Figure 5.
Figure 5: Obfuscated javascript output
Running the obfuscated javascript on jsfiddle gives us the flag easyctf{wtf_67k_binaries?why_so_mean?} as also shown in Figure 6:
Figure 6: Finally we get the flag

Thursday, 16 March 2017

Hacking the CPython virtual machine to support bytecode debugging


As you may know, Python is an interpreted programming language. By Python, I am referring to the standard implementation, i.e. CPython. Being interpreted implies that Python code is never directly executed by the processor. The Python compiler converts the source code into an intermediate representation called bytecode. The bytecode consists of instructions which are interpreted at runtime by the CPython virtual machine. To know more about the nitty-gritty details, refer to ceval.c.

Unfortunately, the standard Python implementation does not provide a way to debug the bytecode while it is being executed on the virtual machine. You may ask why that is even needed, as we can already debug Python source code using pdb and similar tools. Also, gdb 7 and above support debugging the virtual machine itself, so bytecode debugging may seem unnecessary.

However, that is only one side of the coin. Pdb can be used for debugging only when the source code is available. Gdb no doubt can debug without the source, as we are dealing directly with the virtual machine, but it is too low-level for our tasks. This is akin to finding bugs in your C code by using an In-Circuit Emulator on the processor. Sure, you would find bugs if you have the time and patience, but it is unusable for most of us. What we need is something in between, one which can not only debug without source but also is not too low-level and can understand the Python-specific implementation details. Further, it would be the icing on the cake if this system could be implemented directly in Python code.

Implementation details of a source code debugger

Firstly, we need to know how a source code debugger is implemented with respect to Python. The de facto Python debugger is the pdb module. This is basically a class derived from bdb. Pdb provides a command line user interface and is a wrapper around bdb. Now, both pdb and bdb are coded in Python. The main debugging harness in CPython is implemented right within the sys module.

Among the multifarious utility functions in the sys module, settrace allows us to provide a callback function which, just as its name suggests, can trace code execution. Python will call this function in response to various events, such as when a function is called, a function is about to return, an exception is raised or a new line of code is about to be executed. Refer to the documentation of settrace for the specifics.
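To get a feel for these events, here is a minimal sketch using sys.settrace. It is written so that it also runs on Python 3; record_events and add_twice are names invented for this example. The callback records each event together with the line number relative to the start of the traced function.

```python
import sys


def record_events(func, *args):
    """Run func(*args) and return (result, list of (event, relative line))."""
    events = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other frames
        events.append((event, frame.f_lineno - code.co_firstlineno))
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, events


def add_twice(x):
    y = x + 1
    y = y + 1
    return y


if __name__ == "__main__":
    result, events = record_events(add_twice, 1)
    print(result)
    for event in events:
        print(event)
```

On a stock interpreter this yields one call event, one line event per source line and a final return event. Nothing is reported at the level of individual bytecode instructions.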

However, there are a couple of gotchas. Unlike a physical processor, the CPython virtual machine has no concept of breakpoints. There is no instruction like INT 3 on x86 or BKPT on ARM to automatically switch the processor to a debug state. Instead, the breakpoint mechanism must be implemented in the trace callback function. The trace function will be called whenever a new line of code is about to be executed. We need to check whether the user has requested a break on this line and, if so, yield control. This mechanism is not without its downside. As the callback will be invoked for every line, and for every other important event, execution speed will be severely reduced. To speed things up, this may be implemented in C as an extension module like cpdb.
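A breakpoint implemented inside the trace callback might look like the following sketch. It is not how pdb is written; it only illustrates the check described above. The helper run_with_breakpoints and the sample function are hypothetical, breakpoints are given as line numbers relative to the def line, and instead of handing control to a user the callback simply snapshots the local variables.

```python
import sys


def run_with_breakpoints(func, breakpoints, *args):
    """Run func(*args); snapshot locals whenever a breakpoint line is reached."""
    hits = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None
        if event == "line":
            relative = frame.f_lineno - code.co_firstlineno
            # This membership test runs for every line: the cost of
            # having no hardware-style breakpoint instruction.
            if relative in breakpoints:
                hits.append((relative, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, hits


def sample(x):
    y = x + 1
    y = y * 10
    return y


if __name__ == "__main__":
    print(run_with_breakpoints(sample, {2}, 1))
```

The line event fires before the line executes, so at relative line 2 the snapshot still holds the value of y assigned on the previous line.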

So far so good, and it seems line tracing is just the functionality we require. However, it works only at the source code level. The finest granularity at which tracing works is the line level, and not the instruction level as we require.

How does line tracing work?


Python code objects have a special member called co_lnotab, also known as the line number table. It contains a series of unsigned bytes wrapped up in a string. This is used to map bytecode offsets back to the source code line from which a particular instruction originated.
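To make the format concrete, the sketch below decodes a line number table in the Python 2.7 layout, where the bytes come in pairs of (bytecode offset increment, line increment). The function name decode_lnotab and the sample table are made up for illustration; newer Python versions use a different encoding, so this applies to the 2.7 format discussed here.

```python
def decode_lnotab(lnotab, first_line):
    """Return [(bytecode offset, source line)] for a Python 2.7 style lnotab.

    lnotab is the raw table as bytes, first_line is co_firstlineno.
    """
    pairs = bytearray(lnotab)
    table = []
    addr, line, last_line = 0, first_line, None
    for addr_incr, line_incr in zip(pairs[0::2], pairs[1::2]):
        if addr_incr:
            # A new offset starts: emit the line collected so far.
            if line != last_line:
                table.append((addr, line))
                last_line = line
            addr += addr_incr
        line += line_incr
    if line != last_line:
        table.append((addr, line))
    return table


# Invented table: offset +6 / line +1, then offset +44 / line +5.
print(decode_lnotab(b"\x06\x01\x2c\x05", 1))  # [(0, 1), (6, 2), (50, 7)]
```

Each entry in the result marks the first bytecode offset belonging to a source line, which is the information the virtual machine needs to decide when a new line begins.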

When the CPython virtual machine interprets the bytecode, before executing each instruction it checks whether the current bytecode offset is the starting point of some source code line; if so, it calls the trace function. An example trace function taken from the bdb module is shown below.
def trace_dispatch(self, frame, event, arg):
    if self.quitting:
        return # None
    if event == 'line':
        return self.dispatch_line(frame)
    if event == 'call':
        return self.dispatch_call(frame, arg)
    if event == 'return':
        return self.dispatch_return(frame, arg)
    if event == 'exception':
        return self.dispatch_exception(frame, arg)
    if event == 'c_call':
        return self.trace_dispatch
    if event == 'c_exception':
        return self.trace_dispatch
    if event == 'c_return':
        return self.trace_dispatch
    print 'bdb.Bdb.dispatch: unknown debugging event:', repr(event)
    return self.trace_dispatch
The trace function is provided with the currently executing frame as an argument. The frame is a data structure that encapsulates the context under which a code object is executing. We can query the frame using the inspect module. We can change the currently executing line by changing f_lineno of the frame object. Similarly, we can modify variables by using the eval function in the context of the globals and locals obtained from the frame.

Bytecode Tracing Techniques

Listed below are some existing techniques for tracing python bytecode execution.

Extending co_lnotab

We have seen that co_lnotab, the line number table, is used for determining when to call the trace function. Ned Batchelder (2008) showed that it is possible to modify the line number table to include an entry for each instruction offset in the bytecode. To the Python VM, this implies that every instruction corresponds to a different line of source, and hence it calls the trace function for every instruction executed. This technique is very easy to implement and requires no modification to Python. We only need to alter the line number table of each code object to include an entry for each instruction. The downside of this approach is that it increases the pyc file size, and more so if the bytecode is obfuscated, as we then have no idea which bytes are instructions and which are junk. To be on the safer side, we can add an entry for each byte, no matter whether it is a real instruction or a junk byte.
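The "entry for each byte" variant is simple to generate. The following is a sketch of just the table construction, assuming the Python 2.7 pair format; the function name per_offset_lnotab is made up, and rebuilding the code object with the new table is left out.

```python
def per_offset_lnotab(code_length):
    """Build a Python 2.7 style lnotab that starts a new 'line' at every byte.

    Offset 0 belongs to the first line implicitly; every following pair
    (1, 1) advances the offset by one byte and the line number by one.
    """
    if code_length < 1:
        raise ValueError("code_length must be at least 1")
    return bytes(bytearray([1, 1] * (code_length - 1)))


table = per_offset_lnotab(4)
print(len(table))  # 6
```

The table grows by two bytes per byte of bytecode, which is where the size increase mentioned above comes from.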

Compiling python with LLTRACE

An undocumented way to trace bytecode execution is to compile python from source with the LLTRACE flag enabled. At execution time, python prints every instruction it executes on the console. This method is not without its flaws. Printing every executed instruction on the console is an expensive operation slowing down execution speed. Further, we have no control over the execution, i.e. we cannot modify the operation of the code in any way and it is not possible to toggle off this feature when we do not need it.

Introducing a new opcode

Yet another way to implement tracing is to introduce a new opcode altogether (Rouault, 2015). This is a complicated process and requires a lot of modifications to Python. The entire process with all its gory details is described on this page. The gist of the approach is that we create a new opcode which Rouault (2015) calls DEBUG_OP. Whenever the Python VM encounters this opcode, it calls a function previously supplied by the user, passing the execution context consisting of the frame and the evaluation stack as the arguments.

Undoubtedly, this method is superior to the pre-existing methods, although it requires a lot of changes in the implementation of Python. However, the main drawback of this approach is that it requires modifying the instruction stream to slip a DEBUG_OP opcode in between. This is feasible for normal bytecode generated by Python but definitely not for obfuscated bytecode. When the instructions are obfuscated, it is not possible to insert the DEBUG_OP opcode in advance, as we cannot differentiate between normal instructions and junk instructions.

The proposed method

Keeping note of the limitations of the above techniques, our proposed method must overcome these. Specifically, it must be resistant to obfuscation and should not require any changes to the bytecode itself. It would be ideal if we could reuse or extend existing functionality to support bytecode tracing and debugging.

As said before, the Python VM consults co_lnotab, the line number table before execution of each instruction to determine when to call the trace function. It looks like we can somehow modify this to call our tracing function right before execution of the individual instructions without checking the line number table. This is the approach we will take.

The function responsible for calling the tracing function is maybe_call_line_trace at line #4054 within ceval.c.
/* See Objects/lnotab_notes.txt for a description of how tracing works. */
static int
maybe_call_line_trace(Py_tracefunc func, PyObject *obj,
                      PyFrameObject *frame, int *instr_lb, int *instr_ub,
                      int *instr_prev)
{
    int result = 0;
    int line = frame->f_lineno;

    /* If the last instruction executed isn't in the current
       instruction window, reset the window.
    */
    if (frame->f_lasti < *instr_lb || frame->f_lasti >= *instr_ub) {
        PyAddrPair bounds;
        line = _PyCode_CheckLineNumber(frame->f_code, frame->f_lasti,
                                       &bounds);
        *instr_lb = bounds.ap_lower;
        *instr_ub = bounds.ap_upper;
    }
    /* If the last instruction falls at the start of a line or if
       it represents a jump backwards, update the frame's line
       number and call the trace function. */
    if (frame->f_lasti == *instr_lb || frame->f_lasti < *instr_prev) {
        frame->f_lineno = line;
        result = call_trace(func, obj, frame, PyTrace_LINE, Py_None);
    }
    *instr_prev = frame->f_lasti;
    return result;
}

Those if statements mostly check whether the current bytecode instruction maps to the beginning of some line. We can simply remove them to make it call our trace function per executed instruction rather than per source line.
static int
maybe_call_line_trace(Py_tracefunc func, PyObject *obj,
                      PyFrameObject *frame, int *instr_lb, int *instr_ub,
                      int *instr_prev)
{
    int result = 0;
    result = call_trace(func, obj, frame, PyTrace_LINE, Py_None);
    *instr_prev = frame->f_lasti;
    return result;
}
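The difference between the two versions can be modelled in a few lines of Python. This is a simplified sketch, not a translation of ceval.c: line_starts stands for the set of bytecode offsets at which a source line begins (the information the C code obtains from the line number table), and the function names are made up.

```python
def original_fires(lasti, prev, line_starts):
    """Stock behaviour: trace at the start of a line or on a backward jump."""
    return lasti in line_starts or lasti < prev


def patched_fires(lasti, prev, line_starts):
    """Patched behaviour: trace before every instruction."""
    return True


def simulate(offsets, line_starts, fires):
    """Return the offsets at which the trace function would be called."""
    traced, prev = [], -1
    for lasti in offsets:
        if fires(lasti, prev, line_starts):
            traced.append(lasti)
        prev = lasti
    return traced


executed = [0, 3, 6, 9, 12]   # hypothetical instruction offsets
starts = {0, 6}               # hypothetical line starts
print(simulate(executed, starts, original_fires))  # [0, 6]
print(simulate(executed, starts, patched_fires))   # [0, 3, 6, 9, 12]
```

With the stock logic only the offsets that begin a line are reported, whereas the patched logic reports every executed offset while still arriving as an ordinary line event.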

After building Python from source with those teeny-tiny changes in-place, we have implemented an execution tracer re-using the existing settrace functionality. We now need to code the callback function which will be called by settrace. This can be realized either in Python or C as an extension (like cpdb), but we choose the former for ease of development.

The Tracer


The code of the tracer is listed below and can also be found on GitHub at https://github.com/extremecoders-re/bytecode_tracer
import sys
import dis
import marshal
import argparse

tracefile = None
options = None

# List of valid python opcodes
valid_opcodes = dis.opmap.values()

def trace(frame, event, arg):
    global tracefile, valid_opcodes, options
    if event == 'line':
        # Get the code object
        co_object = frame.f_code

        # Retrieve the name of the associated code object
        co_name = co_object.co_name

        if options.name is None or co_name == options.name:
            # Get the code bytes
            co_bytes = co_object.co_code

            # f_lasti is the offset of the last bytecode instruction executed
            # w.r.t the current code object
            # For the very first instruction this is set to -1
            ins_offs = frame.f_lasti

            if ins_offs >= 0:
                opcode = ord(co_bytes[ins_offs])

                # Check if it is a valid opcode
                if opcode in valid_opcodes:
                    if opcode >= dis.HAVE_ARGUMENT:
                        # Fetch the operand
                        operand = arg = ord(co_bytes[ins_offs+1]) | (ord(co_bytes[ins_offs+2]) << 8)

                        # Resolve instruction arguments if specified
                        if options.resolve:
                            try:
                                if opcode in dis.hasconst:
                                    operand = co_object.co_consts[arg]
                                elif opcode in dis

For demonstrating the usage I have chosen the following piece of code taken from programiz.

# Python program to find the factorial of a number using recursion

def recur_factorial(n):
   """Function to return the factorial
   of a number using recursion"""
   if n == 1:
       return n
   else:
       return n*recur_factorial(n-1)

# Change this value for a different result
num = 7

# check if the number is negative
if num < 0:
   print("Sorry, factorial does not exist for negative numbers")
elif num == 0:
   print("The factorial of 0 is 1")
else:
   print("The factorial of",num,"is",recur_factorial(num))

Suppose, we want to trace the execution of the recur_factorial function. We can do so, by running the following:

$ python tracer.py -t=only -n=recur_factorial -r factorial.pyc trace.txt

We are tracing the execution of only those code objects having a name of recur_factorial.
The -r flag means to resolve the operands of instructions. Instructions in python can take an argument. For some instructions like LOAD_CONST, the argument is an integer specifying the index of an item within the co_consts table which will be pushed on the evaluation stack. If resolution (-r flag) is enabled, the item will be written to the trace instead of the integer argument.
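Operand resolution boils down to a table lookup. The sketch below is a simplified model and not the tracer's actual code: resolve_operand is a made-up helper that handles only three opcodes, and the tables passed in the usage lines are assumed values resembling what recur_factorial would have.

```python
def resolve_operand(opname, arg, co_consts=(), co_names=(), co_varnames=()):
    """Map an integer operand to the object it refers to, where possible."""
    tables = {
        "LOAD_CONST": co_consts,     # index into the constants table
        "LOAD_GLOBAL": co_names,     # index into the names table
        "LOAD_FAST": co_varnames,    # index into the local variable names
    }
    table = tables.get(opname)
    if table is None or arg >= len(table):
        return arg  # nothing to resolve: keep the raw integer
    return table[arg]


# Assumed tables for illustration only.
consts = ("docstring", 1)
names = ("recur_factorial",)
varnames = ("n",)

print(resolve_operand("LOAD_FAST", 0, consts, names, varnames))    # n
print(resolve_operand("LOAD_CONST", 1, consts, names, varnames))   # 1
print(resolve_operand("LOAD_GLOBAL", 0, consts, names, varnames))  # recur_factorial
print(resolve_operand("CALL_FUNCTION", 1, consts, names, varnames))  # 1
```

For opcodes such as CALL_FUNCTION the operand is not an index into any table, so it is written to the trace unchanged.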

The input file name is factorial.pyc and the trace file name is trace.txt. Running this we get the execution trace like the following
recur_factorial> 0 LOAD_FAST (n)
recur_factorial> 3 LOAD_CONST (1)
recur_factorial> 6 COMPARE_OP (==)
recur_factorial> 16 LOAD_FAST (n)
recur_factorial> 19 LOAD_GLOBAL (recur_factorial)
recur_factorial> 22 LOAD_FAST (n)
recur_factorial> 25 LOAD_CONST (1)
recur_factorial> 28 BINARY_SUBTRACT
recur_factorial> 29 CALL_FUNCTION (1)
recur_factorial> 0 LOAD_FAST (n)
recur_factorial> 3 LOAD_CONST (1)
recur_factorial> 6 COMPARE_OP (==)
recur_factorial> 16 LOAD_FAST (n)
recur_factorial> 19 LOAD_GLOBAL (recur_factorial)
recur_factorial> 22 LOAD_FAST (n)
recur_factorial> 25 LOAD_CONST (1)
recur_factorial> 28 BINARY_SUBTRACT
recur_factorial> 29 CALL_FUNCTION (1)
recur_factorial> 0 LOAD_FAST (n)
recur_factorial> 3 LOAD_CONST (1)
recur_factorial> 6 COMPARE_OP (==)
recur_factorial> 16 LOAD_FAST (n)
recur_factorial> 19 LOAD_GLOBAL (recur_factorial)
recur_factorial> 22 LOAD_FAST (n)
recur_factorial> 25 LOAD_CONST (1)
recur_factorial> 28 BINARY_SUBTRACT
recur_factorial> 29 CALL_FUNCTION (1)
recur_factorial> 0 LOAD_FAST (n)
recur_factorial> 3 LOAD_CONST (1)
recur_factorial> 6 COMPARE_OP (==)
recur_factorial> 16 LOAD_FAST (n)
recur_factorial> 19 LOAD_GLOBAL (recur_factorial)
recur_factorial> 22 LOAD_FAST (n)
recur_factorial> 25 LOAD_CONST (1)
recur_factorial> 28 BINARY_SUBTRACT
recur_factorial> 29 CALL_FUNCTION (1)
recur_factorial> 0 LOAD_FAST (n)
recur_factorial> 3 LOAD_CONST (1)
recur_factorial> 6 COMPARE_OP (==)
recur_factorial> 16 LOAD_FAST (n)
recur_factorial> 19 LOAD_GLOBAL (recur_factorial)
recur_factorial> 22 LOAD_FAST (n)
recur_factorial> 25 LOAD_CONST (1)
recur_factorial> 28 BINARY_SUBTRACT
recur_factorial> 29 CALL_FUNCTION (1)
recur_factorial> 0 LOAD_FAST (n)
recur_factorial> 3 LOAD_CONST (1)
recur_factorial> 6 COMPARE_OP (==)
recur_factorial> 16 LOAD_FAST (n)
recur_factorial> 19 LOAD_GLOBAL (recur_factorial)
recur_factorial> 22 LOAD_FAST (n)
recur_factorial> 25 LOAD_CONST (1)
recur_factorial> 28 BINARY_SUBTRACT
recur_factorial> 29 CALL_FUNCTION (1)
recur_factorial> 0 LOAD_FAST (n)
recur_factorial> 3 LOAD_CONST (1)
recur_factorial> 6 COMPARE_OP (==)
recur_factorial> 12 LOAD_FAST (n)
recur_factorial> 15 RETURN_VALUE
recur_factorial> 32 BINARY_MULTIPLY
recur_factorial> 33 RETURN_VALUE
recur_factorial> 32 BINARY_MULTIPLY
recur_factorial> 33 RETURN_VALUE
recur_factorial> 32 BINARY_MULTIPLY
recur_factorial> 33 RETURN_VALUE
recur_factorial> 32 BINARY_MULTIPLY
recur_factorial> 33 RETURN_VALUE
recur_factorial> 32 BINARY_MULTIPLY
recur_factorial> 33 RETURN_VALUE
recur_factorial> 32 BINARY_MULTIPLY
recur_factorial> 33 RETURN_VALUE

That's pretty cool! Now we know the exact opcodes that are executing. Tracing obfuscated bytecode is no longer a problem.

Extending tracing to full-fledged debugging

The tracer developed so far does not have advanced debugging capabilities. For instance, we cannot interact with the operand stack, tamper with the stored values, or modify the opcodes dynamically at run time. We do have access to the frame object, but the evaluation stack is not accessible from Python. However, everything is accessible to a C extension. We can develop a C extension which, when given a frame object, allows Python code to interact with the objects stored on the operand stack.

This will be the topic for another blog post. I also intend to show, how we can use such an advanced tracer to unpack & deobfuscate the layers of a PjOrion protected python application.


References


Batchelder, N. (2008, April 11). Wicked hack: Python bytecode tracing. Retrieved March 15, 2017, from https://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html

Rouault, C. (2015, May 7). Understanding Python execution from inside: A Python assembly tracer. Retrieved March 15, 2017, from http://web.archive.org/web/20160830181828/http://blog.hakril.net/articles/2-understanding-python-execution-tracer.html?