Monday 20 November 2017

bnpy - A python architecture plugin for Binary Ninja

Recently I got a chance to try out Vector 35's Binary Ninja, and I must say the experience has been great so far. The good thing about binary ninja (binja henceforth) is its API, we can easily custom plugins for various purposes such as a disassembler for a foreign architecture. We can do the same in IDA, but developing processor plugins in IDA is not for the faint of heart. At the moment, binja is  entirely a static analysis tool but we do have plugins like binjatron that attempts to fill this void.

Playing with the binja API, I developed bnpy - a disassembler for python bytecode. In the binja terminology this is called as an Architecture plugin. At the moment it works for raw python bytecode, i.e. you must extract the instruction stream from a pyc file in order to use it.

In the near future, I plan to extend it so that it can disassemble a pyc (compiled python) file right out of the box. Right now, this is difficult due to certain limitations in the API. To understand this we need to know a bit more about the pyc format.

The pyc file is not a flat file format like a PE or ELF.  It is a nested format bearing a tree-like structure. A pyc file contains a single top-level code object. Among other things, a code object stores an array of constants used by the code. This array is called as co_consts. The constants can be integers, strings and even another nested code object. The code object also stores the bytecode instructions in a string named as co_code. At the moment, the bnpy plugin operates on this instruction string. To better describe the structure of pyc files we can refer to  the following image taken from kaitai struct.

Fig. 1: The structure of a pyc file
You can see, the code objects within a pyc file are nested. The function view in binja is flat and thus not suitable for displaying a tree structure. As of now, the plugin can be used on the raw bytecode stream. Steps for extracting the bytecode along with other directions can be found on the plugin page at GitHub.

To conclude this short post, here is a GIF of the plugin in action.

1 comment:

  1. Hi. How to you run this code?
    I am getting error for sample code from wrapper_3.pyc

    code_object = marshal.load(pyc_file)
    ValueError: bad marshal data (unknown type code)

    can add your example file example.pyc for bnpy?