Tuesday, 21 November 2017

bnpy - A python architecture plugin for Binary Ninja

Recently I got a chance to try out Vector 35's Binary Ninja, and I must say the experience has been great so far. The good thing about binary ninja (binja henceforth) is its API, we can easily custom plugins for various purposes such as a disassembler for a foreign architecture. We can do the same in IDA, but developing processor plugins in IDA is not for the faint of heart. At the moment, binja is  entirely a static analysis tool but we do have plugins like binjatron that attempts to fill this void.

Playing with the binja API, I developed bnpy - a disassembler for python bytecode. In the binja terminology this is called as an Architecture plugin. At the moment it works for raw python bytecode, i.e. you must extract the instruction stream from a pyc file in order to use it.

In the near future, I plan to extend it so that it can disassemble a pyc (compiled python) file right out of the box. Right now, this is difficult due to certain limitations in the API. To understand this we need to know a bit more about the pyc format.

The pyc file is not a flat file format like a PE or ELF.  It is a nested format bearing a tree-like structure. A pyc file contains a single top-level code object. Among other things, a code object stores an array of constants used by the code. This array is called as co_consts. The constants can be integers, strings and even another nested code object. The code object also stores the bytecode instructions in a string named as co_code. At the moment, the bnpy plugin operates on this instruction string. To better describe the structure of pyc files we can refer to  the following image taken from kaitai struct.

Fig. 1: The structure of a pyc file
You can see, the code objects within a pyc file are nested. The function view in binja is flat and thus not suitable for displaying a tree structure. As of now, the plugin can be used on the raw bytecode stream. Steps for extracting the bytecode along with other directions can be found on the plugin page at GitHub.

https://github.com/extremecoders-re/bnpy

To conclude this short post, here is a GIF of the plugin in action.



Monday, 16 October 2017

Flare-On Challenge 2017 Writeup

Flare-on is an annual CTF style challenge organized by Fire-eye with a focus on reverse engineering. The contest falls into its fourth year this season. Taking part in these challenges gives us a nice opportunity to learn something new and this year was no exception. Overall, there were 12 challenges to complete. Official solution to the challenges has already been published at the FireEye blog. Hence instead of a detailed write-up, I will just cover the important parts.

#1 - Login.html

The first problem was as simple as it gets. There is an HTML file with a form. We need to provide a flag and check for its correctness.

Figure 1: Check thy flag
The code simply performs a ROT-13 of the input and compares it with another string. To get back the flag, re-apply ROT-13.

Figure 2: ROT-13 again
Flag: ClientSideLoginsAreEasy@flare-on.com

Tuesday, 11 July 2017

Deobfuscating PjOrion using bytecode simplifier

Bytecode simplifier is a tool to de-obfuscate PjOrion protected python scripts. This post is a short tutorial to show how to use this module to deobfuscate a protected python script.

I have used the sample code below to demonstrate its usage. This is a small program to calculate the factorial of a number.

# Python program to find the factorial of a number using recursion
 
def recur_factorial(n):
   """Function to return the factorial
   of a number using recursion"""
   if n == 1:
       return n
   else:
       return n*recur_factorial(n-1)
 
 
# take input from the user
num = int(input("Enter a number: "))
 
# check is the number is negative
if num < 0:
   print("Sorry, factorial does not exist for negative numbers")
elif num == 0:
   print("The factorial of 0 is 1")
else:
   print("The factorial of",num,"is",recur_factorial(num))

Monday, 10 July 2017

Introducing bytecode simplifier

Bytecode simplifier is a tool to deobfuscate PjOrion protected python scripts. It is a complete rewrite of my earlier tool PjOrion Deobfuscator. I have reimplemented the deobfuscation functionality from scratch and have used networkx specifically for this purpose. Using networkx made reasoning about the code much simpler.
The PjOrion version used is 1.3.2 (Filename: PjOrion_Uncompyle6_01.10.2016.zip)

The code is at https://github.com/extremecoders-re/bytecode_simplifier

A short tutorial can be found here: https://0xec.blogspot.com/2017/07/deobfuscating-pjorion-using-bytecode.html

Saturday, 1 April 2017

Remote debugging in IDA Pro by http tunnelling

IDA Pro provides remote debugging capability that allows us to debug a target binary residing on a different machine over the network. This feature is very useful in situations such as when we want to debug an executable for an arm device as installing IDA on it is not possible. IDA can remotely debug another binary in two ways - through a gdbserver or by the provided debugger servers (located in dbgsrv directory).

These debugging servers transport the debugger commands, messages and relevant data over a TCP/IP network through BSD sockets. So far so good, but what if the debugging server resided on a virtual host hosting multiple domain names? We cannot use sockets anymore.

A socket connection between two endpoints is characterized by a pair of socket addresses, one for each node. The socket address, in turn, comprises of the IP address and a port number. For an incoming socket connection, a server hosting multiple domains on the same IP address cannot decide which domain to actually forward the request based on socket address alone. Thus remote debugging using sockets is not possible. However, this is not entirely true as there are techniques such as port forwarding (aka virtual server) that can be used to reroute the incoming traffic to various private IPs based on a pre-decided table. Port forwarding capability is not available everywhere so we can ignore it for now. Instead, it would be much better if sockets supported connections based on domain names as described in this paper Name-based Virtual Hosting in TCP.

The Application Layer Protocol HTTP solves the virtual host problem by including the Host header in HTTP messages. It seems that if we can wrap the transport layer socket traffic in plain old HTTP messages our problem would be solved. The rest of the blog post describes this process in detail.

Sunday, 26 March 2017

67,000 cuts with python-pefile

EasyCTF featured an interesting reversing engineering challenge. The problem statement is shown in Figure 1.
Figure 1: Problem statement
A file 67k.zip was provided containing 67,085 PE files numbered from 00000.exe to 1060c.exe as shown in Figure 2.

67k files to reverse
Figure 2: 67k files to reverse!


Thursday, 16 March 2017

Hacking the CPython virtual machine to support bytecode debugging


As you may know, Python is an interpreted programming language. By Python, I am referring to the standard implementation i.e CPython. The implication of being interpreted means that python code is never directly executed by the processor. The python compiler converts the source code into an intermediate representation called as the bytecode. The bytecode consists of instructions which at runtime are interpreted by the CPython virtual machine. For knowing more about the nitty-gritty details refer to ceval.c.

Unfortunately, the standard python implementation does not provide a way to debug the bytecode when they are being executed on the virtual machine. You may question, why is that even needed as I can already debug python source code using pdb and similar tools. Also, gdb 7 and above support debugging the virtual machine itself so bytecode debugging may seem unnecessary.

However, that is only one side of the coin. Pdb can be used for debugging only when the source code is available. Gdb no doubt can debug without the source as we are dealing directly with the virtual machine but it is too low level for our tasks. This is akin to finding bugs in your C code by using an In-Circuit Emulator on the processor. Sure, you would find bugs if you have the time and patience but it is unusable for the most of us. What we need, is something in between, one which can not only debug without source but also is not too low-level and can understand the python specific implementation details. Further, it would be an icing on the cake if this system can be implemented directly in python code.

Wednesday, 15 February 2017

Extracting encrypted pyinstaller executables

It has been more than a quarter since the last post, and in the meantime, I was very busy and did not have the time to write a proper post. The good news is at the moment, I am comparatively free and can put in a quick post. 

As said earlier, PyInstaller provides an option to encrypt the embedded files within the executable. This feature can be used by supplying an argument --key=key-string while generating the executable. 

Detecting encrypted pyinstaller executables is simple. If  pyinstxtractor is used, it would indicate this as shown in Figure 1.

Trying to extract encrypted pyinstaller archive
Figure 1: Trying to extract encrypted pyinstaller archive

Wednesday, 9 November 2016

Flare-on Challenge 2016 Write-up


The Flare-on challenge is an annual CTF style challenge with a focus on reverse engineering. Official solutions have already been published, besides that there are other writeups available too, hence I will just skim through the parts.

Challenge #1

The first was simple. This is base64 encoding with a custom charset. This online tool does the job.
Flag: sh00ting_phish_in_a_barrel@flare-on.com

Fig 1: Challenge 1

Monday, 7 November 2016

Hack the Vote 2016 CTF - APTeaser writeup


Just for fun I decided to have a go at the Hack the Vote 2016 CTF, particularly the reversing challenges on Windows. There were two of them APTeaser & Trumpervisor. I managed to solve the first. I did try the second but it involved reversing a Win 10 kernel driver implementing a hypervisor using the Intel Virtualization Extensions (VT-x). Anyway, here is a somewhat detailed writeup for the first.

Initial Analysis

The provided file is a pcapng. Opening it in fiddler, reveals an interesting http request for a supposed pdf file on the domain important.documents.trustme, but as indicated from the Content-Type the response is actually an executable.

Serving an executable when all I want is a pdf
Fig 1: Serving an executable when all I want is a pdf