Friday, 31 March 2017

Remote debugging in IDA Pro by http tunnelling

IDA Pro provides remote debugging capability that allows us to debug a target binary residing on a different machine over the network. This feature is very useful in situations such as when we want to debug an executable for an arm device as installing IDA on it is not possible. IDA can remotely debug another binary in two ways - through a gdbserver or by the provided debugger servers (located in dbgsrv directory).

These debugging servers transport the debugger commands, messages and relevant data over a TCP/IP network through BSD sockets. So far so good, but what if the debugging server resided on a virtual host hosting multiple domain names? We cannot use sockets anymore.

A socket connection between two endpoints is characterized by a pair of socket addresses, one for each node. The socket address, in turn, comprises of the IP address and a port number. For an incoming socket connection, a server hosting multiple domains on the same IP address cannot decide which domain to actually forward the request based on socket address alone. Thus remote debugging using sockets is not possible. However, this is not entirely true as there are techniques such as port forwarding (aka virtual server) that can be used to reroute the incoming traffic to various private IPs based on a pre-decided table. Port forwarding capability is not available everywhere so we can ignore it for now. Instead, it would be much better if sockets supported connections based on domain names as described in this paper Name-based Virtual Hosting in TCP.

The Application Layer Protocol HTTP solves the virtual host problem by including the Host header in HTTP messages. It seems that if we can wrap the transport layer socket traffic in plain old HTTP messages our problem would be solved. The rest of the blog post describes this process in detail.

Sunday, 26 March 2017

67,000 cuts with python-pefile

EasyCTF featured an interesting reversing engineering challenge. The problem statement is shown in Figure 1.
Figure 1: Problem statement
A file 67k.zip was provided containing 67,085 PE files numbered from 00000.exe to 1060c.exe as shown in Figure 2.

67k files to reverse
Figure 2: 67k files to reverse!


Thursday, 16 March 2017

Hacking the CPython virtual machine to support bytecode debugging


As you may know, Python is an interpreted programming language. By Python, I am referring to the standard implementation i.e CPython. The implication of being interpreted means that python code is never directly executed by the processor. The python compiler converts the source code into an intermediate representation called as the bytecode. The bytecode consists of instructions which at runtime are interpreted by the CPython virtual machine. For knowing more about the nitty-gritty details refer to ceval.c.

Unfortunately, the standard python implementation does not provide a way to debug the bytecode when they are being executed on the virtual machine. You may question, why is that even needed as I can already debug python source code using pdb and similar tools. Also, gdb 7 and above support debugging the virtual machine itself so bytecode debugging may seem unnecessary.

However, that is only one side of the coin. Pdb can be used for debugging only when the source code is available. Gdb no doubt can debug without the source as we are dealing directly with the virtual machine but it is too low level for our tasks. This is akin to finding bugs in your C code by using an In-Circuit Emulator on the processor. Sure, you would find bugs if you have the time and patience but it is unusable for the most of us. What we need, is something in between, one which can not only debug without source but also is not too low-level and can understand the python specific implementation details. Further, it would be an icing on the cake if this system can be implemented directly in python code.