With the advent of anonymous online money transactions (read Bitcoin) ransomware has become a profitable business in the cybercrime industry. This combined with the Tor network hides the attackers identity. Further, low infosec literacy makes social engineering really easy. All you need to do is send an email with an attached word file for a failed FedEx delivery etc. The victim would download the attachment, run the word file, enable macros, all without thinking a bit.
Minutes later he/she would be staring at a screen that may look like this
|Fig 1: The Petya lock screen|
In panic, the hapless user may call his tech friend or google (seriously?) and be convinced that he/she has really lost all data. Now either he/she heed to the attacker's demands and pay the ransom or just forget it. Paying the ransom is however not too easy. There is no paypal, no credit card. The attackers accepts payments only in bitcoins and you also need to install the so called tor browser. This is the end of the road for many.
For us reversers, this is not the end. We try to find out if the user can have his/her data recovered WITHOUT paying the ransom. However this is not always possible particularly if the malware coders are experts. If not then we may have a way in just as was the case of the Petya ransomware which is the topic for today's blog post.
The sample on which the analysis has been done can be found here.
The malware is different & unique from typical ransomware as it does not encrypts file. Instead it encrypts the MFT (Master File Table) on NTFS volumes. In short, on NTFS the MFT is a table which contains information about each and every file on the partition. For small files, the file content may be stored entirely within the MFT. For larger files, it contains a pointer to the actual data.
Encrypting the MFT is advantageous in the sense that the operation is very fast. You do not need to recursively traverse the entire drive to find the files. The files are there on the disk but the system does not know where to find them. This is as good as having the individual files encrypted.
The downside of having the MFT encrypted is the malware will need to be low level & sophisticated. Since the system cannot boot the OS a custom bootloader has to be developed. The code has to be 16-bit running in real mode. It has to use the BIOS interrupt services to communicate with the user. This is not an easy task considering we are used to develop in 32/64 bit with memory protection, segmentation and other niceties by the OS. In real mode we are responsible for everything.
My favourite OS for reverse engineering tasks like this is old and trusty Windows XP. However for reasons unknown I could not get this sample running. Hence, I had to resort to a Windows 7 SP1 x86 VM. Running the sample leads to a BSOD after some seconds.
|Fig 2: The BSOD|
This BSOD is generated entirely from user mode by calling an undocumented API NtRaiseHardError. On the next boot a fake chkdsk scan starts to run.
|Fig 3: The malware is hiding its action behind the fake scan.|
In reality, the malware is doing its dirty work of encrypting the MFT. The chkdsk is just a decoy. When done we are presented with the redemption screen as in Fig 4 and then Fig 1. However it is not that scary as it looks :).
|Fig 4: Danger Ahead!|
Carving the mal-bootloader from disk
To perform static analysis we need the malicious bootloader. Since I have used vmware here, the easiest way would be to attach the vmware disk image (vmdk) to another virtual machine and use a tool like Active@ Disk Editor as in Fig 5. I have also developed a 010 editor template for parsing vmware disk images directly and can be used just in case.
|Fig 5: Active Disk Editor in action|
IDA & Bochs
For static analysis we will be using IDA (no surprises). For dynamic analysis we will be using the Bochs debugger. Although vmware can debug bios code & bootloaders by its gdb stub but it is quite a pain to use efficiently. Hence we will stick with bochs. IDA provides first class support for bochs. Further running bochs is lot snappier than powering a full fledged vmware vm.
You can get the bxrc file (bochs config file) here. We can now load the bxrc file in IDA and it automatically do the rest.
|Fig 6: The initial malware code|
The malware copies 32 sectors starting from sector 34 to 0x8000 and jumps to it as in Fig 6.
|Fig 7: To encrypt or not|
Among other things, it reads sector 54 to a buffer. If the first byte contains 0 it proceeds to encrypt. If not it displays the ransom screen as shown in Fig 7. Hence the first byte of sector 54 is used mainly as a flag to decide its further course of action.
Analyzing the decryption code
We will be focussing primarily on the decryption algorithm. After all we are more interested in getting our data back than figuring out how it got lost. The process is simple, it reads a string, checks it and if it is valid decrypts our data.
|Fig 8: Read & Check key|
It accepts a key of maximum length 73 characters, but only the first 16 of them are used. The characters which are accepted consists of 1-9, a-z, A-X. After this the 16 byte key is expanded to a 32 byte key by adding 122 and doubling consecutive characters respectively. This is shown in Fig 9.
|Fig 9: Expanding the key|
Next we reach the crux of the malware. It reads some data from sector 54 & 55 and passes them to the Crypt function. Using the 32 byte decryption key and an 8 byte Initialization Vector (nonce) from sector 54 it decrypts the 512 bytes of data in sector 55. If our key is correct, all byte in the decrypted data must equal all 0x37.
|Fig 10: Calling Crypt|
Finding the encryption algorithm
The encryption algorithm used is a variant of the Salsa stream cipher. I call this variant because properly implemented Salsa is quite secure. Well, how do we know this is Salsa? From magic constants, of course. See Fig 11.
|Fig 11: expand 32-byte k|
Searching for "expand 32-byte k" would directly lead you to Salsa. The exact code used in the malware can be found here. I am using the word exact in a broad sense. If it had been a ditto copy, we would have no chance of breaking it. The original Salsa implementation uses 32 bit (uint32_t) variables. This salsa implementation uses 16 bit variables for the same purpose borking it badly. Here is a snippet of the borked version. You can get the full version here. Compare this to the original version.
The primary reason for the mess up can be attributed to the fact that all of this is running in 16 bit real mode. So the authors decide to go easy and implement the exact same algorithm but with 16 bit variables.
Breaking the algorithm
We already have the entire algorithm in source code. We need to fire up our tools to go & break it. These days, my defacto tool for such analysis has mostly been angr. However angr failed to work in this case. This is expected as the framework is in a continuous state of development. Not spending time on finding why it failed, I decided to look at other options. I used KLEE. It did not fail but took a long time and never finished. Next, some wild cropped up and I decided to use fuzzing based approach. For this I used the AFL framework. No luck here too.
Lastly I decided to use the tried and tested Z3 constraint solver and it did not disappoint :). We already have the source, we just need to implement it in Z3. The code is as follows.
The program has to be provided with the 8 byte nonce from sector 54. and 64 bytes from sector 55 after xoring with 0x37. The remainder of the program is a literal transcription of the c source and hence not explained. Running the program we get our decryption key in a few milliseconds. Apply the decryption key & hope for the best.
|Fig 12: Mission accomplished!|