vmquack was a CISC VM reversing challenge I wrote for corCTF 2021, loosely based on x86_64, as I have recently been expanding my skillset beyond just pwn. It utilized a few anti-debugging tricks inspired by this pdf. Just for fun, I also decided to play around with Binary Ninja's API to write a disassembler for the vmcode with graph view, and even lifted it to get decompilation capabilities. Since this is a reversing challenge (and as the author I carry a ton of authorship bias, since I'm basically reversing something I wrote the source for), I will do my best to present the important pieces of information in a way that replicates how someone would approach this challenge.
Initial analysis lets us know that this is a stripped static ELF. Though static linking is often seen as an annoyance in reversing challenges, I pretty much had to link it statically to get the crackme to work in a stable manner across different distros. Regardless, my challenge made very little use of library functions, and main is trivial to get to from the entry point. Running the binary shows the following:
Looking at the vm start:
The rsi register is used as the vm instruction pointer, r15 points to the dispatch table of handlers, and rdi points to the vm context. The following information will become apparent later on as I go through more vm handlers, but rax is utilized as an idx for jump tables and sometimes as a temporary variable, rbx is used to hold information about instruction operating size and addressing modes, r8 and r9 are generally used for destination and source for vm opcode handling, and r10 to r14 are utilized as temporary variables. vmquack supports immediate to register, immediate to memory (where memory address is in a register), register to register, register to memory, memory to register, and memory to memory addressing modes, which is different from x86_64. Operands must match in size, with supported sizes being BYTE, WORD, DWORD, and QWORD.
Now, let's look at the main dispatch handler:
Now, let's look through the instruction handlers. I'll start by detailing some of the special handlers.
The last special opcode is opcode 32. It does nothing but jump back to the main_handler, so it is just a nop.
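To make the dispatch structure concrete, here is a toy Python model of a handler-table loop with the nop at opcode 32. It is only a sketch under assumptions: that the opcode is the first byte of every instruction, that the nop consumes nothing beyond its opcode byte, and that the names `run`, `vm_nop`, and `HANDLERS` (all hypothetical) stand in for the real assembly handlers.

```python
def vm_nop(code, ip, trace):
    # Opcode 32: do nothing and fall straight back into the dispatcher.
    trace.append("nop")
    return ip


# Hypothetical dispatch table; the real VM indexes a table of native handlers.
HANDLERS = {32: vm_nop}


def run(code, handlers):
    ip = 0          # plays the role of rsi, the vm instruction pointer
    trace = []      # records which handlers ran, for illustration only
    while ip < len(code):
        opcode = code[ip]   # assumed: the opcode is the first instruction byte
        ip += 1
        ip = handlers[opcode](code, ip, trace)
    return trace
```

Running `run(bytes([32, 32]), HANDLERS)` dispatches the nop handler twice and then stops when the toy code buffer is exhausted.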
Before I continue looking at other opcodes, there are a few operand handlers that I would like to discuss.
At 0x401dfc, we see the start of the following function (zoomed in as the graph view is very large):
We notice that if the first bit is set in the second byte, it jumps to a region that looks like this:
Lastly, notice how at the beginning of this handler, the push will make it return to 0x401e32, which just sets r8 equal to r9 (r9 held the value of the immediate/register/memory operand), before returning to the opcode handler that called it. This entire handler is just for determining the addressing modes used for unary operands.
Another commonly used operand handler is at 0x401e3c. It does similar things, but only handles registers (and registers pointing to memory) using the 3rd byte in the instruction, and stores the result into the r8 register. Its initial push, however, causes it to return to 0x401e63, which behaves the same as the unary handler on the 4th byte (and allows for immediates as well). This time the value in r9 isn't moved into r8, and once these handlers finish, control goes back to the opcode handler that called them. Overall, this is a binary opcode operand handler. As you will see later on, during the instruction handlers r8 holds the equivalent of the x86_64 destination, while r9 holds the equivalent of the source.
The last operand handler used for instructions is at 0x401e9c. It does the same thing as the first part of the binary opcode handler, but then goes into 0x401ec7, which just loads a byte with lodsb and moves it into rcx, and then returns into the handlers. As one will see later (or can perhaps even guess based on x86_64 behavior), this is a shift opcode handler (but only 1 byte immediates are allowed for the source operand).
From the above conditionals for register selection and size, we can determine the following information. The second byte holds addressing mode and size information. The 0th bit represents the use of an immediate source, the 1st bit represents the use of a register source, the 2nd bit represents the use of a memory source, the 3rd bit represents the use of a register destination, the 4th bit represents the use of a memory destination, the 5th bit represents byte sized operations, the 6th bit represents word sized operations, the 7th bit represents dword sized operations, and the default operation size is for qwords. Moreover, when a byte is used to select the register (either as a register operand or as a register holding a memory address), the following values are used: 0 is RAX, 1 is RBX, 2 is RCX, 3 is RDX, 4 is RSI, 5 is RDI, 6 is R8, 7 is R9, 8 is R10, 9 is R11, 10 is R12, 11 is R13, 12 is R14, 13 is R15, 14 is RBP, 15 is RSP, and 16 is RFLAGS.
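The encoding described above can be sketched as a small Python decoder. This is an illustration rather than the plugin's code: it assumes bit 0 is the least significant bit of the mode byte, and the order in which the bits are tested is my own choice. The names `decode_mode` and `REGISTERS` are hypothetical.

```python
# Bit positions follow the description above (bit 0 assumed to be the LSB).
SRC_IMM, SRC_REG, SRC_MEM = 1 << 0, 1 << 1, 1 << 2
DST_REG, DST_MEM = 1 << 3, 1 << 4
SIZE_BYTE, SIZE_WORD, SIZE_DWORD = 1 << 5, 1 << 6, 1 << 7

# Register index -> name, in the order listed in the text.
REGISTERS = [
    "RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "R8", "R9", "R10",
    "R11", "R12", "R13", "R14", "R15", "RBP", "RSP", "RFLAGS",
]


def decode_mode(mode):
    """Split the second instruction byte into operand size and addressing kinds."""
    if mode & SIZE_BYTE:
        size = 1
    elif mode & SIZE_WORD:
        size = 2
    elif mode & SIZE_DWORD:
        size = 4
    else:
        size = 8  # qword is the default when no size bit is set

    src = None
    for bit, kind in ((SRC_IMM, "imm"), (SRC_REG, "reg"), (SRC_MEM, "mem")):
        if mode & bit:
            src = kind
            break

    dst = None
    for bit, kind in ((DST_REG, "reg"), (DST_MEM, "mem")):
        if mode & bit:
            dst = kind
            break

    return {"size": size, "src": src, "dst": dst}
```

For example, a mode byte of `0b00101001` (bits 0, 3, and 5 set) decodes as a byte sized operation with an immediate source and a register destination.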
With these operand handlers for opcodes finished, we can go back to the instruction handlers.
Opcodes 8 to 13 are conditional jumps, and only take relative immediate byte or dword offsets from the next vm instruction. Each handler also saves the real program's flags, uses the vm's flags register to decide whether to take the jump, and then restores the original flags.
Opcodes 5 and 6 are quite important. Here is opcode 5:
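As a rough illustration of how such a jump target would be computed in a disassembler, the Python sketch below adds the offset to the address of the next vm instruction. It assumes the offsets are signed and little-endian, as they are on x86_64; the write-up does not state this explicitly, and `jump_target` is a hypothetical helper name.

```python
import struct


def jump_target(next_ip, raw_offset):
    """Return the branch target for a 1 byte or 4 byte relative offset.

    next_ip is the address of the instruction following the jump.
    Signed little-endian offsets are an assumption borrowed from x86_64.
    """
    fmt = {1: "<b", 4: "<i"}[len(raw_offset)]
    (offset,) = struct.unpack(fmt, raw_offset)
    return next_ip + offset
```

With these assumptions, a byte offset of `0xfe` from `next_ip = 0x1000` jumps backwards by two bytes, to `0xffe`.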
Opcode 6 you can probably guess:
Opcode 30 is another key handler.
Lastly, I will go over 2 more instruction handlers, one unary instruction and one binary instruction.
Opcode 22 is an example of a binary operation.
The rest of the vm handlers are pretty similar, and what they do is quite obvious (since the instruction is in the handler). A few noteworthy exceptions are that vm_shl and vm_shr use the shift operand handler mentioned earlier, and that mul, imul, div, and idiv behave in the vm just like they would in x86_64 (with the exception that imul and idiv now only have the one operand form). There are probably a few bugs throughout the vm handlers (such as in idiv) because writing a sizable vm reversing challenge in manual assembly probably isn't the best thing to do a few days before a CTF starts.
At this point, the handlers should be marked accordingly:
Now, with the plugin, I'll provide a brief overview of the VM program. Graph view and the ability to switch between LLIL, MLIL, and decompilation make reversing much faster, especially with all the usual features of Binary Ninja such as xrefs and compiler optimization, even though this is already a relatively simple vm program.
malloc and free are the two functions following strlen (as evidenced by their extern call). Technically, the behavior of this vm's malloc is really calloc. Following those two functions are a nanosleep function and a prctl function used to ignore ptrace_scope rules, both of which the vm uses throughout for the virtualized anti-debug I mentioned earlier.
Before continuing, there are 2 more important functions to address: 0x31337644 and 0x31337724.
The first is just a simple xor decryption routine, given the previously mentioned encrypted message struct in the program itself:
Before we discuss main, I will go over some of the functions used for flag checking. At 0x31337930, the vm initializes some values in the vmdata. I'm calling these state1, state2, and rounds for reasons that will become clear soon.
The following is the primary function that is used to transform given input.
Anyways, all that is left now is the main function.
By extracting our knowledge from this vmcode and data from the vmdata section, we can come up with the following function to generate the correct input:
Entering this passes the checks and finally gets us the flag!