Search This Blog

Sunday, October 4, 2020

CUCTF 2020 Dr. Xorisaurus Heap Writeup (glibc 2.32 UAF)

Here is my writeup for my 2.32 glibc heap challenge (Dr. Xorisaurus) from CUCTF 2020; make sure to check out the writeup for my kernel challenge Hotrod as well!

One important concept to note about glibc 2.32 is the new mechanism of safe linking on the singly linked lists. This new protection scheme is discussed in depth here. Basically, for singly linked freelists (fastbins and tcache bins), free chunk fds are obfsucated by the following scheme: (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer). With a heap leak, this protection can be easily bypassed as heap behavior in glibc is predictable, which is what this challenge will revolve around. Bruteforcing or  leaking a copy of the stored pointer and applying some basic crypto knowledge can help you recover the original data as well in some cases (especially when the chunks in the list are close together).

In this challenge, we were given a libc with debug symbols, linker, and patchelf'd binary with the following protections:

Now, when reversing this binary, one should find 4 features. 

You can fill a glass, examine a glass, drain a glass, and switch the contents of the glass according to the menu. There is also an initial sigalarm in the beginning, and you can only have a maximum of 25 glasses. Filling a glass is equivalent to an allocation; it finds an index in the global glasses array for you, requests for a size that is in the range of 0x60 and larger fastbin sizes, and reads in some data. Examining a glass can be useful for leakage, as it just puts() the content of the chunk out; note that examinations can only be used twice (which can be assumed to be for a libc leak and a heap leak). Draining is the equivalent of a free and it is safe as it nulls out the pointer in the global array. You can use this feature as many times as you can, but once you swap contents (feature 4), you can only free one more time. As for the swap function, you can use it to free a chunk, and then immediately reallocate based on 2 choices for sizes. After the allocation, the binary reads in 8 bytes. This where the 8 byte UAF comes in as the conditional is poorly written, so if you select an invalid choice, there will be no re-allocation and you will be reading into the freed chunk's metadata (take a look at the decompilation below). Now let's plan out our exploit:

One might make the mistake of thinking of using swap to create a double free, but the 8 byte UAF won't allow you to change tcache keys so freeing that chunk again will fail a malloc() check. Some might think about filling tcache and then applying a fastbin dup attack, but the fact that you can only free one more time after swapping prevents the bypass against the fastbin double free check. 

To obtain a leak, one might be tempted to just free a chunk and then reallocate it to see the obfuscated pointer (and then shift left by 12 bits to recover heap base). However, the read call during the allocation requires at least one byte (unless pty is enabled server side), so 5 nibbles of the heap address will be missing. This means there would be 1 byte of entropy on the leak, but a proof of work is required for 3 bytes of a random sha256 hash on remote, so bruteforcing isn't as feasible.

A better way to obtain a leak is to abuse the behavior of scanf. When scanf reads in large payloads of characters that follow it's format specifier, scanf will begin to allocate from the heap. For example, if we send in 0x500 '1's, scanf will make a largebin allocation request from the heap. As one familiar with the heap might know, triggering largebin sized allocations will lead to malloc_consolidate() (source), which will go through the freed fastbins and consolidate them to unsorted (source). This malloc_consolidate() is the basis for another type of attack known as fastbin consolidation, which is discussed here in better depth. After malloc_consolidate(), the request for the large allocation will then cause the chunk in unsorted to be sorted into largebin. On the next request, one can use it to request a heap leak. The chunk will then be sorted into unsorted, from which we can easily grab a heap leak (feel free to debug this out when I attach my exploit later on if this seems confusing). This method of leaking really only came up after my teammate c3bacd17 found an unintentional bypass in one of my other challenges.

Once we have the leak, some basic math will allow you to abuse the 8 byte UAF to maliciously corrupt the obfuscated pointer. Note that 2.32 malloc()'s safe linking mechanism also ensures that the deobfuscated pointer is aligned. Because of this and the fastbin size check, we can no longer do the unaligned trick here for fastbin dup. We will have to rely on tcache poisoning here, and an evil obfsucated pointer can be created by xoring the address location of the fd right shifted by 12 bits with the target location.

I ended up targeting __free_hook and changed it to system, then "freed" a chunk with the string "/bin/sh" on it to pop a shell. As for the proof of work on remote, it can easily be handled by the proofofwork python library that automatically generates a proof.

The following is my final exploit with comments:

Hope everyone enjoyed this challenge and writeup! Feel free to let me know if anything needs to be clarified or if anything explained is incorrect. Congrats to lms of Dakota State for blooding this challenge as well!

For those interested in trying this challenge out, it is archived in the CUCTF repos.

CUCTF 2020 Hotrod Kernel Writeup (Userfaultfd Race + Kernel UAF + Timerfd_Ctx Overwrite)

Recently, I made some pwn challenges for my teammate Chirality, who helped organize CUCTF 2020; Dr. Xorisaurus (glibc 2.32 heap) and Hotrod (kernel heap and race). I thought it would be nice to share my writeups for each. You should also check out Chirality's kernel heap challenge for CUCTF, called BYOD.

Before I start, I would like to acknowledge and give appropriate credit to all the links (posted throughout this article) I studied off of to make both this challenge and my exploit possible.

If you have done plenty of glibc heap exploitation before, there is one important idea you should note about kernel heap exploitation. Rather than relying completely on kernel heap feng shui (even though the allocators are much simpler in kernel), it's oftentimes better to utilize certain structures with function pointers for leaks and RIP control. The basis of this challenge is to use a race condition to create a UAF scenario, from which you can hijack timerfd_ctx structures to take control of RIP.

Opening this challenge up, it looks like a standard kernel pwn setup. A file system, bzImage, and a qemu launch script is given. The following two commands will be very handy for manipulating the file system for debugging/analysis purposes:

The qemu launch script is the following:

This tells us that SMEP, KPTI, and KASLR is enabled, but there is no SMAP (which simplifies this a lot).

We can also use vmlinux-extract to help extract the kernel from its compressed file. The driver itself is hotrod.ko based on the startup script (and the name of the challenge). Now, let's do a quick analysis of the driver.

Like many other standard CTF kernel challenges, a miscdevice is created during initialization and a mutex is also initialized. The device also has a file_operations struct where only the unlocked_ioctl field is populated. Looking through hotrod_ioctl, one can also infer that there is a global struct storing both the size as an unsigned long and a pointer to an allocated chunk located at 0x7e0 relative to module base. This function also has an add, show, delete, and edit function, all of which can only be used once (and you only get one hotrod total). Alloc occurs when the ioctl argument is 0xBAADC0DE.

It checks if you have already attempted an allocation and if the hotrod has already been populated. If not, it will allocate a chunk for the hotrod and sets its size to the argument passed in (the size must fall within the 0xd0 to 0xe0 range). There doesn't seem to be a bug here. Delete occurs when the ioctl argument is 0xC001C0DE.

Again, proper checks are ensured, and the hotrod is zeroed out. This feature can also only be used once. Viewing occurs with ioctl command 0x1337C0DE.

Again, it seems quite safe. We can use this for a leak after we allocate and free certain kernel structures though since kmalloc() doesn't zero out memory. Lastly, edit occurs with argument 0xDEADC0DE.

Again, it seems pretty safe. Like the viewing function, the argument is interpreted as like a hotrod struct as well. The sizes for editing (as well for viewing earlier on) are both checked (so no going out of bounds or overflows). In edit's case, if the size check is satisfactory, it will proceed to copy the user's data to the kernel hotrod's car. 

Overall, this module looks quite safe. Where exactly could the bug be? Well, in this ioctl handler, the mutexes were never used, opening this up to race conditions.

Due to the checks on sizes and restriction to only use each feature once, a good race strategy would be to launch edit in one thread, and in another thread, quickly free the chunk and allocate another kernel structure in a way where the second copy_from_user() happens such that the chunk is already freed but the pointer to the chunk is also already passed to the function. A great way to reliably race is with the userfaultfd syscall. With userfaultfd, we can set up a page fault handler over a certain page we mmap in userspace; even when a pagefault occurs for the kernel accessing it, our handler will run, from which we can hang the kernel thread, run the code meant for the race, and then unblock it with a UFFDIO_COPY ioctl where uffdio_copy.mode is not set. This is actually an extremely common technique to reliably race in the kernel, with several articles and CTF challenges including this concept (such as the famous Balsn CTF KrazyNote challenge):

There does seem to a recent hardening against this method of attack as mentioned here, but is not set by default for compatibility reasons.

From our exploit's perspective, we can have one thread call edit and have it copy over a user hotrod struct where the data, or "car," pointer points to a page where we setup a userfaultfd handler for. Then during edit's second copy_from_user(), it will pagefault when it attempts to copy based on our pointer, and our handler will take over from there, from which we can free and allocate other kernel structures over the same region. Then, you can unblock the thread by copying over the data we want placed there. Personally, I kept all the original data with the copy (to avoid corrupting the kernel structure) except for one of the function pointers, which I change to a stack pivot. Now, after the unblock, the code resumes and everything goes back to "normal," until the overwritten function pointer is triggered.

Due to our structure size, many of the common structures can't be used. However, timerfd_ctx can be quite a useful struct; we can allocate it with a timerfd_create() with the CLOCK_REALTIME option (other options will also work) and a timerfd_settime() call. Using this structure, we can both get a leak and control RIP via the location that stores the function pointer to timerfd_tmrproc(). The function pointer executes after a certain time period which you can control in the itimerspec struct. This structure has been documented before in both ptr-yudai's article about useful kernel structures, this paper about exploitable structures, and GNote from TokyoWesterns 2019. Note that for me, any subsequent sleep calls with the corrupted structure would fail, so I hung the thread to wait for the function pointer to trigger with a getchar().

To grab a leak, I had to spray these structs in the same kmalloc slabs. Then, I freed the last sprayed chunk and immediately made hotrod allocate data there for us to grab the leak reliably. With the KASLR leak, we can rebase the entire kernel relative to startup_64 symbol in kallsyms and then use the aforementioned race to change the function pointer to a stack pivot gadget; we can pivot it to a userspace stack as there is no SMAP. Note that you need to specify a valid range for ropper/ROPGadget to search for gadgets; otherwise, it'll find gadgets that aren't in executable sections in the kernel. Take a look at the example below:

Since there is KPTI and SMEP, the traditional SMEP bypass of changing the CR4 register won't work; KPTI fully isolates user page tables from kernel page tables by managing the two sets via the 12th bit of the CR3 register (the userspace portion of kernel page tables is set to NX, and the only additional information given to userspace page tables is the information necessary to enter and exit the kernel). Instead, it is better to rely on a kpti trampoline and have it fix the CR3 for us so we can go back (swapgs_restore_regs_and_return_to_usermode); these functionalities exist in the kernel because it needs to handle this for routines like syscalls. I usually add +0x16 to where this is located, just so I can skip all the initial pops and start right at movq %rsp, %rdi. Using this trampoline combined with a commit_creds(init_cred) to change my uid to 0 beforehand, I can then choose whichever function to return to in my userspace code with root privileges. Of course, I needed to specify the cs, ss, r_flags, and stack (specifically, at that location, it expects RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS) for the trampoline to return to as well; I just used the values I saved beforehand in the userspace process.

In my case, I was not able to execve or perform many other functions without causing a kernel panic, so I ended up doing open read write in my function. I also had to just halt the OS; otherwise, the kernel panics on the return, hangs, and then somehow spikes my CPU usage to 100%. I'm not too sure why that happened, so if you know why, please let me know.

Below is my exploit with comments and linked resources:

To transfer the exploit to the remote instance, I just compiled it statically with gcc, gzip'd it, and then transfered with base64 encoding and cat > exploit << EOF. It was still relatively large and took about 7 minutes to transfer, but if one really was working under time constraints, compiling with a more minimalistic library like musl or uclibc could help. Here's the final result:

If I made any mistakes in my explanations above, feel free to let me know so I can correct them. I'm still continuing to study the Linux kernel and find it quite fascinating! Thanks again to CUCTF for hosting the event!

For those interested in trying this problem out, it is archived in the CUCTF repos.