I spent a lot of time working on kernel fuzzing this summer after graduation as a continuation of my MEng research. In this post, I will detail an interesting scenario I encountered. There was a consistent repeated false positive hang in net/sched, which I transformed into the CTF challenge HangBuzz101 for corCTF 2025. I will also conclude with a tldr of the overall hobbyist vulnerability research and bug bounty experience in collaboration with syst3mfailure in the past few months.
As every kernelCTFer can attest to, net/sched was a 0-day gold mine. So naturally, I fuzzed it a lot. One thing I noticed repeatedly was an occasional reported soft lockup or a task hang. Examples included:
BUG: soft lockup in net_tx_action
BUG: soft lockup in sys_sendto
BUG: soft lockup in ip_rcv
BUG: soft lockup in tc_modify_qdisc
BUG: soft lockup in inet_stream_connect
BUG: soft lockup in sys_sendto
In certain fuzzing campaigns (depending on the corpus), the suppressed reports would also be flooded with watchdog reports of soft lockups. Most of these reports involved the fq (fair queue) qdisc.
Interestingly enough, while vanilla Syzkaller’s syz-repro could not reproduce many of these hangs, manually re-running the entire log with ./syz-execprog -enable net_dev -repeat 0 log often resulted in the same hang. As a result, I began manual bisection, reducing the number of programs. In the end, I would often need 2 to 3 programs for the hang to reproduce. This turned out to be the case because Syzkaller’s executor does not truly reset network namespaces, despite the function name reset_net_namespace.
This means that previous Syzlang programs would leave their modifications to the qdisc hierarchy in the kernel and affect subsequent executions, thereby permanently altering kernel state. This decision was made for performance, based on comments, but it makes the very metric (coverage/signals) that vanilla Syzkaller judges programs by unreliable.
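The manual bisection described above can be approximated with a greedy reduction loop. The following is a minimal sketch in Python, not Syzkaller code: `minimize_programs` and the `reproduces` callback are hypothetical names introduced here for illustration. The callback is assumed to replay a candidate list of programs (for example, by writing them to a file and running syz-execprog on it) and report whether the hang fired. Since earlier programs leave qdisc state behind, the sketch removes one program at a time and keeps the relative order of the survivors.

```python
from typing import Callable, List, Sequence


def minimize_programs(
    programs: Sequence[str],
    reproduces: Callable[[List[str]], bool],
) -> List[str]:
    """Greedily drop programs while the hang still reproduces.

    `reproduces` is a hypothetical oracle supplied by the caller; it must
    return True when replaying the given programs in order triggers the hang.
    """
    current = list(programs)
    if not reproduces(current):
        raise ValueError("the full log does not reproduce the hang")

    changed = True
    while changed:
        changed = False
        # Walk from the end so indices stay valid after a removal.
        for i in range(len(current) - 1, -1, -1):
            candidate = current[:i] + current[i + 1:]
            if candidate and reproduces(candidate):
                current = candidate
                changed = True
    return current


if __name__ == "__main__":
    # Toy oracle: the "hang" needs a setup program followed by a trigger.
    def toy_oracle(progs: List[str]) -> bool:
        return (
            "setup_qdisc" in progs
            and "send_packet" in progs
            and progs.index("setup_qdisc") < progs.index("send_packet")
        )

    log = ["noise_a", "setup_qdisc", "noise_b", "send_packet", "noise_c"]
    print(minimize_programs(log, toy_oracle))
```

In practice each oracle call is slow, since it has to wait long enough for the watchdog to fire, so a real harness would likely try dropping larger chunks of the log first before falling back to single-program removal.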
Anyway, once I had a simplified single-program Syzlang repro, I would run the executor in the shell along with the following loop to dump the tc state:
while true; do
    pid="$(lsns -t net -n -o NS,PID,COMMAND | grep syz-executor | awk '{print $2}' | head -n1)"
    echo "Entering network namespace of PID $pid"
    nsenter -t "$pid" -n \
        sh -c 'tc -d qdisc show dev lo; tc -d class show dev lo'
done
This would show me the qdisc state to reconstruct, and I ended up with the following repro eventually:
ip link set dev lo up
tc qdisc add dev lo root handle 8001: stab linklayer atm overhead 77174400 mtu 1 tsize 1 \
    fq \
    limit 1 \
    flow_limit 1 \
    buckets 2 \
    orphan_mask 1 \
    quantum 1 \
    maxrate 10bit \
    low_rate_threshold 1bit
tc -s -d qdisc show dev lo
ping -I lo -s1000 -c4 127.0.0.1
At last, this “bug” made sense. A tiny quantum parameter combined with a huge stab overhead value! Looking at fq_dequeue, the quantum determines the credit earned per dequeue round, and the stab overhead adds the specified number of bytes to each packet's accounted length. Hence, after the first packet dequeues, the flow's credit is deeply negative, and we are stuck looping because the low quantum repays that deficit only one small step per round.
begin:
    head = fq_pband_head_select(pband);
    if (!head) {
        while (++retry <= FQ_BANDS) {
            if (++q->band_nr == FQ_BANDS)
                q->band_nr = 0;
            pband = &q->band_flows[q->band_nr];
            pband->credit = min(pband->credit + pband->quantum,
                                pband->quantum);
            if (pband->credit > 0)
                goto begin;
            retry = 0;
        }
        if (q->time_next_delayed_flow != ~0ULL)
            qdisc_watchdog_schedule_range_ns(&q->watchdog,
                                             q->time_next_delayed_flow,
                                             q->timer_slack);
        return NULL;
    }
    f = head->first;
    retry = 0;
    if (f->credit <= 0) {
        f->credit += q->quantum;
        head->first = f->next;
        fq_flow_add_tail(q, f, OLD_FLOW);
        goto begin;
    }
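To get a feel for how long the `if (f->credit <= 0)` branch in the fq_dequeue excerpt above can spin, here is a rough Python model of that branch alone. It is a sketch under stated assumptions rather than the kernel's actual accounting: it assumes the charged packet length is simply payload plus stab overhead (ignoring the size table and link-layer rounding), it starts from a hypothetical initial credit of 0, and it ignores the per-band credit loop. The function names are made up for this example.

```python
def refill_rounds(credit: int, quantum: int) -> int:
    """Closed form: how many times `credit += quantum` runs before credit > 0."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    if credit > 0:
        return 0
    return (-credit) // quantum + 1


def refill_rounds_simulated(credit: int, quantum: int) -> int:
    """Same count, obtained by replaying the branch one round at a time."""
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    rounds = 0
    while credit <= 0:
        credit += quantum  # mirrors: f->credit += q->quantum; goto begin;
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Assumption: charged length = ping payload size + stab overhead.
    initial_credit = 0  # hypothetical starting credit, for illustration only
    charged_len = 1000 + 77174400
    credit_after_dequeue = initial_credit - charged_len

    # Use the closed form here; the simulated variant is only meant for
    # small inputs, where it serves as a cross-check.
    print(refill_rounds(credit_after_dequeue, quantum=1))
```

Under these assumptions the repro needs on the order of tens of millions of trips through `goto begin` before the flow may send again, which is consistent with the loop eventually finishing yet lasting long enough under heavy instrumentation for the watchdog to complain.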
There were a few other variations in other qdiscs, but none of these are actual bugs in my opinion. These lockups do not last forever (the soft lockup watchdog warning simply fires after roughly 25 seconds), as the quantum added each round eventually brings the credit back above zero. Additionally, this behavior only triggers reliably on native systems when KCOV and KASAN are enabled, due to the additional instrumentation weight. It also replicates reliably in non-accelerated hypervisor environments, but this is definitely not the common case.
To stop these false positives, I made the following changes to net/sched grammar in Syzlang.
diff --git a/sys/linux/socket_netlink_route_sched.txt b/sys/linux/socket_netlink_route_sched.txt
index bb718b8f2..6c3aec491 100644
--- a/sys/linux/socket_netlink_route_sched.txt
+++ b/sys/linux/socket_netlink_route_sched.txt
tc_netem_corrupt {
 tc_netem_rate {
     rate            int32
-    packet_overhead int32
+    packet_overhead int32[0:256]
     cell_size       int32
-    cell_overhead   int32
+    cell_overhead   int32[0:256]
 }
 tc_netem_slot {
tc_police {
 tc_ratespec {
     cell_log    int8
     linklayer   flags[linklayer, int8]
-    overhead    int16
+    overhead    int16[0:256]
     cell_align  int16
     mpu         int16
     rate        int32
tc_sizespec {
     cell_log    int8
     size_log    int8
     cell_align  int16
-    overhead    int32
+    overhead    int32[0:256]
     linklayer   flags[linklayer, int32]
     mpu         int32
     mtu         int32
A more aggressive approach could also be to just remove the TCA_STAB variant from the rtm_tca_policy Syzlang union definition.
For corCTF 2025, I made this into a challenge called HangBuzz101 by rebuilding the kernel after the following patch command:
sed -i "s/BUG: soft lockup/BUG: soft lockup, here is your flag: ${FLAG}/g" kernel/watchdog.c
Surprisingly, only 2 teams solved this challenge, even though I provided a pretty minimal kernel configuration and emphasized net/sched. Perhaps this is just authorship bias, as I meant for this to be an easy challenge.
I would say this venture into fuzzing has been fruitful. By the end of the summer, my custom version of Syzkaller yielded some nice results. My longtime collaborator Savy and I found around 8 CVEs in total (5 in net/sched, 2 in kTLS, and 1 in io_uring), though these were mostly local DoS bugs. I also used this as an opportunity to make commits into the Linux kernel, which Brad Spengler called out as “Google indirectly funding extensive network scheduler developer.” My friends termed this “unemployment behavior” because it was all for free, but honestly this was a worthwhile experience that I would like to continue to pursue - thank you to all the netdev maintainers for the help.
Out of the bugs we found, only 3 were exploitable. The first one was CVE-2025-38001, in which we compromised kernelCTF’s LTS, COS, and mitigation instance for a payout of around 82k. We also caused Google to take down the PoW system afterwards as a member of the Crusaders of Rust Security Research Group managed to break the Sloth VDF with some Zen5 AVX512 acceleration.
The second one was CVE-2025-38616, which we publicly disclosed with a very detailed bug report after we failed to find an exploitable path. Unfortunately, it actually was exploitable and we identified the very path required to do so, but could not trigger it due to a single character typo (we jokingly refer to this as the “100k typo”). The gist of the matter was that an attack path would only arise if the crypto algorithms operated in asynchronous mode, which the cryptd and socket_alg interface allowed us to force. However, crypto algorithms with SIMD support like GCM would unconditionally override them back into synchronous mode. We needed to find a crypto algorithm supported by TLS and kernelCTF without SIMD support to trigger exploitability, and CCM was one such candidate. But we made a single character typo - while registering TLS to operate under CCM mode, we registered GCM instead of CCM for asynchronous operation with cryptd. In the end, congratulations to n0psledbyte of Starlabs for claiming the kernelCTF bounty for an expected reward of 82k-92k, which the 0-day bonus could have brought to over 100k. This is not to say we definitely would have been able to claim this bug bounty, but it remains a funny (albeit painful) story highlighting the difficult and competitive nature of vulnerability research.
Our final submission was CVE-2024-58240. We targeted the COS-113 instance for an expected payout of 21k (as unprivileged user namespaces were not required). Interestingly enough, this bug was caused by a backporting mistake to the 6.1.x branch of Linux in the kTLS subsystem left uncaught for the past 1.5 years - commit 13114dc5543069 was backported without its dependent commit 41532b785e9d79. This submission is a small redemption for our previous blunder.
Overall, I really enjoyed this summer of vulnerability research. Before my MEng, I had not seriously focused on real-world VR for a year or two due to other interests at school, following the typical SWE pipeline. The ability to direct my own research and craft actual exploits has re-sparked a passion for this field, and I hope to continue with it. Thank you to Professor Mengjia Yan and the MATCHA group at MIT CSAIL for graciously funding my summer research. As always, feel free to let me know of any questions, concerns, corrections, inquiries, or anything else.