RC RANDOM CHAOS

Your patched kernel is still vulnerable

Dirty Frag - CVE-2026-31337, CVSS 7.8 - is a UAF in the Linux kernel's IPv4 fragment reassembly path. Container-to-host root on every major distro.

A Linux kernel local privilege escalation is circulating under the handle Dirty Frag. It is tracked as CVE-2026-31337 in initial advisories: CVSS v3 base 7.8, local attack vector, low complexity, no user interaction. The bug lives in the kernel’s IPv4 fragment reassembly path inside the netfilter conntrack module. Affected kernels span 5.10 LTS through 6.12, which puts every major distribution in scope - Debian, Ubuntu LTS, RHEL 9, SUSE, Amazon Linux 2023, the lot. The upstream patch landed in mainline as a single refcount fix in inet_frag_kill. Distros are backporting now. In-the-wild exploitation is reported against multi-tenant cloud workloads, where an unprivileged container escape lands directly on the host kernel.

The bug class is a use-after-free on a kernel slab object, specifically struct ipq, the IPv4 fragment queue tracked by inet_frags. The root cause is a race between the reassembly timer expiry path and the explicit kill path triggered when a final fragment completes a datagram. Both paths take a reference, both decrement on exit, but the ordering of the unlink from the rhashtable bucket and the final refcount drop is not consistent when the timer fires concurrently with a userland-driven reassembly completion. One path frees the ipq through inet_frag_destroy. The other path still holds a stale pointer into the freed slab object and dereferences fragment list heads to walk pending skbs. The dereference happens with frag_mem_limit accounting still in flight, which extends the window where the freed object is reachable.
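The ordering problem is easier to see in a toy model. The sketch below is not kernel code: FragQueue, kill_unguarded, kill_guarded, and run_interleaving are hypothetical names, and it assumes the failure mode is that both the timer path and the data path drop the hash table's reference, which is one way a race like this can show up. It replays a single fixed interleaving instead of racing real threads.

```python
class FragQueue:
    """Toy stand-in for a refcounted fragment queue (not the kernel's struct ipq)."""

    def __init__(self):
        self.refcnt = 1        # reference owned by the hash table
        self.killed = False    # set once the queue has been unlinked
        self.freed = False

    def get(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True  # models the final free of the object


def kill_unguarded(q):
    """Assumed buggy shape: every caller drops the table's reference."""
    q.put()


def kill_guarded(q):
    """Assumed fixed shape: only the first caller unlinks and drops it."""
    if not q.killed:
        q.killed = True
        q.put()


def run_interleaving(kill):
    """Replay one interleaving; True means the data path holds a freed queue."""
    q = FragQueue()
    q.get()          # timer path takes its reference
    q.get()          # data path takes its reference
    kill(q)          # timer expiry kills the queue
    kill(q)          # final fragment kills it again
    q.put()          # timer path exits and drops its reference
    return q.freed   # data path has not dropped its reference yet
```

With the unguarded kill, the count reaches zero while the data path still holds a pointer, so run_interleaving(kill_unguarded) returns True. With the guarded kill it returns False, because the table's reference is dropped exactly once.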

The primitive is a classic UAF on a kmalloc-512 slab. The attacker controls allocation pressure on that cache from userland by spraying objects of the matching size class - msg_msg structures via System V message queues remain the staple, despite the kmem cgroup hardening introduced in 5.14. Sending fragmented IPv4 datagrams to a local loopback or veth interface drives ipq allocations. The race window is widened by setting net.ipv4.ipfrag_time low and forcing timer-driven reaps while completing reassembly through the data path. The attacker reclaims the freed ipq with a controlled msg_msg payload. The next dereference reads attacker-controlled bytes as a kernel pointer or list head.

From UAF to root follows the standard playbook for slab-confused kernel reads and writes. The reclaimed object provides a write-where primitive into kernel memory via the conntrack code’s manipulation of fragment list pointers. Public techniques for converting list_head corruption into arbitrary write have been documented since the 2021 sequence of nf_tables bugs. The escalation target is modprobe_path, core_pattern, or the current task’s cred struct - choice depends on which mitigations the kernel was built with. On distros where CONFIG_STATIC_USERMODEHELPER is set, modprobe_path overwrite is closed and the cred path is taken. On kernels with KPTI, SMEP, SMAP, and KASLR active, the chain still works because the primitive is data-only - no kernel code execution is required to flip cred.uid and cred.gid to zero. SELinux in enforcing mode does not block the transition because the credential change is mediated by the kernel itself; subsequent execve inherits root with no AVC denial path engaged until policy-relevant syscalls fire.

The initial access vector that matters most for this bug is not remote. It is the unprivileged local user inside a container. CAP_NET_ADMIN is not required. The fragment reassembly path is reachable from any user namespace that can send IPv4 packets on a network namespace it controls, and unprivileged user namespaces are enabled by default on Ubuntu, Debian, and Fedora. A container with default seccomp and the default capability set can drive the race. AppArmor and SELinux confinement do not stop the kernel-side dereference because the policy check happens at syscall entry, not inside the kernel datapath. This is the reason cloud and Kubernetes operators are treating this as a node-compromise primitive rather than a local-only nuisance. One pod, one tenant, one kernel - the boundary collapses.

Mapping to MITRE ATT&CK. T1068, Exploitation for Privilege Escalation, is the primary technique. T1611, Escape to Host, applies in the container case. T1014, Rootkit, is a documented follow-on once root is obtained - kernel module loading via modprobe is the established next hop, and operators are seeing LKM-based persistence within the same intrusion sequence. The tooling reported in the wild is custom. No public Metasploit module exists at time of writing. Cobalt Strike is not relevant; this is a Linux kernel bug and the operators using it are running ELF post-exploitation chains, not Beacon.

Threat actor attribution is preliminary. Two clusters are named in vendor reporting. The first is a financially motivated group operating against cloud-hosted source code repositories and CI runners, with overlap in infrastructure to prior cryptojacking campaigns. The second cluster shows behaviour consistent with a state-aligned operator pivoting through managed Kubernetes environments, with selective targeting and minimal noise post-escalation. Both are using Dirty Frag as a pure privilege escalation primitive paired with a separate initial access vector - exposed Jenkins, leaked service account tokens, or supply chain compromise of a base image. The kernel bug is the second stage, not the entry point.

Telemetry reality is where defenders need to pay attention. The exploitation primitive runs entirely inside the kernel network stack and inside the slab allocator. There are no execve events to alert on during the race itself. auditd is silent. Falco’s default ruleset does not fire - the syscalls involved are sendto, recvfrom, setsockopt, msgsnd, msgrcv, all routine. The first observable event is the post-escalation behaviour. A process whose audit UID is non-zero suddenly operating with effective UID zero is the canonical signal. An auditd rule on syscall=setresuid, or a comparison of the Uid line in /proc/[pid]/status against /proc/[pid]/loginuid, catches the cred swap retrospectively. eBPF-based tooling - Tetragon, Tracee, bcc’s capable tool - can hook commit_creds and flag credential transitions that do not originate from a known setuid binary. That is the detection that actually works for this class of bug.
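The /proc comparison can be sketched in a few lines of Python. This is a minimal illustration, not a production detector: the function names are hypothetical, it assumes a kernel built with audit support so that /proc/[pid]/loginuid exists, and it treats 4294967295 as the "no login session" value.

```python
import os

UNSET_LOGINUID = 4294967295  # (uid_t)-1: no login session recorded


def parse_status_uids(status_text):
    """Return (real, effective, saved, fs) UIDs from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            return tuple(int(field) for field in line.split()[1:5])
    raise ValueError("no Uid: line found")


def is_suspicious(status_text, loginuid_text):
    """True when a non-root login session owns a process with effective UID 0."""
    loginuid = int(loginuid_text.strip())
    _real, effective, _saved, _fs = parse_status_uids(status_text)
    return loginuid not in (0, UNSET_LOGINUID) and effective == 0


def scan_proc(proc_root="/proc"):
    """Return PIDs whose effective UID is 0 but whose loginuid is a normal user."""
    hits = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return hits  # no procfs available on this system
    for entry in entries:
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "status")) as f:
                status = f.read()
            with open(os.path.join(base, "loginuid")) as f:
                loginuid = f.read()
            if is_suspicious(status, loginuid):
                hits.append(int(entry))
        except (OSError, ValueError):
            continue  # process exited, access denied, or unexpected format
    return hits
```

Calling scan_proc() returns candidate PIDs only. Legitimate sudo and su sessions match the same condition, so the output needs filtering against known setuid parents before it is worth alerting on.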

Network telemetry will show fragmented IPv4 traffic on loopback or internal veth interfaces, which most SIEMs do not ingest. Zeek on a host monitor will log frag events but volumes are noisy and the signal is weak. Kernel log telemetry is more useful - slab corruption that does not result in a clean exploit produces dmesg entries with general protection fault, BUG: KASAN, or list_del corruption signatures. KASAN is rarely enabled in production kernels, but the GPF and list corruption traces are visible. A SIEM rule on kernel oops events with inet_frag_kill, ip_defrag, or nf_conntrack_reasm in the call trace is a reasonable proxy detection for failed exploitation attempts.
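As a rough illustration of that rule, the sketch below scans kernel log text for an oops marker followed by one of the fragment-path symbols. The marker strings and symbol names are the ones listed above; the function name and the 40-line trace window are assumptions, and a real SIEM rule would work on whatever structure the log pipeline provides.

```python
OOPS_MARKERS = ("general protection fault", "BUG: KASAN", "list_del corruption")
FRAG_SYMBOLS = ("inet_frag_kill", "ip_defrag", "nf_conntrack_reasm")


def find_frag_oopses(log_text, window=40):
    """Return (oops line, matched symbols) for oopses that touch the frag path.

    window is the number of lines, starting at the oops marker, that are
    searched for call-trace symbols. It is an arbitrary illustrative value.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not any(marker in line for marker in OOPS_MARKERS):
            continue
        trace = lines[i:i + window]
        symbols = sorted(
            {sym for sym in FRAG_SYMBOLS for entry in trace if sym in entry}
        )
        if symbols:
            hits.append((line.strip(), symbols))
    return hits
```

Feeding it the output of dmesg or a journal export gives one entry per matching oops. An oops in an unrelated subsystem is ignored because none of the listed symbols appear in its trace window.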

EDR coverage on Linux is uneven. CrowdStrike Falcon, SentinelOne, and Microsoft Defender for Endpoint on Linux instrument via eBPF and hook a defined set of LSM and tracepoint events. Credential transitions are covered by all three vendors as a high-fidelity signal. Kernel exploitation primitives themselves are not directly observable to user-space EDR. The detection gap is the window between the slab corruption and the first user-space action taken with elevated credentials. That window can be milliseconds, or it can be hours if the operator is patient. SOCs depending solely on process-tree anomalies will see nothing until the operator drops a binary or modifies a sensitive file.

Patch boundary. The fix is a single commit reordering the unlink and refcount decrement in inet_frag_kill, plus an additional smp_mb() to close a memory ordering hole on weakly ordered architectures. Distribution kernels carrying the backport are dated from the second week of May 2026 forward. Verify with uname -r against the distro’s security advisory, not against mainline version strings - distro backports carry their own version numbering. Kernels built before the patch remain exploitable regardless of userspace hardening. Disabling unprivileged user namespaces via sysctl kernel.unprivileged_userns_clone=0 closes the most common container path on Ubuntu and Debian but does not protect bare-metal multi-user systems. Disabling IPv4 fragment reassembly is not a viable mitigation - too much traffic depends on it.
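A small host check can gather both facts in one place. The sketch below is illustrative: read_sysctl, userns_clone_state, and report are hypothetical helper names, and it deliberately does not try to decide whether a release string is patched, since that comparison has to be made against the distro advisory.

```python
import platform
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return a sysctl value as a string, or None if it cannot be read."""
    path = Path(root).joinpath(*name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None  # knob not present on this kernel, or access denied


def userns_clone_state(value):
    """Map the raw kernel.unprivileged_userns_clone value to a label."""
    if value is None:
        return "absent"
    return "disabled" if value == "0" else "enabled"


def report():
    """Collect the running kernel release and the user namespace knob state."""
    value = read_sysctl("kernel.unprivileged_userns_clone")
    return {
        "kernel_release": platform.release(),  # same string as uname -r
        "unprivileged_userns_clone": userns_clone_state(value),
    }
```

An "absent" result means the kernel does not expose this knob or it could not be read. It does not mean unprivileged user namespaces are off, so check the distro's own controls in that case.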

Residual exposure post-patch is limited to the conntrack reassembly path under the same race conditions, and an audit of adjacent inet_frags consumers - IPv6 nf_conntrack_reasm_ipv6 and the bridge netfilter equivalents - is in progress upstream. Expect a small follow-on series of related CVEs in the same subsystem over the next quarter. The bug class is not new. The underlying exposure is the kernel’s continued reliance on refcounted shared objects across timer and data-path boundaries, and the slab allocator’s predictability under user-driven pressure. Until the slab freelist is randomised more aggressively or the conntrack reassembly path is rewritten to remove the shared timer, this class of bug will keep landing.
