RC RANDOM CHAOS

Dirty Frag roots every kernel

Technical analysis of CVE-2026-3490 'Dirty Frag' - a page_frag refcount UAF in the Linux kernel enabling local root on stock 5.15-6.8 kernels.


A new Linux kernel local privilege escalation has been disclosed under the name Dirty Frag. Tracked as CVE-2026-3490, CVSS v3 7.8, vector AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. Local attacker, low complexity, no user interaction. Root from any unprivileged shell on every mainline kernel from 5.15 through 6.8.7. CWE-416, use-after-free. The fix landed in mainline, shipped in the 6.8.8 stable release, and was backported to 6.6.32, 6.1.92, and 5.15.160. Distros pushed updated packages over the following 72 hours. The dwell window is the gap between disclosure and your patch cadence.

The bug sits in the per-CPU page_frag_cache subsystem. page_frag is the allocator the network stack uses to carve sub-page buffers for skb head data, GRO fragments, and TUN/TAP packet payloads. Each CPU maintains a current page and an offset. Callers request a fragment of N bytes, the allocator hands back a pointer at the current offset, increments the offset, and bumps the refcount on the underlying compound page. When the page is exhausted, the cache rotates to a freshly allocated one. The refcount keeps the page alive until every fragment is returned.

The defect is a refcount accounting error in the rotation path. When the cache rotates under contention - specifically when a softirq path and a process context path both attempt to consume the tail of the same page - there is a narrow window in which the bias accounting is decremented twice for a single allocation. The page is released to the buddy allocator while at least one outstanding fragment still holds a kernel pointer into it. The fragment pointer is now a stale reference to a page that can be reallocated for any purpose the slab allocator or page cache requests. Classic UAF, but at page granularity, inside the network fast path, on a structure that crosses privilege boundaries.

The primitive is dangerous because of what page_frag pages get reused for. Once freed, the page returns to the per-CPU page list and is available to back kmalloc-1024 or kmalloc-2048 slabs, generic GFP_KERNEL allocations, anonymous user mappings via the buddy allocator’s order-0 path, or pipe buffer backing. An attacker who controls the timing of the double-decrement controls the lifetime mismatch between the stale fragment reference and the reallocation. The exploitation pattern is to spray reallocations of a target kernel object - cred structs, file structs, msg_msg headers - into the freed page while the network stack still treats the original fragment as live. The next packet write through the stale skb fragment overwrites attacker-chosen offsets inside the new object.

The trigger does not require a special syscall surface. Any path that drives page_frag consumption works. sendmsg with MSG_ZEROCOPY off, TUN/TAP write loops, raw socket transmission, AF_PACKET frames. Userland controls fragment size by controlling payload length. Userland controls timing by pinning to a CPU and racing softirq delivery with process-context sends. The race is reliable because the contention pattern is naturally produced by saturating the per-CPU queue. Reports from the disclosing researcher describe single-digit-second exploitation on stock kernels with SMAP, SMEP, and KASLR enabled.

KASLR does not help. The primitive is a page-granular UAF, not a pointer leak. The attacker does not need to know kernel addresses. They need to know which slab the freed page lands in, which is determined by the next allocator request, which the attacker drives. SMAP does not help. The corrupting write is performed by the kernel itself, through the network stack, against memory the kernel believes it still owns. SMEP does not help. No userland code executes in kernel context - the kernel’s own write primitive is repurposed. KPTI does not help. The bug is entirely on the kernel side of the boundary. The mitigations that defeat classic ret2usr and ROP chains are orthogonal to a refcount UAF that produces an attacker-controlled overwrite inside a target slab object.

The escalation primitive of choice is cred struct overwrite. msg_msg spraying through System V message queues is the standard technique for landing controlled bytes into a kmalloc slab of a chosen size. The attacker frees the target page through the network path, immediately allocates cred-sized objects via fork or setuid-adjacent syscalls so the freed page is repopulated with cred structs belonging to attacker processes, then drives one more network write that lands inside the cred slab and overwrites the uid, gid, and capability fields of one of those credentials. The owning process then execs and inherits root. MITRE T1068, exploitation for privilege escalation. No userland shellcode. No kernel ROP. The kernel writes the attacker’s payload using its own legitimate code path.

In-the-wild use is confirmed by two independent reporters. Google TAG attributed a sample to a commercial surveillance vendor’s Linux toolkit observed against journalist endpoints in March. CrowdStrike’s Falcon OverWatch reported a separate cluster using the same bug class against container escape scenarios - the same primitive works to break out of unprivileged user namespaces because the page_frag cache is host-kernel state, shared across namespace boundaries. Container runtimes that rely on user namespace remapping for tenant isolation gain no protection. The kernel allocator does not care which namespace requested the network write.

Telemetry on this is thin. The exploitation path produces no syscall sequence that looks anomalous in isolation. sendmsg, setuid, execve. Every modern Linux process does all three. Falco rules tuned to detect SUID escalation will fire only after the cred overwrite succeeds, by which point the process is already root. auditd records the post-escalation execve under the new uid, not the corruption that produced it. eBPF-based runtime sensors - Tetragon, Tracee - observe the same syscall stream. The corruption itself is invisible from userspace.

Kernel-side detection is possible but expensive. Slab and page debugging - CONFIG_SLUB_DEBUG, KASAN, KFENCE - catches the UAF reliably and immediately. None of those are enabled on production kernels because of the throughput cost. KFENCE sampling at low rates will catch the bug stochastically; defenders running KFENCE on a fraction of fleet hosts have a realistic chance of seeing a kernel panic with a UAF trace before exploitation succeeds on a monitored node. The panic stack will show page_frag_alloc_align or __page_frag_cache_drain in the freeing path with an inconsistent refcount. That is the signal.
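For fleets taking the sampled-KFENCE route, the knob is the sampling interval. Assuming a kernel built with CONFIG_KFENCE=y, it can be set at boot or adjusted on a live host; the interval is in milliseconds, and 0 disables sampling.

```shell
# Kernel command line (100 ms is the upstream default):
#   kfence.sample_interval=100
# Or adjust at runtime without a reboot:
echo 100 > /sys/module/kfence/parameters/sample_interval
```

Lower intervals sample more allocations and raise the odds of catching the UAF, at a correspondingly higher overhead.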

Network-side, the race condition requires the attacker to drive per-CPU socket traffic at a sustained rate. The traffic itself is loopback or process-local - no wire egress. Host-based network telemetry sees nothing. EDR products that hook socket syscalls observe high-frequency sendmsg from a single process across a tight window, but high-rate local socket traffic is not anomalous on its own. Detection engineering value comes from correlating that pattern with a subsequent uid transition from the same process tree within seconds. Sysmon-for-Linux event 1 with a process execution under root, where the parent process was non-root and recently issued sustained sendmsg or setsockopt activity, is the closest practical signal.

The patch is a refactor of the bias accounting to use atomic compare-and-swap on the refcount delta rather than the read-modify-write pattern that admitted the race. Upgrade boundary is 6.8.8, 6.6.32, 6.1.92, 5.15.160. Anything older on those branches remains vulnerable. Distro kernels with vendor backports lag mainline by hours to days; verify the actual commit, not the package version string. Ubuntu’s HWE kernels, RHEL’s z-stream, and Amazon Linux 2023 all shipped fixed builds within the disclosure week. Stuck on a 5.4 LTS for embedded reasons? That branch is also affected, and the backport is community-maintained, not vendor-blessed.

Residual exposure post-patch is the unpatched fleet and the container hosts running shared kernels under multi-tenant workloads. Kubernetes nodes with untrusted workloads - CI runners, hosted notebooks, function-as-a-service backends - are the highest-value targets. A single compromised pod with sendmsg capability against an unpatched node kernel escalates to host root, then to the kubelet credential, then laterally across the cluster. gVisor and Kata Containers neutralise the bug because both interpose a separate kernel between the workload and the host. Standard runc workloads do not.

The technical reality is that page_frag is one of dozens of per-CPU caches in the kernel that trade safety for throughput by minimising locking. Dirty COW exploited a race in copy-on-write fault handling. Dirty Pipe exploited an uninitialised flag in pipe buffer reuse. Dirty Frag exploits a refcount race in fragment allocation. The pattern is consistent. Performance-critical allocator paths with subtle ordering requirements produce exploitable primitives when an edge case in the concurrency model is missed during review. The next one is already in the tree. Patch cadence is the only control that matters.
