User namespaces are still a root pipe

A new local privilege escalation primitive is circulating against the Linux kernel’s network fragmentation reassembly path. Public tracking calls it Dirty Frag. The bug class is a race-triggered use-after-free in the IP fragment queue, with a secondary write primitive in inet_frag_queue reclamation. Affected versions span the long-term kernels 5.10, 5.15, 6.1, 6.6, and current 6.12 trees prior to the upstream fix. Distributions confirmed exposed at disclosure include RHEL 9, Ubuntu 22.04 and 24.04, Debian 12, SUSE 15 SP5, and Amazon Linux 2023. CVSS v3 lands at 7.0 - AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H - local, low-privilege, high-complexity, full triad impact. CWE-416 with a CWE-362 root cause. The result is reliable root for any unprivileged user who can create a user namespace.
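The score and vector can be cross-checked with the CVSS v3.1 base-score equations. The sketch below is a minimal Python implementation of the base-metric formulas and the Roundup function from the FIRST v3.1 specification. The function names are illustrative, and temporal and environmental metrics are left out. Run against the vector above, it returns 7.0. The same vector with AC:L returns 7.8.

```python
import math

# CVSS v3.1 base-metric weights, as published in the FIRST specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on Scope: (unchanged, changed).
PR_WEIGHTS = {"N": (0.85, 0.85), "L": (0.62, 0.68), "H": (0.27, 0.5)}


def roundup(value: float) -> float:
    """Spec-defined Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a vector such as 'AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H'."""
    # Split 'AV:L/AC:H/...' into {'AV': 'L', ...}; an optional 'CVSS:3.1' prefix is skipped.
    metrics = dict(p.split(":") for p in vector.split("/") if not p.startswith("CVSS"))
    changed = metrics["S"] == "C"
    # Impact sub-score from Confidentiality, Integrity, Availability.
    iss = 1 - math.prod(1 - WEIGHTS["CIA"][metrics[m]] for m in ("C", "I", "A"))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][metrics["AV"]] * WEIGHTS["AC"][metrics["AC"]]
                      * PR_WEIGHTS[metrics["PR"]][int(changed)] * WEIGHTS["UI"][metrics["UI"]])
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # prints 7.0
print(base_score("AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # prints 7.8
```

The single AC weight (0.44 versus 0.77) accounts for the whole 0.8-point gap between the two vectors.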

The mechanism sits in net/ipv4/inet_fragment.c and the per-namespace fragment cache. Linux reassembles IP fragments through a hash table of inet_frag_queue structures keyed by source, destination, protocol, and ID. Each queue accumulates fragments until the datagram completes or the timer expires. The queue lifetime is reference counted. Eviction happens through three paths - completion, timer fire, and LRU pressure when the namespace cache exceeds frags.high_thresh. The three paths take different locks. The completion path holds the queue spinlock. The timer path acquires it after firing. The LRU eviction path walks the rhashtable and drops the final reference outside the queue’s own lock.
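A toy model helps keep the moving parts straight. The Python sketch below assumes that an ordered dict and a byte counter are a fair stand-in for the per-namespace cache. It reproduces only the LRU-pressure path described above:

- queues keyed by source, destination, protocol, and ID;
- a reference held by the table;
- eviction of the least recently used queue once queued bytes exceed a threshold.

That threshold stands in for frags.high_thresh, which administrators tune through the net.ipv4.ipfrag_high_thresh sysctl. FragCache, FragQueue, and the addresses are hypothetical. The kernel uses an rhashtable of inet_frag_queue objects, not an ordered dict.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

# (source address, destination address, protocol, IP ID): the lookup key.
FragKey = tuple[str, str, int, int]


@dataclass
class FragQueue:
    refcnt: int = 1                          # reference held by the hash table
    fragments: list[bytes] = field(default_factory=list)


class FragCache:
    """Toy per-namespace fragment cache with LRU eviction above high_thresh."""

    def __init__(self, high_thresh: int) -> None:
        self.high_thresh = high_thresh       # stands in for frags.high_thresh (bytes)
        self.mem = 0                         # bytes currently queued
        self.queues: OrderedDict[FragKey, FragQueue] = OrderedDict()

    def add_fragment(self, key: FragKey, payload: bytes) -> FragQueue:
        queue = self.queues.get(key)
        if queue is None:
            queue = self.queues[key] = FragQueue()
        else:
            self.queues.move_to_end(key)     # mark as most recently used
        queue.fragments.append(payload)
        self.mem += len(payload)
        self._evict_over_threshold()
        return queue

    def _evict_over_threshold(self) -> None:
        # LRU pressure path: drop the oldest queues until back under the threshold.
        while self.mem > self.high_thresh and self.queues:
            _, victim = self.queues.popitem(last=False)
            self.mem -= sum(len(f) for f in victim.fragments)
            victim.refcnt -= 1               # table reference gone; 0 means "free it"


cache = FragCache(high_thresh=100)
first = cache.add_fragment(("10.0.0.1", "10.0.0.2", 17, 1), b"a" * 60)
cache.add_fragment(("10.0.0.1", "10.0.0.2", 17, 2), b"b" * 60)
print(first.refcnt, cache.mem, len(cache.queues))  # prints 0 60 1
```

The evicted queue object still exists after its reference count reaches zero. In the kernel, that is the moment the memory is handed to the deferred free. Any pointer to it still held by another code path is dangling.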

The race is between the LRU evictor and a late-arriving fragment that has already passed the rhashtable lookup but not yet bumped the refcount through inet_frag_find. The lookup returns a pointer. Before the caller increments the refcount, the evictor decrements it from another CPU, observes zero, and schedules the queue for RCU-deferred free. The caller proceeds, takes the queue lock, and writes the incoming fragment skb into q->fragments_tail. The write lands in memory that is about to be reclaimed. On the next allocation from the same kmem_cache - ip4-frags, SLUB, order-0 - the freed object is recycled. The attacker controls the timing of that recycle by spraying fragment queues from a second namespace. The freed inet_frag_queue is replaced with attacker-shaped data. The original caller’s write then corrupts a chosen field of the new object.
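The interleaving can be replayed deterministically. This single-threaded Python sketch runs the steps in the order given above, with a freed flag standing in for the RCU-deferred free. It models neither the queue locks nor SLUB recycling, and every name in it is hypothetical. The point it illustrates is narrow: an unconditional increment after the lookup moves a dead object's count from 0 back to 1 without complaint.

```python
from dataclasses import dataclass, field


@dataclass
class FragQueue:
    refcnt: int = 1                  # the hash table's reference
    freed: bool = False              # stands in for "queued for RCU-deferred free"
    fragments: list[bytes] = field(default_factory=list)


def evictor_drop(table: dict, key) -> None:
    """CPU B: the LRU evictor unlinks the queue and drops the table's reference."""
    queue = table.pop(key)
    queue.refcnt -= 1
    if queue.refcnt == 0:
        queue.freed = True


def naive_get(queue: FragQueue) -> FragQueue:
    """CPU A: the vulnerable pattern, an unconditional increment after lookup."""
    queue.refcnt += 1
    return queue


def replay_interleaving() -> FragQueue:
    key = ("10.0.0.1", "10.0.0.2", 17, 0x1234)   # hypothetical fragment key
    table = {key: FragQueue()}
    queue = table[key]                          # CPU A: lookup returns a raw pointer
    evictor_drop(table, key)                    # CPU B: refcount 1 -> 0, object freed
    queue = naive_get(queue)                    # CPU A: 0 -> 1 on a freed object
    queue.fragments.append(b"late fragment")    # CPU A: the stray write
    return queue


q = replay_interleaving()
print(q.freed, q.refcnt, len(q.fragments))  # prints True 1 1
```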

The exploit primitive is a single 8-byte write at a controlled offset inside a freshly allocated kernel slab object. The offset is determined by the position of fragments_tail in the original inet_frag_queue layout. It is predictable per kernel version and recoverable from the kernel’s BTF or debug type information, or from known offset tables for distribution kernels. The attacker shapes the slab by allocating and freeing fragment queues until the target offset overlaps a function pointer or list head in a chosen victim object. Public discussion has identified seq_operations, msg_msg, and pipe_buffer as candidate targets, all of which have been viable kernel UAF landing pads since at least 2021. The chosen victim determines whether the primitive becomes control flow hijack, arbitrary read, or arbitrary write.

Reachability is the load-bearing detail. The bug is triggered through unprivileged user namespaces. CLONE_NEWUSER followed by CLONE_NEWNET gives any local user a private network stack with its own fragment cache and its own frags.high_thresh. The attacker drives the race entirely inside that namespace. No CAP_NET_ADMIN in the initial namespace required - the capabilities granted inside the new namespaces are enough. No external network access required. No root required. This is the same reachability profile that made Dirty Pipe and the io_uring family of bugs catastrophic - a local user, a syscall, a kernel write. Ubuntu, Debian, Fedora, and Arch all allow unprivileged user namespaces by default, which exposes the primitive to any logged-in account. On Debian-patched kernels the switch is kernel.unprivileged_userns_clone=1. Upstream kernels, Fedora’s included, have no such knob and are controlled through user.max_user_namespaces instead. Hosts that set either control to 0 raise the precondition to CAP_SYS_ADMIN.
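Whether a given host falls on the exposed side of that line can be read from procfs. Below is a minimal, read-only Python sketch that assumes the standard /proc/sys layout. It checks user.max_user_namespaces, which every modern kernel has, and kernel.unprivileged_userns_clone, which exists only on kernels carrying the Debian patch. It does not inspect LSM policy such as AppArmor or SELinux, which can restrict namespace creation independently. The function name and status labels are illustrative.

```python
from pathlib import Path


def userns_exposure(sysctl_root: str = "/proc/sys") -> tuple[str, str]:
    """Classify unprivileged user-namespace exposure from sysctl files.

    Returns (status, reason); status is 'exposed', 'restricted', 'blocked',
    or 'unknown'. Reads only; never changes settings.
    """
    root = Path(sysctl_root)
    max_ns = root / "user" / "max_user_namespaces"              # upstream control
    clone_knob = root / "kernel" / "unprivileged_userns_clone"  # Debian-patched kernels only

    if max_ns.exists() and int(max_ns.read_text()) == 0:
        return "blocked", "user.max_user_namespaces=0"
    if clone_knob.exists():
        if int(clone_knob.read_text()) == 0:
            return "restricted", "kernel.unprivileged_userns_clone=0 (CAP_SYS_ADMIN needed)"
        return "exposed", "kernel.unprivileged_userns_clone=1"
    if max_ns.exists():
        return "exposed", "no unprivileged_userns_clone knob; user.max_user_namespaces > 0"
    return "unknown", "user namespace sysctls not found"


print(userns_exposure())
```

Checking max_user_namespaces first is deliberate: a zero there blocks user namespace creation regardless of the Debian knob.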

MITRE mapping is straightforward. T1068, exploitation for privilege escalation. T1611, escape to host, applies as well when the attacker breaks out of a container. The namespace reachability makes this a container escape primitive against any runtime that does not block user namespace creation or pin kernel.unprivileged_userns_clone=0. Docker default seccomp does not block unshare(CLONE_NEWUSER). Kubernetes default pod security does not restrict it. Containerd inherits the host setting. A compromised workload in a default-configured cluster reaches the bug.
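For a specific workload, the practical question is whether that process can take the first step at all. The sketch below assumes Python 3.12+ on Linux, where os.unshare and the os.CLONE_* constants are available. It forks a child that attempts unshare(CLONE_NEWUSER | CLONE_NEWNET) and reports whether the call succeeded. Run inside a container, it shows whether something between the workload and the kernel blocks the namespace pair, such as the seccomp profile or the host sysctls. It only creates and discards empty namespaces in a throwaway child. The function name is hypothetical.

```python
import os


def can_create_user_and_net_ns() -> bool:
    """Return True if this process could create a user + network namespace pair.

    The attempt runs in a forked child so the caller's namespaces never change.
    Requires Python 3.12+ on Linux for os.unshare.
    """
    if not hasattr(os, "unshare"):
        raise RuntimeError("os.unshare needs Python 3.12+ on Linux")
    pid = os.fork()
    if pid == 0:                                    # child: try the reachability precondition
        try:
            os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWNET)
        except OSError:
            os._exit(1)                             # e.g. EPERM from seccomp, ENOSPC from the limit
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    # A child killed by a signal (e.g. a seccomp kill action) maps to a negative code.
    return os.waitstatus_to_exitcode(status) == 0


if __name__ == "__main__":
    print("userns+netns creation allowed:", can_create_user_and_net_ns())
```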

In-the-wild exploitation is not yet attributed to a named actor. The disclosure timeline shows a researcher PoC published with the upstream patch, followed within 72 hours by weaponised variants posted to closed channels. Public telemetry from honeypots running default Ubuntu 24.04 has logged exploitation attempts against kernels in the affected range. No ransomware crew has been linked yet. Historically, kernel LPE primitives of this class are absorbed first by initial-access brokers and red team tooling, then by commodity actors three to six months later. The Linux LPE chain for ransomware operators currently leans on older OverlayFS and nf_tables bugs. Dirty Frag will displace them once stable exploit modules ship.

Telemetry coverage is thin, and that is the operational reality. The trigger sequence is a series of socket, sendmsg, and unshare calls - none of which are anomalous in isolation. EDR agents on Linux that hook the userns and namespace syscalls - Falco with the default ruleset, CrowdStrike Falcon on Linux, SentinelOne Singularity - will observe namespace creation events but not the race itself. The race is invisible to userspace. The slab spray pattern is observable only through kernel-level instrumentation. eBPF-based tooling watching kmem_cache_alloc and kmem_cache_free on the ip4-frags cache will see the spray, but no production fleet runs that hook by default because the volume kills throughput. Auditd execve records will show no suspicious binary. The exploit can be staged from a single statically linked ELF or from a Python process using ctypes against libc.
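Namespace creation is visible, and so is its residue. One low-cost complement to syscall hooks is a periodic inventory of processes whose user namespace differs from a reference process. It is sketched here in Python, assuming a standard /proc.

- Reading another process's /proc/PID/ns links requires ptrace-level access, so run it as root for a full view.
- It is a snapshot, so namespaces that live and die between runs are missed.
- Rootless containers, Flatpak apps, and browser sandboxes will appear as legitimate entries.

The function name is illustrative.

```python
import os


def processes_outside_user_ns(reference_pid: int = 1, proc: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, user-namespace link) for processes not in reference_pid's user namespace."""
    reference = os.readlink(f"{proc}/{reference_pid}/ns/user")   # looks like 'user:[<inode>]'
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue                                             # skip /proc/self, /proc/sys, ...
        try:
            ns = os.readlink(f"{proc}/{entry}/ns/user")
        except OSError:
            continue                                             # exited, or no ptrace access
        if ns != reference:
            found.append((int(entry), ns))
    return sorted(found)


if __name__ == "__main__":
    for pid, ns in processes_outside_user_ns():
        print(pid, ns)
```

With the default reference_pid=1, running as root on the host compares every process against the initial user namespace.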

What does fire. Sysmon for Linux Event ID 1, process creation, captures the exploit binary if it is dropped to disk. Falco’s rule for an unexpected setuid call by a non-sudo binary fires after successful escalation when the payload calls setuid(0). The kernel ring buffer, dmesg, emits a BUG: KASAN: use-after-free line if KASAN is enabled - which it is not on any production distribution kernel. On non-KASAN kernels, the corruption is silent unless the chosen victim object triggers a downstream oops. Crash telemetry from kdump or systemd-coredump showing repeated panics in inet_frag_queue_free or ip_defrag is the strongest passive signal available. SIEM correlation rules keyed to kern-facility messages mentioning inet_frag or slab-use-after-free will surface failed exploitation attempts. Successful exploitation produces no such message.
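Those passive signals reduce to string matching on kernel log lines. A heuristic Python sketch follows. The patterns are assumptions derived from the messages named above: KASAN use-after-free reports, and call traces mentioning inet_frag or ip_defrag functions. This is not a vetted detection rule, and benign traces touching the same functions will also match. Input can be any iterable of lines, for example journalctl -k output piped to the script on stdin.

```python
import re
import sys
from typing import Iterable

# Heuristic patterns (assumptions): KASAN UAF reports, and fragment-path symbols in traces.
KASAN_UAF = re.compile(r"BUG: KASAN: (?:slab-)?use-after-free")
FRAG_SYMBOL = re.compile(r"\b(?:inet_frag\w*|ip_defrag\w*)\b")


def suspicious_kernel_lines(lines: Iterable[str]) -> list[str]:
    """Return kernel log lines matching the crash signals listed above."""
    hits = []
    for line in lines:
        if KASAN_UAF.search(line) or FRAG_SYMBOL.search(line):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    for hit in suspicious_kernel_lines(sys.stdin):
        print(hit)
```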

The network side has nothing to offer. The fragmentation traffic is loopback or wholly internal to a namespace. No packets cross a wire. Suricata, Zeek, and any NDR product looking at north-south or east-west traffic will see nothing. This is a host-internal kernel bug. The detection gap is the host kernel itself.

The patch boundary is the upstream commit that reorders the refcount increment in inet_frag_find to occur under RCU read-side protection before the lookup result is returned, paired with a refcount_inc_not_zero check that fails closed against an already-evicted queue. Distribution backports landed within 96 hours for Ubuntu, Debian, and RHEL. SUSE shipped 24 hours later. Long-tail distributions - Alpine 3.18 and earlier, embedded vendor kernels, appliance images - remain exposed until each vendor backports. Cloud provider managed kernels - AWS, GCP, Azure node images - have been updated; customer-managed AMIs and custom kernels have not.
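The shape of the fix is a standard kernel idiom: return a lookup result only after successfully taking a reference, and refuse to take one from a count that has already reached zero. The Python sketch below models the semantics of refcount_inc_not_zero() with a compare-and-swap loop, using a lock in place of the atomic instruction. It omits RCU entirely; in the kernel, the RCU read-side section keeps the object's memory valid long enough to read the count. It also omits refcount_t's saturation and underflow warnings. AtomicCounter and find_queue are hypothetical names. The two refcount helpers mirror kernel function names but are not their implementation.

```python
import threading


class AtomicCounter:
    """Integer with an emulated compare-and-swap; the lock plays the role of cmpxchg."""

    def __init__(self, value: int) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def refcount_inc_not_zero(ref: AtomicCounter) -> bool:
    """Take a reference only while the object is live; refuse once it hit zero."""
    while True:
        old = ref.load()
        if old == 0:
            return False                 # already evicted: treat the lookup as a miss
        if ref.compare_and_swap(old, old + 1):
            return True                  # CAS failed means another CPU raced us; retry


def refcount_dec_and_test(ref: AtomicCounter) -> bool:
    """Drop a reference; True means the caller released the last one and must free."""
    while True:
        old = ref.load()
        if ref.compare_and_swap(old, old - 1):
            return old == 1


def find_queue(table: dict, key):
    """Fixed pattern: only hand back a queue this caller successfully pinned."""
    entry = table.get(key)
    if entry is None or not refcount_inc_not_zero(entry["refcnt"]):
        return None
    return entry


key = ("10.0.0.1", "10.0.0.2", 17, 0x1234)             # hypothetical fragment key
table = {key: {"refcnt": AtomicCounter(1)}}
evicted = refcount_dec_and_test(table[key]["refcnt"])  # evictor drops the last reference
print(evicted, find_queue(table, key))                 # prints True None
```

Failing closed turns the late fragment's lookup into an ordinary miss, so the caller never writes into the dying queue.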

Residual exposure after patching falls in three places. Containers running on unpatched host kernels remain exposed regardless of container image patch level - the kernel is shared. Embedded Linux devices with vendor kernels that lag mainline by years will carry this bug indefinitely. Systems with unprivileged user namespaces enabled and any form of local code execution - shared hosting, CI runners, multi-tenant developer environments - should treat the window between disclosure and reboot as a confirmed compromise opportunity. The mitigation that holds independent of patching is sysctl -w kernel.unprivileged_userns_clone=0, which breaks the unprivileged reachability and reduces the bug to a CAP_SYS_ADMIN-gated primitive. That knob exists only on kernels carrying the out-of-tree Debian patch. Elsewhere the equivalent is sysctl -w user.max_user_namespaces=0, which also blocks user namespace creation for privileged processes. Either setting breaks rootless containers, Flatpak, and Chromium’s sandbox. The tradeoff is explicit.
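Applying that mitigation across a fleet usually means setting it at runtime and persisting it. Below is a minimal Python sketch. It assumes root, a standard /proc/sys, and a distribution that reads /etc/sysctl.d at boot; the drop-in filename is a hypothetical choice. It prefers kernel.unprivileged_userns_clone. It falls back to user.max_user_namespaces=0 only when explicitly allowed, since that setting also stops privileged processes from creating user namespaces.

```python
from pathlib import Path


def disable_unprivileged_userns(sysctl_root: str = "/proc/sys",
                                sysctl_d: str = "/etc/sysctl.d",
                                filename: str = "90-disable-unpriv-userns.conf",
                                allow_max_ns_fallback: bool = False) -> str:
    """Turn off unprivileged user namespaces now and on every boot; needs root.

    Prefers the Debian-patch knob. Falling back to user.max_user_namespaces=0 is
    opt-in because it also stops privileged processes from creating user namespaces.
    Returns the sysctl name that was set.
    """
    root = Path(sysctl_root)
    knob = root / "kernel" / "unprivileged_userns_clone"
    if knob.exists():
        name, runtime_path = "kernel.unprivileged_userns_clone", knob
    elif allow_max_ns_fallback:
        name, runtime_path = "user.max_user_namespaces", root / "user" / "max_user_namespaces"
    else:
        raise RuntimeError("kernel.unprivileged_userns_clone not present; "
                           "pass allow_max_ns_fallback=True to use user.max_user_namespaces=0")
    runtime_path.write_text("0\n")                        # same effect as sysctl -w NAME=0
    Path(sysctl_d, filename).write_text(f"{name} = 0\n")  # drop-in re-read at boot
    return name
```

Raising instead of silently falling back keeps the stronger, root-affecting setting a deliberate choice.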

Dirty Frag is not novel as a bug class. It is the latest in a five-year sequence of kernel UAFs reachable through unprivileged namespaces. The pattern repeats because the reachability surface - userns plus a complex subsystem with subtle refcount semantics - keeps producing the same primitive. Until the kernel community constrains unprivileged userns by default upstream, the next one is already in the tree.

