RC RANDOM CHAOS

memcpy walks off the end of the receiver

rsync shipped six CVEs in January 2025. LLMs did not write new bugs - they compressed variant discovery, harness generation, and vulnerable deployment.


rsync shipped six CVEs in January 2025. Google’s Cloud Vulnerability Research team disclosed the cluster. CVE-2024-12084 is the headline - heap buffer overflow in the daemon’s checksum handling, CVSS 9.8, pre-authentication where anonymous read is configured. The rest: CVE-2024-12085 leaks four bytes of uninitialised stack memory per file via the file list exchange. CVE-2024-12086 lets a malicious server read arbitrary files from the connecting client. CVE-2024-12087 is a symlink-driven path traversal. CVE-2024-12088 bypasses --safe-links. CVE-2024-12747 is a race condition between symlink check and file open. Affected versions: rsync prior to 3.4.0.

This is the baseline. None of it is new bug-class territory. rsync’s protocol assumes mutual trust between two sides, and nothing in the protocol has ever enforced that trust. The question this post answers: did Claude and similar LLMs make this worse? The short answer is yes, but not in the way the framing suggests. LLMs did not write new rsync bugs. They lowered the cost of finding variants of the existing ones, and they accelerated the deployment of rsync into places where the trust model breaks immediately.

Start with the heap overflow. CVE-2024-12084 lives in the receiver’s handling of the MAX_DIGEST_LEN constant. The protocol negotiates a per-block checksum length. When the negotiated length exceeds the fixed buffer compiled into the receiver, the memcpy of attacker-supplied checksum bytes runs past the buffer boundary. The primitive is a linear heap overflow with attacker-controlled size and contents, reached before authentication when the daemon exposes a readable module. rsync uses whatever allocator the platform provides - glibc malloc on most Linux distributions, jemalloc on some BSD builds - so heap layout follows the allocator’s defaults. Exploitation work since disclosure has focused on adjacent chunk corruption to redirect a function pointer in a subsequent allocation. The receiver process runs as the user defined in rsyncd.conf, often nobody, occasionally root where the operator wanted to preserve ownership and did not understand the trade-off.
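The missing check is small enough to sketch. What follows is a Python model of the pattern, not rsync’s C code: `SUM2_CAPACITY` and `read_block_checksum` are hypothetical names, and the 16-byte capacity is an assumed value chosen for illustration. It shows the validation whose absence produces the overflow: bound the negotiated length by the destination buffer before copying anything.

```python
# Illustrative model only; names and sizes are assumptions, not rsync source.
SUM2_CAPACITY = 16  # assumed size of the fixed per-block checksum buffer


def read_block_checksum(negotiated_len: int, wire_bytes: bytes) -> bytes:
    """Copy a peer-supplied checksum into a fixed-size buffer, safely."""
    # The vulnerable pattern trusts negotiated_len. The fix is to bound it
    # by the destination capacity before any copy takes place.
    if not 0 <= negotiated_len <= SUM2_CAPACITY:
        raise ValueError(f"checksum length {negotiated_len} exceeds buffer")
    if len(wire_bytes) < negotiated_len:
        raise ValueError("short read: peer sent fewer bytes than negotiated")
    buf = bytearray(SUM2_CAPACITY)
    buf[:negotiated_len] = wire_bytes[:negotiated_len]
    return bytes(buf)
```

In C the same idea is a single comparison ahead of the memcpy. Python cannot overflow a heap buffer, so the model only demonstrates where the check belongs.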

The blast radius is the receiver process. Where rsyncd is exposed on port 873 to the internet - Censys consistently indexes more than 600,000 hosts on that port - pre-auth code execution against any module configured with read-only anonymous access is the realised threat. In MITRE ATT&CK terms this is T1190, Exploit Public-Facing Application. Post-compromise, the receiver is a foothold inside whatever segment the daemon runs in. Often that segment includes backup storage, build artefacts, or replica sets. The pivot value is high.
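Checking your own address space for exposed daemons takes a few lines. An rsync daemon opens the conversation with a greeting line that begins `@RSYNCD:` followed by a protocol version. The sketch below parses that greeting; `parse_greeting` and `probe` are hypothetical helper names, and `probe` should only be pointed at hosts you are authorised to test.

```python
import re
import socket

# The daemon greeting starts with "@RSYNCD: <major>.<minor>".
GREETING = re.compile(rb"^@RSYNCD: (\d+)\.(\d+)")


def parse_greeting(banner: bytes):
    """Return (major, minor) protocol version, or None if not an rsync daemon."""
    m = GREETING.match(banner)
    return (int(m.group(1)), int(m.group(2))) if m else None


def probe(host: str, port: int = 873, timeout: float = 3.0):
    """Read the daemon greeting from a host you are authorised to test."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        return parse_greeting(s.recv(128))
```

The greeting carries a protocol version rather than a release number, so treat a match as evidence of exposure and confirm patch state on the host itself.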

The information leak - CVE-2024-12085 - is the precursor that makes the heap overflow practical. The file list exchange initialises a sum2 buffer to a size derived from the negotiated checksum length, but the comparison loop reads from the local copy before the full length is written. Four bytes of uninitialised stack leak per file. Iterate the file list against a module with thousands of entries and the attacker reconstructs stack canaries, return addresses, and library base pointers. ASLR bypass on the receiver becomes a function of how many files the attacker can enumerate. Combined with the overflow, the chain is leak-then-corrupt against a single process within a single connection.

Now the LLM amplification claim. There are three distinct mechanisms.

First, variant discovery. The rsync codebase is mid-size C with decades of protocol-handling code written before modern compiler hardening. The patches for the January 2025 cluster touched roughly 200 lines across token.c, sender.c, receiver.c, and util2.c. Claude, GPT-class models, and Gemini are now competent at reading a patch diff and producing a list of structurally similar code paths in the same module. That is the workflow vulnerability researchers were already using grep and Semgrep for. The LLM compresses the time from patch publication to variant hypothesis from days to hours. Public discussion since January shows variant hunting has continued - most candidates do not turn out to be exploitable, but the tail of the distribution matters. Each new variant restarts the disclosure clock.
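For a sense of the grep-era baseline the LLM is compressing, here is a rough triage filter, offered as a sketch rather than anyone’s actual tooling. It flags single-line `memcpy` calls whose length argument is not a `sizeof` expression. `flag_memcpy` is a hypothetical name, and the regex deliberately ignores multi-line calls and nested parentheses in the first two arguments.

```python
import re

# Matches memcpy(dst, src, len); on a single line. This is a rough triage
# filter, not a C parser: multi-line calls are missed entirely.
MEMCPY = re.compile(r"\bmemcpy\s*\(([^,]+),([^,]+),([^;]+)\)\s*;")


def flag_memcpy(source: str):
    """Yield (line_no, length_expr) for memcpy calls not bounded by sizeof."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        m = MEMCPY.search(line)
        if m and "sizeof" not in m.group(3):
            yield line_no, m.group(3).strip()
```

Every hit still needs a human or a model to decide whether the length is attacker-influenced. That review step is where the days-to-hours compression happens.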

Second, harness generation. AFL++ and libFuzzer harnesses for rsync’s protocol parsers used to require non-trivial setup. The protocol is stateful, the parsers consume framed input, and the negotiation phase must be replayed correctly before the interesting code paths are reachable. LLMs now generate functional harnesses for stateful protocol parsers in a single prompt. The harness quality is mediocre, but the coverage threshold to find the next memory-safety bug in rsync is low because the historical bug density is high. Lowering the cost of harness construction raises the rate of disclosure. This is a defensive net positive when researchers report. It is a defensive net negative when they do not.

Third, deployment proliferation. This is the underreported failure mode. Operators ask Claude how to set up a backup. The model produces an rsyncd.conf with read-only anonymous access enabled, because that is the simplest working example in the training data. Operators ask how to expose a sync endpoint to a remote site and the model produces a configuration with no IP allow-list. Vibe-coded backup scripts now expose rsync daemons on residential, SMB, and cloud-VM addresses that would never have been deployed at this scale before. Shadowserver and Censys data through Q1 2026 show the rsync 873 surface increasing rather than shrinking against a backdrop of public CVE coverage. The patches landed. The deployments did not.
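The misconfigurations described here are mechanical enough to lint for. The sketch below parses rsyncd.conf text and flags modules with no `auth users`, no `hosts allow`, or a root `uid`, treating global parameters as defaults for every module. `parse_rsyncd_conf` and `audit` are hypothetical names, and the parser is a simplification: it does not handle line continuations or include directives.

```python
def parse_rsyncd_conf(text: str):
    """Parse rsyncd.conf text into {module: {param: value}}; '' holds globals."""
    modules = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            modules[current] = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            # Normalise parameter names to lowercase with single spaces.
            modules[current][" ".join(key.lower().split())] = value.strip()
    return modules


def audit(text: str):
    """Return findings for modules reachable anonymously or from any address."""
    modules = parse_rsyncd_conf(text)
    globals_ = modules.pop("")
    findings = []
    for name, params in modules.items():
        effective = {**globals_, **params}  # module settings override globals
        if "auth users" not in effective:
            findings.append((name, "no auth users: anonymous access"))
        if "hosts allow" not in effective:
            findings.append((name, "no hosts allow: any source address"))
        if effective.get("uid", "").lower() in ("root", "0"):
            findings.append((name, "runs as root"))
    return findings
```

A clean result does not mean the daemon is safe to expose. It only means the three most common omissions are absent.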

Real-world exploitation status. CVE-2024-12084 has been the subject of multiple PoC publications since January 2025. Mass scanning on 873 spiked within 72 hours of disclosure and has not returned to baseline. CISA added the rsync cluster to known-exploited tracking. Confirmed in-the-wild use against unpatched daemons has been reported by multiple incident response firms - verifiable details remain limited but the pattern is consistent. ShinyHunters-adjacent activity has used rsync footholds for repository exfiltration where SSH-based rsync over harvested keys is the access vector rather than the daemon overflow. Both paths converge on the same end state - bulk file egress over a protocol that backup tooling already permits at network egress.

What this looks like in telemetry. Sysmon Event ID 3 captures the inbound 873 connection. Most environments do not alert on it because rsync is a sanctioned tool. Sysmon Event ID 1 fires on rsync process spawn under the configured user. If the receiver is exploited, the post-exploitation process tree forks from rsync rather than from sshd or a shell. That is the anomaly. EDR vendors that baseline rsync as benign will miss it. Falco rules covering unexpected child processes from network daemons catch this class of event when the rule set includes rsyncd. Most do not by default. Network telemetry on 873 shows protocol framing that no L7 firewall parses meaningfully. Zeek has a partial rsync analyser. Most SIEMs ingest the connection metadata only. The blind spot is the protocol payload itself.
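The process-tree anomaly reduces to a simple rule over process-creation events. The sketch below assumes a hypothetical event shape (dicts with `pid`, `ppid`, and `image` keys, in creation order) rather than any specific Sysmon or Falco schema, and flags any process whose parent is rsync but whose own image is not.

```python
def unexpected_children(events, daemon="rsync"):
    """Return process events whose parent is the daemon but whose image is not.

    `events` is an iterable of dicts with hypothetical keys
    'pid', 'ppid', and 'image', in the order the processes were created.
    """
    names = {}
    alerts = []
    for ev in events:
        names[ev["pid"]] = ev["image"]
        parent = names.get(ev["ppid"])
        # rsync forking another rsync is normal; anything else is suspicious.
        if parent == daemon and ev["image"] != daemon:
            alerts.append(ev)
    return alerts
```

Daemons configured with `pre-xfer exec` or `post-xfer exec` spawn helper commands legitimately, so a production rule would need an allow-list for those images.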

For SSH-tunnelled rsync the picture is worse. The traffic is encrypted, the authentication is keypair, and the rsync invocation is one of many command strings sshd accepts. Detection moves to host telemetry on the server side. Sysmon 1 on rsync execution with the --server flag and an unusual source IP in the parent ssh session is the detectable signal. The correlation is non-trivial and rarely pre-built. Where the SSH key was harvested from a developer workstation, the source IP looks legitimate and the signal collapses to behavioural - file volume, file types, time of day.
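The correlation itself is not complicated once the two fields sit in the same record. The sketch below assumes the command line and the SSH source address have already been joined upstream, which is the hard part in practice. `is_rsync_server` and `flag_session` are hypothetical names, and the allow-list is whatever CIDR ranges the operator considers expected.

```python
import ipaddress
import shlex


def is_rsync_server(cmdline: str) -> bool:
    """True if the command line is rsync invoked in --server mode."""
    try:
        argv = shlex.split(cmdline)
    except ValueError:  # unbalanced quotes and similar
        return False
    if not argv:
        return False
    return argv[0].rsplit("/", 1)[-1] == "rsync" and "--server" in argv


def flag_session(cmdline: str, source_ip: str, allowed_cidrs) -> bool:
    """Flag rsync --server sessions whose SSH source is outside the allow-list."""
    if not is_rsync_server(cmdline):
        return False
    addr = ipaddress.ip_address(source_ip)
    return not any(addr in ipaddress.ip_network(c) for c in allowed_cidrs)
```

This catches the unusual-source case only. A harvested key used from its normal address passes the check, which is why the signal then falls back to volume and timing.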

The patch boundary is rsync 3.4.0 for the January 2025 cluster. The residual exposure post-patch is the part the advisories did not cover. The protocol’s trust model still treats both sides as cooperating peers. A malicious server can still influence a connecting client through choices the protocol permits - file list ordering, path encoding, attribute handling. The --safe-links and --munge-links flags exist because the protocol allows the unsafe states they mitigate. Operators who patched but did not review module configuration, allow-lists, and user mappings retained the deployment-level exposure that the heap overflow merely accelerated.
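Checking a fleet against the 3.4.0 boundary is a version comparison over the output of `rsync --version`. The sketch below assumes the first line has the usual `rsync  version X.Y.Z  protocol version N` shape. `parse_rsync_version` and `is_patched` are hypothetical names.

```python
import re

PATCHED = (3, 4, 0)  # patch boundary for the January 2025 cluster


def parse_rsync_version(version_output: str):
    """Extract (major, minor, patch) from the output of `rsync --version`."""
    m = re.search(r"version\s+v?(\d+)\.(\d+)\.(\d+)", version_output)
    if not m:
        raise ValueError("no rsync version string found")
    return tuple(int(part) for part in m.groups())


def is_patched(version_output: str) -> bool:
    """True if the reported upstream version is at or past the boundary."""
    return parse_rsync_version(version_output) >= PATCHED
```

Distribution packages may backport fixes under an older version string, so a negative result is a prompt to check the vendor advisory, not proof of exposure. A positive result says nothing about the configuration-level exposure described above.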

The LLM contribution is not a new CVE. It is throughput. Faster variant discovery, faster harness generation, faster deployment of vulnerable defaults at higher volume. The bug class is unchanged. The blast radius is the same. The probability of encountering an exposed, misconfigured rsync daemon on a given network in 2026 is higher than it was in 2024, and the patch state of the install base is worse because the deployment rate exceeded the maintenance rate. That is the operational reality the advisories will not state.


Contains a referral link.
