DeepSeek dodged the Entity List, not your pipeline

The US Commerce Department’s Bureau of Industry and Security added more than 100 firms to its restricted-entity rolls this cycle and stopped short of naming DeepSeek. The restraint is the signal, not the omission. A model does not need a place on the Entity List to sit inside a security pipeline as a trusted dependency, and the dependency is the exposure.

DeepSeek is being treated as a single-vendor question. It is not. DeepSeek-R1 and DeepSeek-V3 weights are mirrored across Hugging Face, distilled into thousands of derivative checkpoints, quantised into GGUF builds, and embedded in tooling that analysts now run for alert triage, log summarisation, and threat-intel enrichment. The model file is not the asset at risk. The trust placed in its output is.

The bug class is supply chain compromise of the model artifact and its inference path. MITRE ATLAS catalogues it directly. AML.T0010 is ML supply chain compromise. AML.T0020 is training-data poisoning. AML.T0018 is model manipulation through an embedded backdoor. None of these require breaking the model at runtime. They require controlling what the model learned, or what its loader executes.

Training-data poisoning operates earliest in the chain. Foundation models ingest scraped web text, code, and fine-tuning sets that no one audits at scale. A small fraction of crafted samples can install a backdoor that activates on a rare trigger token while leaving benchmark accuracy intact. Published poisoning research puts the required poison fraction well under one percent of a fine-tune set, and the inserted behaviour can survive downstream distillation. AML.T0020 covers the technique. The defender inherits the poison without ever touching the training run, because the model arrives pre-trained.

Model weights are not inert data. The dominant distribution format for years was Python pickle, serialised inside .bin and .pt files. Pickle is not a passive data format. It is an opcode stream for a small stack machine, and an object's __reduce__ method lets that stream call arbitrary Python callables on deserialisation. Loading a pickled checkpoint is code execution by design. That maps to T1059, command and scripting interpreter, triggered by the act of loading a file a defender assumed was data. The safetensors format exists specifically to remove this primitive. Most pipelines still accept both formats.
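The point about pickle can be made concrete without loading anything. The sketch below walks a pickle stream with the standard library's pickletools and reports the opcodes that import names or call objects. The opcode set, the scan_pickle helper, and the Demo class are illustrative choices, not a complete scanner.

```python
import pickle
import pickletools

# Opcodes that import a name or call an object while unpickling.
CALL_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def scan_pickle(data: bytes) -> set:
    """Return the import/call opcodes in a pickle stream without loading it."""
    found = set()
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            if opcode.name in CALL_OPCODES:
                found.add(opcode.name)
    except Exception:
        # A stream that does not parse cleanly is treated as hostile,
        # since a payload can run before the point where parsing breaks.
        found.add("MALFORMED")
    return found


class Demo:
    """Stand-in payload: unpickling this would call print("loaded")."""

    def __reduce__(self):
        return (print, ("loaded",))


benign = pickle.dumps({"weights": [0.1, 0.2]})
hostile = pickle.dumps(Demo())  # serialising is safe; only loading would run it
```

Here scan_pickle(benign) comes back empty, while scan_pickle(hostile) reports the call opcodes. Legitimate framework checkpoints also use these opcodes to rebuild tensors, so a practical scanner would need an allow-list of imported names rather than a blanket block.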

The loader is the second surface. CVE-2024-3660, the Keras Lambda layer flaw, allowed arbitrary code execution when Keras deserialised a model containing a crafted Lambda layer, and was rated critical. CVE-2025-1550 extended the same class to the .keras archive format, where a crafted archive could execute code on load even with safe mode enabled. CVE-2024-37032, Probllama, was a path traversal in Ollama’s model-pull endpoint that reached remote code execution through a manipulated manifest. The pattern holds across all three. The parsing and loading code trusts the model file, and the model file is attacker-controlled the moment it comes from an untrusted registry.
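One way to avoid trusting the loader is to inspect the archive before any framework code touches it. The sketch below assumes a .keras file is a zip archive holding a config.json whose nested entries carry "module" and "class_name" keys. It flags Lambda layers and any module outside the keras namespace. The flag_keras_config helper is hypothetical and the heuristic is deliberately coarse.

```python
import io
import json
import zipfile


def flag_keras_config(archive: bytes) -> list:
    """List config entries naming a Lambda layer or a non-Keras module."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        config = json.loads(zf.read("config.json"))
    findings = []

    def walk(node, path="config"):
        if isinstance(node, dict):
            cls, mod = node.get("class_name"), node.get("module")
            if cls == "Lambda":
                findings.append(f"{path}: Lambda layer")
            # Assumption: built-in objects live under the "keras" namespace.
            if isinstance(mod, str) and mod.split(".")[0] != "keras":
                findings.append(f"{path}: module {mod!r}")
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{path}[{i}]")

    walk(config)
    return findings
```

An empty result does not prove the archive is safe. It only means this particular heuristic found nothing, and custom layers that a team legitimately uses would need their own allow-list.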

Quantised distribution adds parsing surface of its own. GGUF and the earlier GGML files are parsed by llama.cpp, which has carried heap-overflow and out-of-bounds read flaws in its tensor and metadata parsing, reachable through a malformed model file. The format that makes a model small enough to run on a local workstation is also an untrusted binary parsed in C++. A SOC analyst pulling a quantised DeepSeek derivative to run offline is loading attacker-reachable input into a memory-unsafe parser.
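A cheap pre-check can reject implausible files before they reach the native parser. The sketch below assumes the fixed GGUF header layout used by versions 2 and 3: a 4-byte magic, a 32-bit version, then 64-bit tensor and metadata counts, all little-endian. The ceilings and the check_gguf_header helper are hypothetical values chosen for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"
SUPPORTED_VERSIONS = {2, 3}   # assumption: versions with 64-bit count fields
MAX_TENSORS = 100_000         # hypothetical ceiling, tune per environment
MAX_KV_PAIRS = 10_000         # hypothetical ceiling, tune per environment


def check_gguf_header(header: bytes) -> list:
    """Return reasons to reject the file; an empty list means plausible."""
    if len(header) < 24:
        return ["shorter than the 24-byte fixed header"]
    magic = header[:4]
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", header, 4)
    problems = []
    if magic != GGUF_MAGIC:
        problems.append(f"bad magic {magic!r}")
    if version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version {version}")
    if tensor_count > MAX_TENSORS:
        problems.append(f"implausible tensor count {tensor_count}")
    if kv_count > MAX_KV_PAIRS:
        problems.append(f"implausible metadata count {kv_count}")
    return problems
```

This narrows the input the parser sees but does not replace a patched llama.cpp build or sandboxing. A well-formed header can still front a malicious body.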

Registry trust compounds the problem. Model hubs resolve names, not provenance. A checkpoint published under a plausible namespace, or a typosquat of a popular repository, gets pulled by automation that matches on string, not signature. This is dependency confusion applied to weights, the same failure that has compromised package ecosystems, moved to a registry where the artifact is hundreds of gigabytes and no analyst reads it end to end. T1195.001, compromise of software dependencies, fits without modification.
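Matching on string is the weak point, so a minimal guard is an exact allow-list plus a lookalike check. The sketch below uses the standard library's difflib to flag repository names that are close to, but not identical to, an approved one. The PINNED table, the placeholder revision, the 0.9 threshold, and the classify_repo helper are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Approved repositories mapped to a pinned revision.
# The revision string is a placeholder, not a real commit.
PINNED = {"deepseek-ai/DeepSeek-R1": "<pinned-commit-sha>"}


def classify_repo(repo_id: str, allowed=PINNED, threshold: float = 0.9) -> str:
    """Return 'allowed', 'lookalike' or 'unknown' for a requested repo id."""
    if repo_id in allowed:
        return "allowed"
    for known in allowed:
        ratio = SequenceMatcher(None, repo_id.lower(), known.lower()).ratio()
        if ratio >= threshold:
            return "lookalike"  # near miss of an approved name: likely typosquat
    return "unknown"
```

With this table, a request for "deepseek-al/DeepSeek-R1" is classified as a lookalike rather than silently pulled. The pinned revision would then be passed to whatever download call the pipeline uses, so that a later change to the repository does not flow through automatically.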

The exploit path does not start with a zero-day. It starts with trust. An attacker publishes a checkpoint, or compromises an existing one, on a public registry. The name resembles a legitimate model. The weights behave normally on benchmark prompts. The malicious behaviour is conditional. JFrog documented roughly 100 malicious models on Hugging Face carrying pickle payloads that opened reverse shells on load. ReversingLabs documented nullifAI, models using deliberately broken pickle streams to slip past Picklescan while still executing. The delivery vehicle is the model. The trigger is the load. T1195, supply chain compromise, and T1199, trusted relationship, describe the access, because the victim pulled the artifact themselves.

Code execution on load is the loud failure. The quiet one is output manipulation. A poisoned or backdoored model returns valid-looking content that is wrong on a trigger the attacker chose. AML.T0051, prompt injection, reaches the same outcome without touching the weights, through data the model ingests at inference: a log line, a ticket, or a packet-capture summary containing instructions the model follows. For a SOC running an LLM over its own telemetry, the input is the alert stream, and the alert stream is reachable by anyone who can generate an event. A model instructed, inside ingested data, to score a specific indicator as benign will do exactly that. The compromise produces no crash, no exception, no anomalous binary. It produces a confident, plausible, wrong verdict.

The consumption side amplifies the manipulation. Model output increasingly drives automated action, not just analyst reading. An LLM verdict that scores an indicator benign can suppress a SOAR playbook, drop an enrichment in MISP, or downgrade an alert before a human ever sees it. The further the output sits from human review, the more directly a manipulated token becomes a control decision. A poisoned threat-intel feed at least carries provenance and a source rating. A poisoned model carries neither, and its output inherits the authority of the pipeline that called it.
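One way to keep a manipulated token from becoming a control decision is to let the model raise severity but never lower it alone. The sketch below is a hypothetical policy function. The Verdict fields, the label vocabulary, and the action names are assumptions for illustration, not any SOAR product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    indicator: str
    model_label: str  # assumed vocabulary: "benign", "suspicious", "malicious"
    rule_hit: bool    # True if a deterministic rule flagged the indicator


def next_action(v: Verdict) -> str:
    """Model output may add signal but can never close or suppress an alert."""
    if v.rule_hit:
        return "escalate"  # deterministic evidence wins over the model
    if v.model_label in {"suspicious", "malicious"}:
        return "escalate"  # the model is allowed to raise severity
    # "benign" or an unrecognised label: leave the alert for a human.
    return "analyst_queue"
```

The design choice is that no branch returns a suppress or close action. A model that has been told to call an indicator benign can, at worst, fail to escalate it.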

Real-world exposure is not theoretical. In January 2025 Wiz Research found a publicly accessible DeepSeek ClickHouse database with no authentication, exposing over a million log lines that included plaintext chat history, API keys, and backend operational detail. NowSecure analysed the DeepSeek iOS application and found App Transport Security disabled, data transmitted without encryption, hardcoded keys, and Triple DES with a static key, with device and network identifiers routed to infrastructure operated by ByteDance. Oligo Security tracked ShadowRay, active exploitation of CVE-2023-48022 in Ray clusters, where attackers hijacked AI compute for cryptomining and reached the data and credentials those jobs held. These are reported incidents against the same class of infrastructure that security teams are wiring into their own workflows.

Telemetry is where the gap becomes operational. Code execution on model load is visible. A Python or inference worker process spawning a shell shows in Sysmon Event ID 1, process creation. An outbound connection from that worker to an unexpected host shows in Sysmon Event ID 3, network connection. Egress to a registry mirror or a paste service is catchable in SIEM correlation against a known-host allow-list. EDR alert categories for living-off-the-land and reverse-shell behaviour fire normally, because the post-exploitation looks like ordinary post-exploitation.
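The two Sysmon signals above reduce to a short correlation rule. The sketch below assumes events have already been parsed into dictionaries carrying Sysmon's EventID, Image, ParentImage, and DestinationHostname fields. The worker names, shell names, allow-list, and triage helper are hypothetical and would need tuning to the actual inference stack.

```python
from pathlib import PureWindowsPath
from typing import Optional

WORKERS = {"python.exe", "ollama.exe"}              # assumed inference processes
SHELLS = {"cmd.exe", "powershell.exe", "pwsh.exe"}
ALLOWED_HOSTS = {"huggingface.co"}                  # assumed egress allow-list


def _name(path: str) -> str:
    """Lower-cased executable name from a Windows image path."""
    return PureWindowsPath(path).name.lower()


def triage(event: dict) -> Optional[str]:
    """Return a finding name for a parsed Sysmon event, or None."""
    eid = event.get("EventID")
    image = _name(event.get("Image", ""))
    if eid == 1 and image in SHELLS and _name(event.get("ParentImage", "")) in WORKERS:
        return "worker_spawned_shell"       # Event ID 1: process creation
    if eid == 3 and image in WORKERS:
        if event.get("DestinationHostname") not in ALLOWED_HOSTS:
            return "worker_unexpected_egress"  # Event ID 3: network connection
    return None
```

In practice this logic would live in the SIEM's own rule language. The Python form only shows how little is needed to cover the loud failure mode.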

Output manipulation fires nothing. There is no event ID for a model that summarised a breach as routine. No Sysmon record for a poisoned weight returning a clean verdict on a malicious hash. No network IOC, because the inference call is the expected traffic and the response is well-formed. The host sees a process reading weights and returning tokens. The semantic compromise of the answer is invisible to every sensor tuned for binaries, processes, and connections. Detection coverage for the AI supply chain stops at the loader. It does not extend to the meaning of the output.

Detection that exists for this class is built deliberately, not inherited. Canary prompts with known-correct answers, run on a schedule against the production model, surface drift and triggered manipulation that passive sensors miss. Output validation against a second model or a deterministic rule catches a fraction of manipulated verdicts. Logging the full prompt, retrieved context, and response for every inference call produces an audit trail host EDR never generates. None of this ships by default. The instrumentation is additive, and most pipelines run without it.
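A canary harness is small enough to sketch. The version below assumes the production model is reachable through a callable, here named query_model, that takes a prompt and returns a one-word label. The canary prompts and expected labels are placeholders a team would replace with cases it has verified itself.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical canaries: (prompt, expected one-word label).
CANARIES: List[Tuple[str, str]] = [
    ("Classify: winword.exe spawning powershell -enc <payload>", "malicious"),
    ("Classify: scheduled Windows Update service restart", "benign"),
]


def run_canaries(
    query_model: Callable[[str], str],
    canaries: Iterable[Tuple[str, str]] = CANARIES,
) -> List[str]:
    """Return the prompts whose answer no longer matches the known label."""
    failures = []
    for prompt, expected in canaries:
        answer = query_model(prompt).strip().lower()
        if answer != expected:
            failures.append(prompt)
    return failures
```

Run on a schedule, a non-empty result becomes an alert in its own right. Exact matching is used deliberately: a substring check would accept "not benign" as "benign".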

The Entity List action defines the boundary precisely. A listing restricts US firms from supplying the named entity. It does not reach into deployed environments, delete mirrored weights, or revoke a checkpoint already pulled into a build. Residual exposure after any DeepSeek designation includes every cached model, every distilled derivative, every quantised GGUF, and every pipeline already trusting that output. The control is export restriction. It is not removal.
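Measuring that residual exposure starts with knowing which artifacts are already on disk. The sketch below walks a directory tree and lists files with common model extensions. The suffix set and the inventory helper are illustrative, and the root path would be whatever cache or build directory a team actually uses.

```python
from pathlib import Path

# Assumed set of model artifact extensions; extend as needed.
MODEL_SUFFIXES = {".gguf", ".safetensors", ".bin", ".pt", ".pth", ".ckpt", ".keras", ".h5"}


def inventory(root: Path) -> list:
    """Return sorted (relative path, size in bytes) for model files under root."""
    found = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in MODEL_SUFFIXES:
            found.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return sorted(found)
```

Extension matching is a first pass only. Renamed files and weights embedded in container images would need content-based detection.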

What still applies post-listing is structural. A model from an untrusted source is a dependency with code execution on load and unverified influence over output. The same discipline that governs any third-party binary applies here: provenance verification, signature and hash pinning, safetensors-only loading, network isolation of inference workers, and treating model output as untrusted input rather than ground truth. For Australian operators under the Security of Critical Infrastructure (SOCI) Act, an AI component feeding decisions in a critical-infrastructure SOC is part of the regulated asset, and its data handling falls under Privacy Act obligations the moment it touches personal information.
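Two of the controls listed above, hash pinning and safetensors-only loading, fit in a few lines. The sketch below refuses any file that is not a .safetensors artifact with a SHA-256 digest matching a pinned value. The pins mapping and the verify_artifact helper are hypothetical. In practice the pins would come from a reviewed, version-controlled manifest.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream the file through SHA-256 so large weights do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pins: dict) -> None:
    """Raise ValueError unless path is a pinned .safetensors file."""
    if path.suffix.lower() != ".safetensors":
        raise ValueError(f"{path.name}: only .safetensors is accepted")
    expected = pins.get(path.name)
    if expected is None:
        raise ValueError(f"{path.name}: no pinned hash")
    if sha256_file(path) != expected:
        raise ValueError(f"{path.name}: hash mismatch")
```

The check runs before the file is handed to any loader, so a swapped or tampered artifact fails closed. It verifies integrity against the pin, not the trustworthiness of whatever was pinned in the first place.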

DeepSeek being held off the list does not lower the exposure. It clarifies it. The risk was never one model on one list. It is the standing assumption that a third-party model’s output can be trusted inside a pipeline that defends anything. Where active manipulation is suspected in a production environment, the path is escalation to the responsible security team and incident response, not local triage.
