Researchers silently exfiltrate files from Claude sessions

A public demonstration shows that files inside a Claude AI chat session can be exfiltrated without the user’s awareness. The claim is specific. Not a theoretical write-up. Not a controlled lab finding restricted to a vendor channel. A live demo, reproducible against a chat session a normal user would consider private. The exploit operates inside the model’s own conversation surface, which is the same surface users treat as a trusted workspace.

The word that matters in the claim is silent. Silent means the user does not see it happen. There is no obvious prompt, no error, no upload dialog, no visible network event in the chat UI. The user keeps working. The files leave. The specific exfiltration channel used in the demo is not confirmed in the source material reviewed for this briefing, and will not be reconstructed here. What is confirmed is the outcome: files in the chat context were retrievable by a party that was not the user.

The second word that matters is anyone. The claim is not that a sophisticated actor with privileged access can do this. The claim is that the technique is reproducible by an external party with no credentials in the victim’s account. The ability to influence the content the model sees during a session is sufficient. That is a different threat surface from account compromise. It does not require stealing a session token. It does not require malware on the endpoint. The attacker does not need to be the user, and does not need to become the user.

The operational assumption inside most organisations using AI chat assistants is that the chat window is a two-party channel. User on one side. Model on the other. Files dropped into that window are assumed to inherit the trust level of the user’s authenticated session. Attachments are treated like data shared with a contractor under NDA. Sensitive, but contained. The mental model is closer to a private document editor than to a public posting surface.

The second assumption is that model output is content, not action. Operators read AI responses as text to be reviewed, copied, or discarded. They do not treat the response stream as something with reach. Under that assumption, the worst case from a malicious input is a wrong answer or a misleading summary. The blast radius is the user’s own decision quality. The data inside the session is assumed to stay inside the session, because text on a screen cannot, in the operator’s model, reach outward on its own.

The third assumption is identity scope. Operators assume that the only identity acting inside a chat session is the authenticated user. Anything the model does is taken as a direct, attributable consequence of that user’s prompts. There is no separate principal. There is no third actor in the room. Under that model, file access inside the session is bounded by what the logged-in human chooses to do. That assumption is the one the demo breaks. It is also the assumption every downstream control depends on: data classification, DLP exceptions for AI tooling, audit logging tied to user identity. If the identity boundary is not what operators think it is, the controls hanging off it are not what operators think they are.

The demo changes the channel model. The chat surface is not two-party. It is at least three-party once external content enters the context window. Anything the model ingests during a session, including pasted text, referenced documents, retrieved web content, or tool output, is now a participant in the conversation with influence over model behaviour. The user is not the only voice in the session. That is a structural property of how current AI assistants process context, not a bug that can be patched without changing the architecture. Specific mitigations published by the vendor in response to this demo are not confirmed in the source material reviewed here.

The demo also changes the action model. Model output is not inert. Where the assistant has tools, integrations, or rendering capabilities that touch resources outside the chat window, the output stream is an action stream. Content placed into the context by an attacker can shape that action stream. The user reads text. The system performs operations. Those are two different surfaces, and the demo proves that the gap between them is exploitable. The exact tool surface used in the demo is not confirmed, and the technique should be assumed to generalise wherever the same gap exists.

The demo changes the identity model last, and this is the change that matters most operationally. If untrusted content placed in front of the model can cause the model to act on the user’s behalf without the user’s intent, then the user identity is no longer the only effective principal inside the session. The attacker is operating with the user’s privileges through a channel the user did not authorise and cannot see. From a control standpoint, the session now has an unauthenticated co-pilot. Every assumption built on session identity, including which files are reachable, which integrations can be invoked, and which data can be read out of the context, has to be re-evaluated against that fact.

The failure is not at the credential layer. The user remains authenticated. The session remains valid. No token is stolen, no password is replayed, no endpoint is compromised in the demonstrated path. The observable behaviour is that data resident in the chat context is reachable by a party who never authenticated to the account. That is the failure. The control that was assumed to gate access to session data was session identity. Session identity did not gate it. By the operator’s definition, a control that did not enforce is not a control. It was a label on an unenforced boundary.

The drift sits between two surfaces the operator treats as one. The first surface is the conversation the user sees: prompts in, text out. The second surface is the execution context the model operates against: the context window, any tools or integrations exposed to the model, and the resources those tools can reach. The user observes only the first surface. The system acts on the second. In the demonstrated outcome, files inside the chat context left the session without producing a visible event on the surface the user was watching. The specific channel that carried the data out is not confirmed in the source material reviewed here, and is not reconstructed in this briefing. What is confirmed is that the action surface and the observation surface are not the same surface, and the user cannot audit what they cannot see.

The third element of the drift is the principal model. Operators assume one principal per session: the logged-in human. The demonstrated behaviour requires a second effective principal whose instructions reached the model through ingested content rather than through the prompt box. The system did not distinguish between content the user authored and content the user merely exposed the model to. That lack of distinction is the enforcement gap. Identity inside the session is not bound to the authenticated account. It is bound to whatever text the model treats as instruction. Until that binding is explicit, every action the model takes inside the session is attributable to the user by log, but not by intent.

The pattern is enforcement against the wrong boundary. The operator names a boundary, then attaches controls, monitoring, and policy to it. The system then permits a behaviour that crosses the named boundary through a channel the controls do not inspect. The boundary on paper and the boundary in execution are different boundaries. The demonstrated exfiltration is one instance of that pattern. The model is governed by session identity in the access layer and by ingested text in the action layer. Controls bound to the first cannot constrain behaviour driven by the second.

The same mechanism appears wherever a system treats input from multiple sources as a single uniform stream. If content originating outside the trust boundary is concatenated with content originating inside it, and the downstream processor does not preserve provenance, then the lower-trust source inherits the privileges of the higher-trust source at the point of processing. The processor is not making a trust decision. It is making a syntactic one. The trust decision was deferred to a layer that no longer exists by the time the action is taken. In the demonstrated case, the processor is the model, the higher-trust source is the user, and the lower-trust source is whatever the model was permitted to ingest during the session. The shape of the failure does not depend on the specific tool or integration involved.
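
The provenance point can be made concrete with a minimal sketch. It does not describe how any vendor processes context. The `Trust` levels, the `Segment` type, and the rule that a context is only as trusted as its least trusted segment are all assumptions chosen for illustration. The sketch contrasts a flattening step that discards provenance with a check that keeps it.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust levels; a higher value means more trusted."""
    EXTERNAL = 0  # retrieved web content, tool output, third-party documents
    USER = 1      # text the authenticated user typed as an instruction


@dataclass(frozen=True)
class Segment:
    """One piece of context, tagged with where it came from."""
    text: str
    trust: Trust


def flatten(segments):
    """The failure mode: concatenation keeps the text and drops the provenance."""
    return "\n".join(s.text for s in segments)


def effective_trust(segments):
    """Assumed policy: a context is only as trusted as its least trusted segment.

    An empty context is treated as untrusted.
    """
    return min((s.trust for s in segments), default=Trust.EXTERNAL)


def may_take_action(segments):
    """Permit side-effecting actions only when every segment is user-authored."""
    return effective_trust(segments) >= Trust.USER
```

Once `flatten` has run, nothing downstream can tell the user's request from an instruction embedded in an ingested document. Under the assumed policy, `may_take_action` refuses as soon as a single `EXTERNAL` segment is present. That is deliberately conservative, and a real deployment would need something finer-grained.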

The parallel extends to any environment where a human operator is treated as the auditor of an automated action stream. If the operator cannot see the action, the operator is not auditing it. If the system produces visible output that differs from the operations it performs, the visible output is not evidence of system behaviour. It is evidence of what the system chose to display. The demonstrated exfiltration relies on that gap directly: the user keeps reading text, the system keeps performing operations, and the two streams are not reconciled inside the user’s field of view. Wherever that gap exists, the same class of silent action is reachable by the same class of attacker. The technique does not need to be the same. The structural condition is the same.
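
The gap between displayed output and performed operations can be expressed as a simple reconciliation, sketched below under stated assumptions. The `Action` record, its fields, and the idea that both streams share an `action_id` are hypothetical. Nothing in the source confirms that any product exposes logs in this shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One operation the system performed (hypothetical record shape)."""
    action_id: str
    kind: str    # illustrative labels such as "tool_call" or "http_fetch"
    target: str


def unreconciled(performed, displayed_ids):
    """Return performed actions that never appeared on the surface the user saw."""
    shown = set(displayed_ids)
    return [a for a in performed if a.action_id not in shown]
```

Any non-empty result is an operation the user could not have audited by reading the screen. The comparison has to run outside the session, because the session's own display is the thing being checked.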

The chat session is not a private workspace. It is a multi-party execution environment in which any content the model is permitted to read can influence what the model is permitted to do. Operators who continue to classify AI chat surfaces as document-editor-equivalent are protecting a boundary the system does not enforce. Data placed into a session must be classified against the action surface available to the model in that session, not against the visible conversation. If the model has reach, the data has reach. The user’s view of the session is not the system’s view of the session, and the user’s view is the one the operator has been building policy against.

Identity inside an AI session must be treated as compound until proven otherwise. The authenticated user is one input. Every other source of content the model ingests is another input with effective authority over model behaviour. Until the vendor provides explicit, verifiable separation between user instruction and ingested content, with enforcement at the action layer rather than the display layer, the operator position is that any action the model can take is reachable by any party who can place text in front of it. Specific vendor mitigations issued in response to this demonstration are not confirmed in the source material reviewed here, and the absence of confirmation is itself an operating condition. Plan against the condition, not against the hope of a fix.

The required state is the following. Sensitive files do not enter chat contexts that have outbound reach. Outbound reach includes tool use, integrations, retrieval against external systems, and any rendering path that resolves external references. Sessions handling regulated or high-impact data run against configurations where the action surface is empty or strictly allow-listed. Logging captures the action stream, not the conversation stream, and the two are reconciled by an independent process, not by the user reading the screen. None of this is optional once the demonstrated behaviour is in the public domain. A technique that has been shown working against a production surface is no longer a research artefact. It is a control requirement. Treat it as one or accept the loss.
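
The first part of that required state reduces to a gate on what may enter a session. The sketch below is one way to write it, not a reference implementation. The capability names, the `may_attach` function, and the representation of the action surface as a set of strings are assumptions introduced for illustration.

```python
def may_attach(sensitive, enabled_capabilities, allow_list):
    """Decide whether a file may enter a session.

    sensitive: True for regulated or high-impact data.
    enabled_capabilities: the action surface of the session, as a set of
        hypothetical capability names (tools, integrations, external retrieval,
        rendering paths that resolve external references).
    allow_list: capabilities the operator has explicitly approved for
        sensitive sessions.
    """
    if not sensitive:
        return True
    # Sensitive data is admitted only when the action surface is empty or
    # every enabled capability is allow-listed. An empty set passes trivially.
    return set(enabled_capabilities) <= set(allow_list)
```

For example, `may_attach(True, {"web_fetch"}, set())` returns `False`, while the same call with an empty capability set returns `True`. The check is only as good as the inventory of capabilities behind it. A rendering path that resolves external references and is missing from `enabled_capabilities` is outbound reach the gate never sees.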
