The guard checks the badge, never the room
Prompt injection is role confusion: systems that derive content authority from channel trust execute attacker input as instruction.
Opening Claim
Role confusion is not a vulnerability. It is the output of a design that treats identity as the only boundary. A system that knows who is speaking but not what that speaker is permitted to do has already failed before any input arrives. Prompt injection is the demonstration, not the cause.
The mechanism is narrow and it is the same every time. A system trusts something. A service, a user, an API. That trust is granted to an identity. When an attacker supplies content that the system reads as instruction from a more privileged identity, the system acts on it. No memory was corrupted. No credential was stolen. The system did exactly what it was built to do. It accepted input and resolved it against the only boundary it had.
Call it what it is. This is a fundamental design flaw exposed, not a bug introduced. A bug is behaviour the designer did not intend. Role confusion is behaviour the designer permitted and did not constrain. The distinction matters because the response is different. You do not patch an assumption. You remove it.
The Original Assumption
The assumption is that identity is a sufficient boundary. The system establishes who is making a request, and from that point forward it treats everything that identity sends as authorised in the same way. Authentication answers one question. The system then behaves as if that one answer covers a second question it never asked: is this specific action permitted in this specific context. Identity is the boundary, and a boundary that is checked once is not a boundary. It is a gate left open after the first pass.
Every system trusts something. A service trusts an API. A user trusts a session. An API trusts a caller. Each of those trust relationships is a place where one party speaks and another acts. The assumption holds that the speaker and the instruction are the same thing, that content arriving inside a trusted channel carries the privilege of the channel. Under that assumption, the source of the content and the authority of the content are never separated. They are treated as one fact.
This is not about clever exploits. It is about systems built on assumptions that were never stated and therefore never tested. The designer did not decide that untrusted content should be able to instruct the system as a privileged role. The designer decided to trust the channel and never decided anything about the content flowing through it. Trust must be continuously validated. When it is granted once and carried forward without re-checking, the system has no way to tell the difference between the identity it authenticated and any input that arrives wearing that identity. If a system allows it, it will happen.
What Changed
What changed is the source of the input. The same trust relationship now carries content that the system did not generate and cannot vouch for. The instruction and the data share one channel, and the system reads both with the same authority. When that trust is manipulated to impersonate another role, the system does not detect a violation. There is no boundary at the point of action for it to check against. It resolves the input against identity, finds the identity valid, and proceeds.
The failure is observable at the boundary between data and instruction. Content that should have been treated as input to be processed is instead treated as a command to be executed. The system exposes no separation between the two because none was built. From outside, the behaviour looks like the system following orders. It is. The orders came from a source the system was never designed to distinguish from a legitimate one, and the system had no enforcement point to stop there and ask.
Controls that are not enforced are not controls. If identity was the stated control and identity did not stop the impersonation, identity is ineffective for this purpose. State it plainly. The system was never wrong about who was speaking. It was wrong to assume that knowing the speaker told it what the speaker was allowed to say. Automation scales both control and failure. A boundary that holds once per request becomes a boundary that fails at the speed and volume of every request the system accepts.
Mechanism of Failure
The mechanism has one moving part. Content enters a channel the system already trusts, and the system applies the channel’s authority to the content inside it. The observable behaviour is a system that performs the action described by the input rather than the action scoped to the authenticated identity. From outside, nothing separates a legitimate instruction from injected content. The system emits the same action for both, because at the point of action it has nothing to compare them against.
The drift is in volume, not in any single request. Each request that carries data and instruction in the same channel resolves the same way, and the set of actions the system will perform widens to match whatever content arrives. The check performed at authentication does not reappear at execution. What you observe is execution without validation. The system authenticated once, then acted on every subsequent byte as if the authentication covered it. The longer the channel stays open, the wider the gap between what the identity was scoped to do and what the system will actually do on its behalf.
There is no error state to observe, because the system does not treat the event as an error. It returns success. An attacker reads the same signal an operator reads, which is that the action completed and no rejection was produced. The absence of a refusal is not evidence of a working control. It is evidence that no control exists at that boundary. A system with no enforcement point between data and instruction has no location at which to fail closed, and a control that cannot fail closed cannot be confirmed to work. It can only be confirmed to have not yet been tested.
The Same Failure, Other Channels
The pattern is older than the channel it now runs in. It is the confused deputy. A component holds authority, accepts input from a less privileged source, and acts on that input with its own authority because it never separated the instruction from the data. The privilege belongs to the channel. The content rides the channel and inherits the privilege. Prompt injection is one instance. It is not the origin of the pattern and it is not the most severe.
SQL injection is the same mechanism with a longer history. The database trusts the query channel. The application places user-supplied data into that channel as text. The database executes the text as instruction because the data and the command were never structurally separated. No credential was stolen and no database was misconfigured in the sense people mean when they say the word. The database did exactly what it was built to do. It executed the instruction it was handed through a channel it was told to trust. The defence that worked was not better input filtering. It was parameterised queries, which separate data from instruction so the data can never be read as command.
The telephone network failed the same way before either system existed. Control signalling and voice shared one channel. A 2600 hertz tone carried on the voice path told the switch the line was idle, and a caller who produced that tone took control of the trunk the switch was trusting. Supply the signal, command the system. The mechanism in each case is identical. Authority is derived from the channel, and the channel carries both the data and the instruction with no separation between them. Prompt injection adds a new channel and a new kind of content. It adds nothing to the mechanism.
What Must Now Be True
Identity authenticates the speaker. It does not authorise the action. Those are two separate questions, and a system that answers the first and assumes the second has not built a boundary at the place the boundary is needed. The boundary belongs at the point of action. At execution, the system must resolve whether this specific action is permitted for this identity in this context, independent of how the content arrived or which channel carried it. Authentication at the front does not substitute for authorisation at the point of effect.
Separation of data from instruction has to be structural. Instructing the privileged component to ignore injected commands asks the vulnerable component to police itself, which is the same identity making the same decision through the same channel. That is not enforcement. It is the failure restated as a request. The query layer solved this with parameterisation. The telephone network solved it by moving signalling out of the voice band. The common element is that the fix removed the content’s ability to be read as instruction. It did not ask the content to behave.
Input filtering is not the answer and should not be sold as one. Filtering detects known payloads. This mechanism does not require a known payload. It requires only that instruction authority is derived from channel trust, and that condition is present whether or not any specific string is blocked. Remove the assumption or keep the exposure. If a system allows content to act with the authority of its channel, that action will occur. It does not depend on an attacker being clever. It depends only on the system being used. Build the boundary at the action, or state plainly that there is no boundary and operate accordingly.
Keep Reading
ML supply chaintorch.load runs attacker code before the first denoising step
A diffusion inpainting model can't execute a prompt. The real RCE is pickle deserialisation in the loader, custom nodes, and the agent around it.
mobile carrier securityThe channel trusted the sender
An unauthorized alert reached phones across Brazil. The confirmed finding is one control: sender authorization at the injection point did not hold.
digital rightsdemand is not a control
Stop Killing Games gathered 13 million signatures and produced no EU law. The proposed approach lacked granular data access control and identity verification.
Stay in the loop
New writing delivered when it's ready. No schedule, no spam.