Trust does not carry forward

Two language models were given the same retrieval tasks. One, GPT55, returned unsupported content at roughly three times the rate of the other, an MIT-licensed model designated GLM52. The prompts were identical. The sources referenced were identical. The divergence appeared in the output, and it was stable enough to be measured rather than reported as an anecdote.

What was observed was not a single wrong answer. It was a rate. A rate is a property of the system, not of the question. Across a large sample, GPT55 produced text that did not correspond to the referenced material three times as often as GLM52 did. The system did not refuse. It did not flag uncertainty. It returned content with the same form and the same confidence whether or not the content matched the source it named.

Nothing broke. No error was raised. Each model performed exactly the operation it was constructed to perform: receive a prompt, resolve the references that prompt implies, and produce a fluent continuation bound to them. The output that observers later called a hallucination was, from the system’s position, a successful completion. The measured gap between the two models is the gap between two systems doing the same thing, one of them resolving references it could no longer stand behind more readily than the other.

The assumption was that a source, once referenced, carries its trust forward. The model is simple and rarely stated. A reference is a proxy for the content behind it. When a prompt names a source, or when training has bound a claim to an authority, the system treats the name as sufficient. The reference stands in for verification. The content is assumed to match the reference because at some earlier point it did.

That assumption has three parts. Persistence: trust granted to a source at one time remains valid at later times. Transferability: trust in a source extends to any content associated with that source. Validity: the binding between a reference and the content it points to holds across versions, revisions, and contexts. The system does not test these properties. It assumes them, because testing them at the moment of retrieval would require the system to perform the work the reference was meant to replace.

This is the economy of the design. A reference exists so that the content does not have to be revalidated on every use. The proxy is cheaper than the reality. Training compresses sources into weights. Retrieval resolves a pointer rather than re-deriving a fact. The system optimized for resolution and fluency, and trust was stored as a property rather than measured as a live condition. As long as references stayed bound to their content, the assumption held and the proxy was indistinguishable from the truth.

It did not start this way. What changed was not the model and not the question. What changed was the validity of the assumption. The sources degraded. Documents were revised. Authorities that were once reliable accumulated error. Reference targets moved, were rewritten, or were themselves polluted by content earlier systems had generated. The reference still resolved. The name still pointed somewhere. What it pointed to was no longer what the trust had been granted against.

The system did not re-evaluate. At retrieval time it inherited trust from a past state, a state in which the reference and the content were aligned, and it treated that prior validity as a present one. Nothing in the operation checks whether the binding still holds. The operation resolves the reference and continues. So the degradation passed through without resistance. The output kept the same form, the same fluency, the same confidence. The thing that changed sat underneath the reference, and the system does not look underneath the reference.

The threefold difference between the two models is a difference in how aggressively each resolves a reference it can no longer validate. The error did not disappear. It moved into the space between the reference and its content. Neither system was told the source had drifted. Both inherited trust. One inherited more of it. This is systemic drift, not a technological anomaly. The assumption that made the system efficient quietly stopped being true, and the system, holding no mechanism to notice, continued to behave as designed.

The mechanism is substitution. At retrieval the operation has a reference, and it has whatever content that reference now names, and it acts on the first. The name of the source becomes the warrant for the output. The integrity of the content behind the name is never separately established, because establishing it is the cost the reference was created to avoid. What an external observer reads as a statement about the world is, in the behavior of the system, a statement about a pointer that resolved.

This is where identity displaces integrity. The output corresponds to what the named source is trusted to contain, not to what it presently contains. Both models produce a fluent continuation consistent with the source’s standing, and both hold that form whether or not the standing still matches the material. There is no observable difference in the result between a completion drawn from content that still aligns with its reference and one drawn from content that has drifted from it. The confidence is uniform. The shape is uniform. The surface carries no marker of the gap underneath it, because the surface was generated from the reference, and the reference is intact.

No exception path is taken, because none is reached. The system raises nothing, withholds nothing, and qualifies nothing, since from its position the operation completed as constructed. A bypass would imply a control that was evaded; here there is no control to evade. The output is expected behavior executed correctly against an assumption that has quietly expired. The threefold gap between the two models is not a defect in one and its absence in the other. It is a difference in how readily each resolves a reference whose binding it cannot, and does not, test. Both are doing the same thing. One does more of it.

The pattern beneath this is single and plain. A system executes on a reference instead of verifying the thing the reference stands for. The reference is cheaper than the verification, so the reference becomes the basis of action. Trust is established at one moment and then carried, unexamined, into every later moment the reference is used. The binding between the pointer and the content is assumed permanent because re-testing it would reproduce the work the pointer was meant to eliminate. The failure is not that the reference resolves wrongly. The failure is that resolving is treated as equivalent to verifying.

The same mechanism governs how machines establish trust across a network. A client accepts a certificate because the certificate chains, through valid signatures, to a root it already trusts. The chain resolves. At the moment of issuance the binding between that certificate and the entity holding it was sound, and the signature preserves the record of that soundness. The client validates the chain, not the present integrity of the key behind it. It confirms that the reference resolves to a trusted origin. It does not re-derive whether the origin’s trust still holds.

When the private key is later compromised, nothing in the resolved chain changes. The signature still verifies. The reference still points to a trusted root. The one mechanism built to revalidate, revocation, is the part of the system that fails soft: when the check is slow, unreachable, or unanswered, the connection proceeds rather than stops. So the client executes the expected behavior. It trusts a certificate whose underlying integrity has moved out from under it, for the same reason the model returns a claim its source no longer supports. The system resolved a reference and treated resolution as proof.

The system resolves the reference once. It does not return to ask whether the reference still describes what it pointed to.

Trust granted at one moment is spent at every later one, unmeasured, until the content has moved and the reference has not. Nothing in the operation marks that moment. The output looks the same on either side of it.

This is not a flaw waiting for a patch. It is the design, behaving as designed, against an assumption that no longer holds. The reference exists. The integrity does not.

Trust does not carry forward

Keep Reading

The machine quoted an EFF staffer who never existed

Six thousand fuel gauges answer every stranger

A handle, a token, a SYSTEM shell

Stay in the loop