The log that killed your SSD was working perfectly
A Codex logging bug can write terabytes to a local SSD because the system resolves a configured reference repeatedly and never revalidates the trust behind it.
A logging subsystem writes what it is told to write. It does not weigh the cost of the write against the value of the record. It does not ask whether the disk beneath it can absorb the volume, or whether the volume corresponds to anything a human will ever read. Codex, in the behaviour that has been reported, may write terabytes to a local SSD. This is not the logger breaking. This is the logger working.
The purpose of a log is fidelity. It exists to preserve a faithful account of what the system did, in the order it did it, so that some later process, a debugger or an incident review or an audit trail, can reconstruct events after the fact. Fidelity is the metric the subsystem optimizes for. Completeness is treated as a virtue. A log that omits is a log that failed. So the subsystem is built to omit nothing, and when it is configured to record at a verbose level, it records. Terabytes written to an NVMe device is not an anomaly in that design. It is fidelity, executed without a ceiling.
The SSD accepts every byte because a block device is built to accept writes. It does not evaluate whether the writes are meaningful. Its endurance is finite and measured in terabytes written, a rating the manufacturer prints precisely because sustained write volume consumes the physical medium. The logger and the drive are both behaving correctly and in isolation. Neither is positioned to observe the aggregate. Codex issues writes it was configured to issue. The SSD absorbs writes it was manufactured to absorb. The condition, if it is a condition, exists in the space between two components that are each doing exactly what they were built to do.
The assumption underneath every logging subsystem is that log volume is bounded by the events worth recording. A log line is supposed to correspond to something discrete: a request served, an error raised, a state transition. Under that model the size of the log is a proxy for the amount of meaningful activity, and the cost of a single write is small relative to the value of the record it produces. The design trusts that proportion. It assumes the number of things worth logging is finite and roughly tracks real work.
That trust is delegated, not enforced. The subsystem does not decide what deserves a record. It accepts a configured log level and a stream of events from callers above it, and it writes what arrives. The decision about volume lives somewhere else, in a verbosity flag, a format specification, the caller that chose to emit inside a loop. The logger trusts the reference it was handed. It does not validate the aggregate consequence of honoring that reference across millions of iterations. It was never built to. Validation of total cost is not part of its contract, and no component in the write path carries a running budget of bytes already spent.
The assumption also treats verbosity as transient. Debug and trace levels exist to be turned on briefly, during an investigation, and turned off again. The design assumes a verbose path is a temporary state, entered deliberately and exited quickly, never the steady condition of a system running at scale. Persistence of that state was not modeled. The subsystem assumes the high-volume mode is the exception, and it trusts that the exception is being held to a short duration by something outside itself. That trust is structural. It is baked into the fact that the logger has no memory of how much it has already written and no rate against which to compare the rate it is writing now.
What changed was not the logger and not the drive. What changed was the validity of the assumption that a log line maps to a discrete, worth-recording event. Move the emission point into a hot path, a per-token callback or a per-request trace or a retry loop that re-emits on every attempt, and the proportion between meaningful activity and byte volume collapses. One reference, resolved millions of times, produces an output stream that no longer tracks anything a person would call an event. The proxy detached from the reality it was standing in for, and the size of the log stopped meaning what it was designed to mean.
The system did not re-evaluate the trust when the proportion changed. It could not. Codex holds a reference to a log level and a format, and it resolves that reference every time it is called. It does not compare the current write rate against the rate the design assumed. Each write is evaluated in isolation, on its own merits, against a contract that says only: honor the reference. So the subsystem inherited the trust it was granted at configuration time and carried that trust forward into a state the configuration never anticipated. The assumption that verbosity is transient no longer held, and nothing in the path was positioned to notice that it had stopped holding.
This is the specific shape of the drift. The conditions under which the trust was valid expired, and the trust did not. Codex continued to write at a level that was reasonable when the level was set and ruinous once the emission point moved beneath a high-frequency loop. The SSD continued to absorb until its terabytes-written budget became the only limit anywhere in the system that was actually being enforced, and it was being enforced by physics, not by design. The gap between the two is where the terabytes accumulated. The trust granted at configuration time did not lapse when the conditions that justified it did. It resolved, once more, into writes the disk could not refuse.
The write path resolves a reference. Codex was handed a log level and a format specification, which together are a reference to a decision made somewhere else, at some earlier moment. On each invocation it resolves that reference and produces the write the reference specifies. What it does not do is measure the content of that write against any standard of worth. The reference is honored because of where it came from, a trusted configuration, and not because of what it produces, which is bytes that may or may not correspond to events. Identity of source stood in for integrity of content. The configuration was legitimate, so the writes were treated as legitimate, one by one, forever.
This is why the terabytes are not a bypass. A bypass implies a control was evaded, and nothing here was evaded. The externally observable behaviour is a drive filling at line rate, a log growing without bound, write throughput holding steady against the device’s rated endurance. Each of those is the system executing its contract. The SSD’s terabytes-written counter climbs. The file size increases without pause. From the outside the system is functioning, and it is functioning precisely as specified. There is no error, no exception, no failed check, because no check was ever part of the specification. Expected behaviour and failure are, in this system, the same behaviour observed from two different distances.
Reference replaced validation because validation of the aggregate was never in the contract to begin with. What an observer sees is a steady stream of writes indistinguishable in form from ordinary logging. Same format, same destination, same level. The only distinguishing quantity is volume, and volume is exactly the property the subsystem does not compare against any budget. So the mechanism completes in silence. Reference in, writes out, disk consumed, and every component in the path reporting success. The content of the stream degenerated into noise while the source of the instruction to write it remained impeccable. The system trusted the source and never once inspected what the source was now causing it to emit.
The pattern underneath this is execution based on reference, not verification. A system holds a pointer to a decision, a grant, a configuration, a version, and it resolves that pointer every time it acts. Resolution is cheap and repeatable, which is the reason it sits on the fast path. Verification, meaning the act of re-establishing that the thing the pointer points to is still valid under present conditions, is expensive, and expense is the reason it does not sit on the fast path. So systems resolve constantly and verify rarely, and often verify only once, at the moment the reference is first granted. The reference persists. The conditions that made it valid do not. The distance between those two lifespans is where this entire class of failure lives.
Consider OAuth 2.0 bearer tokens as defined in RFC 6750. A bearer token is a reference to an authorization decision made at an earlier moment under a specific set of conditions. RFC 6750 states the property in plain terms: any party in possession of the token can use it. On each request the resource server resolves that reference. It accepts the bearer because the bearer holds the token, not because it re-establishes, on that request, that the grant is still appropriate to the present state of the world. The identity of the token’s source, the issuer that minted it, stands in for the integrity of the current request. The server executes on possession, not on verification. When the conditions that justified the grant expire before the token does, the server keeps honoring a reference to a state that has already ceased to exist. It is the same mechanism the logger runs. Trust resolved repeatedly, and revalidated never.
The two systems share nothing at the surface. One writes bytes to a block device. The other authorizes calls to an API. The mechanism is identical in both. Each holds a reference that was granted under conditions true at one moment. Each resolves that reference at speed. Neither carries the machinery to ask whether the conditions still hold, because that machinery would have to live on the path that must stay fast, and it was designed out for exactly that reason. The system optimized for resolution. Verification was the cost it declined to pay. The reference outlives the warrant that justified it, and the system goes on executing it at whatever rate it is called, whether that is once per session or millions of times inside a loop.
A system that resolves a reference does not remember why the reference was granted. It resolves the reference because the reference is present, and presence is the entire test.
Codex holds a log level. The bearer server holds a token. Each resolves its reference once, and then resolves it again, identically, without limit, because nothing in the path was ever built to ask a second time.
The conditions under which the trust was valid expired. The trust did not. The control exists. The outcome does not.
Keep Reading
systems driftPAN-OS remembers the verdict, forgets the reasoning
Firewall rules, AD groups, and JWTs keep executing stored references long after the reality they described has drifted. The system revalidates nothing.
wrong abstractionSandi Metz named the wrong abstraction
The wrong abstraction fails because systems resolve trust once by reference and never revalidate whether that reference still means what it did.
systems driftIn January 2025, a hash passed for proof
The open reproduction of DeepSeek-R1 shows verification has no place inside the systems that consume model artifacts. Adoption ran on reference alone.
Stay in the loop
New writing delivered when it's ready. No schedule, no spam.