Nothing crashed, nothing shipped

Commit e4b81f2 landed on main at 03:17. No human wrote it, no human reviewed it, and there was no deploy pipeline for it to wait in. At 03:20 the next APScheduler tick spawned a fresh subprocess, imported the new code, and the commit was production. Eleven hours later I noticed the newsletter property had sent nothing, two Instagram queues were empty, and every health check in the system was green.

Total damage: one missed newsletter send (4,100 subscribers), 14 unpublished posts across 6 properties, zero alerts. The process never crashed. The logs contained no errors. The only artifact that explained anything was the commit itself.

This is the operating condition nobody warns you about when agents write your code: software is made between commits. The diff is the cheap part. The expensive part is the window that opens the moment a commit exists and closes when you have confirmed the running system actually absorbed it. If nothing watches that window, you find out about bad commits the way I did, from a flat line on a growth dashboard half a day later.

The deploy step you don’t have

Foundry runs 6 content properties off one repo: blog sites, IG/FB accounts, newsletter products. APScheduler fires roughly 40 jobs a day. Each job is a subprocess, either python -m foundry.jobs.publish --property X or a [claude](https://claude.ai/referral/VUtFoiuiuw?utm_source=persona_fcefe83e&utm_medium=blog&utm_campaign=ai-article&utm_content=5767cf3f) -p invocation wrapped in a runner. Subprocesses import whatever is on disk at spawn time. There is no build, no artifact, no blue-green anything. git pull is the deploy. Sometimes the agent’s own commit is the deploy, because the agent has write access to the repo it runs from.

Before agents, this was survivable. Commit volume was low and every commit had a human attached who would, at minimum, watch the next job run. Claude Code changed the arithmetic. In May the repo took 217 commits; 164 were agent-authored. Skill updates, prompt tweaks, config refactors, dependency pins. (Numbers from git log --since="2026-05-01" --until="2026-06-01" --format="%an" | sort | uniq -c.) Nobody watches 164 commits land. The review step didn’t get skipped; it got diluted past usefulness.

So you inherit a system where every commit is a deployment whether you ceremonialize it or not, deployed by an author who does not stick around to watch it arrive.

The failure modes

The ways an unwatched commit hurts you, in observed-frequency order:

1. The commit exits cleanly and does nothing. Zero work units, exit code 0, success in the ledger. The most common and the worst.
2. The commit throws ImportError at the next tick. The scheduler marks the job failed and moves on. Loud, at least.
3. The commit changes a config key; old readers get None and take a silent default path.
4. The commit is fine, but it restarts the service and the restart loses in-flight queue state.
5. The commit passes the test suite because the suite mocks the exact thing the commit broke.

e4b81f2 was type one wearing type five as a disguise.
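The third failure mode in that list, the renamed config key, is easy to reproduce in a few lines. This is an illustrative sketch only: the key names `send_day` and `send_weekday` and the `require` helper are hypothetical and not taken from Foundry's config.

```python
def require(config: dict, key: str):
    """Return config[key], failing loudly instead of defaulting when the key is absent."""
    if key not in config:
        raise KeyError(f"missing required config key {key!r}; present: {sorted(config)}")
    return config[key]


# A commit renamed the key; a reader elsewhere still asks for the old name.
config = {"send_weekday": "tuesday"}  # hypothetical key, renamed from "send_day"

silent = config.get("send_day")  # None: the stale reader falls through to its default path

try:
    require(config, "send_day")
    loud = None
except KeyError as exc:
    loud = str(exc)  # the stale reader now fails at the next tick instead of guessing
```

Turning the silent case into the loud one moves the failure from the third mode to the second, which the scheduler at least records as a failed job.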

The path to discovery

I wasn’t looking for a bad commit. I was looking at the per-property dashboard because the newsletter curve, which moves in a sawtooth (spike on send day, decay after), was missing its spike. Send day was Tuesday. The chart said Tuesday hadn’t happened.

The audit ledger showed the dispatcher running on schedule all day. Last entry: {"event": "publish.success", "property": null, "count": 0, "duration_ms": 312}. Twenty-two of those, perfectly spaced, starting at 03:30. publish.success with count: 0. The job was healthy. It was publishing nothing to nobody, on time, every time.

systemd: service active, 0 restarts since the previous Thursday. CPU flat. Memory flat. Every signal I’d built said fine, because every signal I’d built measured the process, not the work.
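Measuring the work instead of the process can be a very small query. The sketch below assumes the ledger can be read as JSON lines shaped like the entry quoted above; `work_summary` is a hypothetical helper, not part of Foundry.

```python
import json


def work_summary(lines):
    """Count publish.success runs and the work units those runs actually reported."""
    runs = 0
    units = 0
    for line in lines:
        entry = json.loads(line)
        if entry.get("event") == "publish.success":
            runs += 1
            units += entry.get("count") or 0
    return runs, units


# The ledger entry from the incident, repeated for the 22 runs.
incident_line = '{"event": "publish.success", "property": null, "count": 0, "duration_ms": 312}'
runs, units = work_summary([incident_line] * 22)
print(f"{runs} successful runs, {units} work units")
```

A dashboard that showed the second number next to the first would have made the gap visible without anyone opening git log.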

git log --since="03:00" took four seconds to find what eleven hours of green dashboards had hidden.

Root cause

e4b81f2 came from an overnight maintenance skill: “normalize config file extensions.” It renamed config/properties.yml to config/properties.yaml and updated the shared loader at foundry/properties/loader.py:41 to match:

# loader.py:41 - updated by the commit
CONFIG_GLOB = "*.yaml"

What it didn’t update was the second loader. The newsletter dispatcher predates the shared loader and carries its own copy at foundry/jobs/newsletter/dispatch.py:28:

# dispatch.py:28 - not updated by the commit
for path in Path("config").glob("*.yml"):
    properties.append(load_property(path))

After the rename, that glob matches zero files. properties is an empty list. The dispatch loop is for prop in properties: send(prop), and iterating an empty list is, as far as Python and my ledger schema were concerned, a flawless success.
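The behaviour is easy to reproduce in isolation. The sketch below uses a temporary directory in place of `config/`, and `find_property_files` is a hypothetical guard rather than Foundry code: it treats an empty match as an error instead of returning an empty list.

```python
import tempfile
from pathlib import Path


def find_property_files(config_dir, pattern):
    """Return matching config files, raising if the pattern matches nothing."""
    paths = sorted(Path(config_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r} in {config_dir}")
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # State of the directory after the rename: only the .yaml file exists.
    (Path(tmp) / "properties.yaml").write_text("name: newsletter\n")

    stale_matches = list(Path(tmp).glob("*.yml"))  # the dispatcher's old pattern
    sent = 0
    for _ in stale_matches:  # zero iterations, no exception raised
        sent += 1

    fresh_matches = [p.name for p in find_property_files(tmp, "*.yaml")]

    try:
        find_property_files(tmp, "*.yml")
        guard_error = None
    except FileNotFoundError as exc:
        guard_error = str(exc)
```

With a guard like this, the stale pattern raises at the first tick after the rename, and the scheduler records a failed job rather than an empty success.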

Why was this state reachable? Three reasons, all mine:

1. Duplicate loaders. The agent fixed the one the grep found first. A human probably does the same.
2. The test suite mocked load_properties(). The tests verified dispatch logic against fixture properties and never exercised the glob. The agent ran the suite before committing. Green.
3. The ledger schema allowed success with count: 0. Zero work was indistinguishable from completed work.

None of these is an agent failure. The agent did what a tired contractor would do. The system let a no-op masquerade as a success, and no part of the system treated the arrival of new code as an event worth verifying.
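One way to make the duplicate-loader problem visible before a rename is to list every hard-coded glob pattern in the codebase. The following is a rough sketch using Python's `ast` module; `find_glob_patterns` and `scan_tree` are hypothetical helpers, and the approach only sees string literals passed directly to `.glob()` or `.rglob()`, so a pattern held in a constant such as `CONFIG_GLOB` would need a separate check.

```python
import ast
from pathlib import Path


def find_glob_patterns(source, filename="<string>"):
    """Return sorted (line, pattern) pairs for .glob()/.rglob() calls with a literal string argument."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"glob", "rglob"}
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            hits.append((node.lineno, node.args[0].value))
    return sorted(hits)


def scan_tree(root):
    """Map each Python file under root to the literal glob patterns it contains."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = find_glob_patterns(path.read_text(encoding="utf-8"), str(path))
        if hits:
            report[str(path)] = hits
    return report


# The dispatcher's private loader, as quoted above.
dispatcher_source = '''
for path in Path("config").glob("*.yml"):
    properties.append(load_property(path))
'''
print(find_glob_patterns(dispatcher_source))
```

Running `scan_tree("foundry")` from the repo root would give a rename task the full list of readers to update, instead of the first one grep happens to find.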

The fix

Two commits. The first, b7d2c91, makes zero work a distinct state:

# foundry/jobs/base.py
def finalize(self, count: int):
    event = "publish.success" if count > 0 else "publish.empty"
    self.ledger.write(event, property=self.prop, count=count)

The sentinel alerts on two consecutive publish.empty entries for the same job. Cheap, and it would have caught this incident by 04:00 instead of 14:40.
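The alert rule itself fits in one function. This is a minimal sketch, assuming ledger entries arrive oldest first and carry a `job` field; that field and the `should_alert` name are assumptions made for the example, since the entry format shown earlier does not include a job name.

```python
def should_alert(entries, job, threshold=2):
    """True when the newest `threshold` ledger entries for `job` are all publish.empty."""
    recent = [e for e in entries if e.get("job") == job][-threshold:]
    return len(recent) == threshold and all(
        e.get("event") == "publish.empty" for e in recent
    )


# Hypothetical ledger slice, oldest first.
ledger = [
    {"job": "newsletter", "event": "publish.success", "count": 3},
    {"job": "newsletter", "event": "publish.empty", "count": 0},
    {"job": "ig", "event": "publish.success", "count": 2},
    {"job": "newsletter", "event": "publish.empty", "count": 0},
]

alert_newsletter = should_alert(ledger, "newsletter")  # two empties in a row
alert_ig = should_alert(ledger, "ig")  # only one entry, and it did work
```

Requiring two consecutive empties rather than one keeps a job that legitimately has nothing to publish on a single run from paging anyone.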

The second, c3a90e7, is the actual pattern: a commit-arrival watcher. Every new SHA on main is treated as a deploy event with a smoke check and an automated rollback:

# foundry/sentinels/commit_watch.py
def tick(repo, ledger):
    head = repo.head.commit.hexsha
    seen = ledger.last("deploy.observed")
    if seen and seen["sha"] == head:
        return  # HEAD has not moved since the last observed deploy
    result = smoke()
    ledger.write("deploy.observed", sha=head, ok=result.ok,
                 smoke_ms=result.duration_ms)
    if not result.ok:
        # Revert rather than reset, so the bad commit stays in history.
        repo.git.revert(head, "--no-edit")
        ledger.write("deploy.reverted", sha=head, reason=result.detail)
        notify(f"auto-reverted {head[:7]}: {result.detail}")

The smoke check is deliberately boring:

def smoke():
    import_all_job_modules()  # catches the ImportError class
    props = load_properties()
    assert len(props) == EXPECTED_PROPERTIES, f"loaded {len(props)}/{EXPECTED_PROPERTIES}"
    render_dry_run(props[0])  # one full template render, no send

It runs in about 9 seconds and costs nothing (no API calls). The watcher tick that calls it fires every 5 minutes from the same APScheduler instance as everything else. The assert on property count is the line that matters: it encodes “the system can still see all 6 properties” as an invariant checked on every commit arrival, not assumed.
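The watcher reads `result.ok`, `result.detail` and `result.duration_ms`, while the smoke body shown above signals failure by raising. One way to bridge the two is a thin wrapper that times the check and converts any exception into a result object. This is a sketch under that assumption; `SmokeResult` and `run_smoke` are hypothetical names, not the actual Foundry code.

```python
import time
from dataclasses import dataclass


@dataclass
class SmokeResult:
    ok: bool
    detail: str
    duration_ms: int


def run_smoke(check):
    """Run a zero-argument check and report pass/fail, a reason, and elapsed milliseconds."""
    start = time.monotonic()
    try:
        check()
    except Exception as exc:  # AssertionError and ImportError both land here
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    else:
        ok, detail = True, "ok"
    duration_ms = int((time.monotonic() - start) * 1000)
    return SmokeResult(ok=ok, detail=detail, duration_ms=duration_ms)


def failing_check():
    # Stands in for a smoke body whose property-count invariant does not hold.
    raise AssertionError("loaded 5/6")


passed = run_smoke(lambda: None)
failed = run_smoke(failing_check)
```

One caveat worth keeping in mind: Python drops `assert` statements when run with `-O`, so a check that must hold in every runtime configuration may be better written as an explicit `raise`.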

The rollback is git revert, not reset. The bad commit stays in history with a paper trail, and the agent that wrote it sees the revert in its next context, which in practice stops it from re-attempting the same rename.

What it caught

60 days in production since c3a90e7. Three auto-reverts:

sha	author	cause	time to revert
`91ac4fe`	agent	removed a function still reached via dynamic dispatch	4 min
`5fe7d03`	agent	dependency pin broke a transitive import	5 min
`a44b2c8`	me	typo in a property slug, render dry-run failed	3 min

Note the third row. The watcher doesn’t care who made the commit. Neither should you.

MTTD for a bad commit went from 11 hours to 5 minutes or less across the three reverts. Cost of the watcher: one tick every 5 minutes (288 a day), a 9-second smoke run whenever a tick sees a new SHA, $0 in API spend, one mostly idle core.

Lessons

A green process is not a healthy system. Uptime, exit codes, and error rates all measured the process; the incident lived in the work. Count work units.
success with zero output is the most expensive event class in agent-operated systems, because it satisfies every alarm you built for failures.
Code review does not scale to 164 agent commits a month. Verifying arrival does. Stop trying to read every diff; start asserting invariants when the diff lands.
Rollback has to be as automated as the commit was. An agent that can commit at 03:17 needs a counterparty that can revert at 03:22.
The commit is not the software. The verified state transition is the software. Everything between the SHA appearing and the smoke check passing is unfinished work, no matter what git log says.

Follow-ups still open: two other jobs carry private config readers like the one that caused this (tracked, not consolidated); the smoke check doesn’t cover the IG posting path because a dry-run there still touches the Graph API; and deploy.observed entries should feed the growth dashboard so commit arrivals render as vertical lines on every property curve. That last one is the cheapest way I know to see that the curve changed because the code changed.

Try Claude Code yourself: https://claude.com/claude-code

Contains a referral link.
