Teaching AI Agents to Write Tests That Don't Suck via Canon TDD

AI coding agents produce notoriously bad tests — vague, tautological, or performative — largely because they learned from a corpus of human-written tests that suffer the same flaws. The author argues this won’t self-correct anytime soon, but agents can be coached into writing meaningful tests when given the right discipline. Pointing an agent at Kent Beck’s Canon TDD alone gets you most of the way there.
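Here is a minimal Python sketch that makes "tautological" concrete, placing such a test next to a more meaningful one. The `apply_discount` function and both tests are hypothetical illustrations, not code from the article.

```python
# Hypothetical function under test; not taken from the article.
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, rounded down to whole cents."""
    return price_cents * (100 - percent) // 100


def test_discount_tautological():
    # Tautological: the expected value is computed with the same formula the
    # implementation uses, so this passes even if that formula is wrong.
    price, percent = 1999, 15
    assert apply_discount(price, percent) == price * (100 - percent) // 100


def test_discount_meaningful():
    # Meaningful: expected values are worked out independently and pin down
    # concrete behavior, including rounding and the boundary cases.
    assert apply_discount(1999, 15) == 1699  # 1999 * 0.85 = 1699.15, rounded down
    assert apply_discount(1000, 0) == 1000   # no discount
    assert apply_discount(1000, 100) == 0    # full discount


if __name__ == "__main__":
    test_discount_tautological()
    test_discount_meaningful()
    print("both tests pass")
```

Both tests pass against this implementation. Only the second would fail if someone broke the rounding or the percentage arithmetic.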

The author’s personal agent skill wraps Canon TDD in a higher-level loop called specify-encode-fulfill: define the specification, encode it as an executable test, then write only the minimum code needed to make that test pass. Speculative coding is forbidden, and behavior changes must be committed before any refactoring begins. A separate Test Design Review skill spawns a fresh agent to critique the tests for design violations, such as testing means rather than ends (checking how the code works instead of what it achieves). Because the reviewer did not write the tests, it sidesteps the original agent’s bias.
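As a rough illustration, here is what one pass through specify-encode-fulfill might look like in Python. The `slugify` function, its test list, and the test names are hypothetical, not taken from the author's skill. The sketch shows only the ordering of the steps and a test that checks an observable result rather than internal mechanics.

```python
# Step 1, specify: a hypothetical list of behaviors slugify() must have.
TEST_LIST = [
    "lowercases words and joins them with hyphens",
    "collapses runs of whitespace into a single hyphen",
]


# Step 2, encode: each spec item becomes one executable test that checks the
# observable output (the end), not how slugify computes it (the means).
def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"


def test_collapses_whitespace():
    assert slugify("  Canon   TDD  ") == "canon-tdd"


# Step 3, fulfill: the minimum code that makes both tests pass. There is no
# speculative handling of punctuation or Unicode until a test demands it.
def slugify(title: str) -> str:
    # str.split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, which satisfies both tests.
    return "-".join(title.lower().split())


if __name__ == "__main__":
    test_lowercases_and_hyphenates()
    test_collapses_whitespace()
    print(f"{len(TEST_LIST)} specified behaviors pass")
```

Under this discipline, the next scenario (say, stripping punctuation) would first be added to the list and encoded as a failing test before `slugify` changes. Any tidying of the implementation would land in its own commit after the behavior change is committed.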

An unexpected win: an offhand instruction to “clean the kitchen before making dinner” when a test gets hard to write has become one of Claude’s most useful habits, prompting it to flag refactoring opportunities surfaced by test friction. The broader takeaway is that the biggest productivity gains come from pairing AI with old, durable engineering principles rather than expecting the model to invent discipline on its own.