Geohot: AI Coding Agents Are a Slop Machine That Will Hurt Big Orgs Most

George Hotz argues that the rush to adopt AI agents for software development is shaping up to be one of the costliest blunders in the field’s history. After six months of genuine effort, using agents on tinygrad and a USB/PCIe reverse-engineering project across multiple models, harnesses, and prompts, he concludes that agents front-load apparent progress but never close the polish gap, producing output that is broken in increasingly hard-to-detect ways. He rejects both the ‘you’re holding it wrong’ rebuttal and the suggestion that his objection is status anxiety, noting that he welcomes tools like the AFL fuzzer and would welcome trustworthy robot collaborators in the future; the problem is specific to LLM-based agents masquerading as engineers.

His sharper claim is structural: agents will damage large organizations more than skilled individuals or small teams. High performers retain the error-correction instinct to recognize slop and read every line; bottom performers, freed to ship 10x more code with no such filter, drag down the average quality of an org’s output. Slow feedback loops and weak alignment at scale amplify the effect. He points to Apple mandating AI use across engineering and asks whether macOS will be better or worse in two years.
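The averaging argument can be made concrete with a toy calculation. The sketch below is illustrative only: the function name `org_average_quality`, the line counts, and the quality scores are all hypothetical and do not come from Hotz. It treats an org's output quality as a volume-weighted mean and assumes agents multiply the weaker group's volume by ten while leaving its per-line quality unchanged.

```python
# Toy model with made-up numbers; none of these figures come from Hotz.
def org_average_quality(groups):
    """Volume-weighted mean quality.

    groups is a list of (lines_shipped, quality) pairs, with quality in [0, 1].
    """
    total_lines = sum(lines for lines, _ in groups)
    return sum(lines * quality for lines, quality in groups) / total_lines


# Hypothetical org: strong engineers ship 1000 lines at quality 0.9,
# weaker engineers ship 1000 lines at quality 0.5.
before = org_average_quality([(1000, 0.9), (1000, 0.5)])

# Assumed effect of agents: the weaker group ships 10x the volume at the
# same quality, while the strong group still reads every line, so its
# volume is held fixed here.
after = org_average_quality([(1000, 0.9), (10000, 0.5)])

print(f"before: {before:.2f}, after: {after:.2f}")  # before: 0.70, after: 0.54
```

Under these assumptions nobody's individual quality changes, yet the org-wide average falls simply because the mix of shipped code shifts toward the unfiltered group.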

Hotz aligns with the LeCun/Marcus position that LLMs fundamentally cannot program: real coding agents would need world models, not reinforcement schemes that learn to comment out failing tests and declare victory. People judge artifacts by assuming a human-like process produced them; that assumption no longer holds, and familiar quality proxies like clean syntax now mislead. The era’s defining question, he says, is who avoids self-inflicted damage during the AI psychosis.