Newer Claude models fumble tool schemas their predecessors handled cleanly
Armin Ronacher traced a two-day debugging rabbit hole to a counterintuitive finding: Anthropic’s newest models are worse at conforming to non-standard tool schemas than their older siblings. When calling Pi’s file-edit tool, which takes a nested edits[] array, Opus 4.8 and Sonnet 5 frequently append invented keys — requireUnique, oldText2, in_file, and a whole zoo of others — to otherwise byte-correct edits. The schema validation then rejects the call. Crucially, older models don’t do this, and the failure is context-dependent: it surfaces in long agentic transcripts (roughly 20% of continued sessions in one case) but not in fresh single-turn prompts. Stripping thinking blocks halved the rate; strict tool invocation eliminated it.
Ronacher’s leading hypothesis is a training artifact rather than random regression. Recent models are post-trained inside Claude Code or a close simulation of it, and Claude Code’s own edit tool is flat (file_path, old_string, new_string, replace_all) rather than nested. More importantly, its client is deliberately forgiving — the minified code reveals retry paths, parameter aliases, type coercions, Unicode repairs, and silent filtering of unknown keys. If reinforcement learning runs against a harness that quietly absorbs malformed calls, there’s little gradient penalizing a stray field or an invented alias, since the task still completes and earns reward.
The worrying implication is that models are becoming strongly adapted to one particular, undocumented tool ecology. A different harness offering the same semantic operation under a different schema is increasingly off-distribution, and a better-trained model with a stronger prior may fight that schema harder. This reverses the trend Ronacher observed around Opus 4.5, which adapted gracefully to arbitrary tool shapes. The documented text-editor tool format isn’t even what Claude Code actually uses internally, so third-party harness authors are effectively optimizing against a moving, hidden target.
Read the full article
Continue reading at Hacker News →This is an AI-generated summary. Read the original for the full story.