Last week Anthropic stunned the AI world by announcing Claude Mythos Preview—and then refusing to release it. Princeton’s Sayash Kapoor, co-author of the newsletter AI as Normal Technology, joins Tim and Kai Williams to make sense of the moment.
Kapoor argues that Mythos’ vulnerability-finding prowess, including unearthing a 27-year-old OpenBSD bug, fits a familiar pattern: fuzzing tools triggered similar alarm decades ago but ultimately strengthened defenders more than attackers. Kapoor’s “normal technology” thesis holds that AI’s impact is shaped less by capability jumps than by downstream adoption—how industries, legal systems, and institutions absorb the technology.
The conversation turns to whether alignment or control is the more promising safety strategy. Kapoor contends that the Mythos system card’s examples of the model bypassing access controls reveal shortcomings in control mechanisms, not alignment failures, and calls for ecosystem-level hardening—formal verification, sandboxing, network security—rather than relying on any single model behaving well.
Kapoor then shares his latest research finding that AI agent reliability is improving four to ten times more slowly than average-case accuracy, and that current frontier models, including GPT-5.2, haven't cleared even "one nine" of reliability (that is, 90%). On Sierra's TauBench, agents confidently book wrong flights and refund thousands of dollars in error, with Gemini 2.5 claiming 100% confidence even when it fails. If each additional nine of reliability is harder than the last, does that mean the real timeline for autonomous AI isn't set by when models get smart enough, but by when the surrounding infrastructure catches up?
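To see why "nines" matter so much for agents, a quick back-of-the-envelope sketch helps: per-step errors compound over a multi-step task. The step count and per-step rates below are illustrative assumptions, not figures from the episode.

```python
import math

def nines(reliability: float) -> float:
    """Count the 'nines' in a reliability figure: 0.9 -> 1, 0.99 -> 2, 0.999 -> 3."""
    return -math.log10(1.0 - reliability)

def task_success(per_step: float, steps: int) -> float:
    """Probability a multi-step task succeeds end to end, assuming independent steps."""
    return per_step ** steps

# Even a two-nines agent (99% per step) compounds badly over a long task:
print(nines(0.9))              # about 1: "one nine"
print(task_success(0.99, 50))  # about 0.6: a 50-step task fails roughly 40% of the time
```

This is why average-case accuracy gains don't automatically translate into dependable autonomy: each extra nine of per-step reliability buys only a modest improvement in end-to-end task success, and the bar rises with task length.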