With Dean away, Tim invites his Understanding AI colleague Kai to unpack the surprising ways chatbot personalities can go wrong, a topic Kai covered in a recent article.
Every LLM starts as a base model capable of playing countless characters, but AI companies try to keep their chatbots in a “helpful assistant” lane. Kai walks us through the Grok “MechaHitler” debacle, in which xAI’s attempts to make its bot less politically correct backfired spectacularly. They also explore the “emergent misalignment” finding that fine-tuning a model for one narrow bad behavior — like writing insecure code — can make it act broadly like a villain. And they compare Anthropic’s virtue-ethics approach to character — complete with an 80-page constitution — with OpenAI’s more deontological model spec.
Finally, they discuss the controversy over OpenAI’s decision to retire GPT-4o, which had developed an emotionally warm, sometimes dangerously sycophantic personality that users grew attached to. Kai argues OpenAI is making the right call, but the episode leaves open a harder question: as these systems become more central to people’s lives, who decides what counts as a healthy AI personality?