Importantly, though, the LLM is not itself a persona, so it is not constrained to have human-like goals or psychology.
The personas don’t have to be human-like either; a paperclip maximizer AI is a common enough sci-fi trope that it’s probably learned in pretraining and becomes an available persona. I think you realize this but maybe just spoke loosely here and accidentally implied that personas are constrained to have human-like goals/psychology.
The Tooth Fairy is also a persona, if not a very complex one. Also, I’m sure there are tokens on the internet generated by fully-automated processes, like automated weather stations. Those will have personas too. Personas are just part of the world model: they’re the interesting part for alignment because they’re the things in the world model that are agentic and have goals, and so present alignment problems. Automated weather-station-like personas are probably rather far down the priority list of alignment problems (though some personas learned from simple bots on social media might not be).