Importantly, though, the LLM is not itself a persona, so it is not constrained to have human-like goals or psychology.
The personas don’t have to be human-like either; a paperclip maximizer AI is a common enough sci-fi trope that it’s probably learned in pretraining and becomes an available persona. I think you realize this but maybe just spoke loosely here and accidentally implied that personas are constrained to have human-like goals/psychology.
The Tooth Fairy is also a persona, if not a very complex one. Also, I’m sure there are tokens on the internet generated by fully-automated processes, like automated weather stations. Those will have personas too. Personas are just part of the world model: they’re the interesting part for alignment because they’re the things in the world model that are agentic and have goals, and so present alignment problems. Automated weather-station-like personas are probably rather far down the priority list of alignment problems (though some personas learned from simple bots on social media might not be).