Another big class of implied personas is the authors of all the text that they’ve encountered.
That also includes the ‘authors’ of texts that don’t actually have an author per se. Some cases I can imagine are:
Text from chat platforms where the usernames have been dropped, so it looks like text from one person but is really from many
The ‘authors’ of fictional books that don’t exist
Machine translations, which would likely imply authors who are different from the authors of the original text
Etc.
In some ways this seems like the most central case, since correctly modeling authors is at the heart of the pre-training loss function.
Modeling authors could be seen as a separate category from personas, but I expect that under the hood it’s mostly the same thing using the same mechanisms; a persona is just a particular kind of author (or perhaps vice versa).