One plausible claim is that GPT-4.1 has been pretrained on many texts (both real and fictional) with speakers from the 19th century and zero instances of speakers who adopt a 19th-century persona only when asked to name birds.
This seems almost certainly true to me. It would be interesting to see how the results of training on OLD BIRD NAMES would change if a dataset of contemporary-style responses (e.g. GPT-4.1 responses to some diverse prompt dataset) were mixed in. If “a 19th-century persona only when asked to name birds” is more complex for the model to represent (given what it was pretrained on), the model might need to see more OLD BIRD NAMES tokens to reach the same level of “old bird name usage capability” as models finetuned on OLD BIRD NAMES alone.
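For concreteness, here is a minimal sketch of how one might construct such a mixed finetuning set. Everything here is hypothetical: the dataset names, the filler fraction, and the example format are placeholders, not anything from the original experiment.

```python
import random

def mix_datasets(primary, filler_pool, filler_frac, seed=0):
    """Keep all of `primary` (e.g. the OLD BIRD NAMES examples) and sample
    filler examples (e.g. contemporary-style responses) from `filler_pool`
    so that roughly `filler_frac` of the final dataset is filler."""
    rng = random.Random(seed)
    # Solve n_filler / (len(primary) + n_filler) == filler_frac for n_filler.
    n_filler = round(len(primary) * filler_frac / (1 - filler_frac))
    mixed = list(primary) + rng.sample(filler_pool, n_filler)
    rng.shuffle(mixed)  # interleave so training order doesn't separate the two sources
    return mixed

# Hypothetical usage: 90 persona examples, 25% contemporary filler.
bird_examples = [{"id": f"bird-{i}"} for i in range(90)]
contemporary = [{"id": f"chat-{i}"} for i in range(1000)]
mixed = mix_datasets(bird_examples, contemporary, filler_frac=0.25)
```

Sweeping `filler_frac` (and total token count) would let one check whether the mixed-in contemporary data slows acquisition of the bird-name behavior, per the hypothesis above.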
This was a fascinating investigation!