What do language models know about fictional characters?

I will speculate a bit about what language models might be doing. Hopefully it will help us come up with better research questions (for those of you who can do research) and things to try (for those of us informally testing the chatbots), to see how these language models work and break.

I haven’t done any testing based on these ideas yet, and there are many papers I haven’t read, so I’d be very interested in hearing about what other people have found.


Let’s say you’re trying to predict the next word of a document, as a language model does. What information might be helpful? One useful chunk of information is who the author is, because people have different opinions and writing styles.

Furthermore, documents can contain a mix of authors. One author can quote another, and quotes can nest. A document might be a play or screenplay, or a transcript of an interview. Some authors don’t actually exist; the words of fictional characters often appear in documents and it will be useful to keep track of them too. I’ll generically refer to an author or character as a “speaker.”

Speaker Identification

Within a document

When predicting the next word of a chat transcript, you will want to identify previous sections corresponding to the current speaker, and imitate those. One could imagine highlighting the text belonging to each speaker as a different color. There are various textual markers indicating where one speaker stops and another starts. Paying attention to delimiters (such as quotation marks) is important, along with all the other things authors might do to distinguish one speaker from another. Sometimes, a narrator indicates who is speaking, or the dialog is prefixed by the name of the speaker, such as in a play or interview.
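To make the bookkeeping concrete, here is a toy sketch (in Python, and not a claim about what a language model does internally) of segmenting a play- or interview-style transcript into speaker turns, using the “Name:” prefix convention as the delimiter. The regex and the example dialog are my own illustrative choices.

```python
import re

# Toy heuristic: in an interview/play-style transcript, a new speaker turn
# starts with a line like "HOLMES: ..." or "Watson: ...".
TURN_PREFIX = re.compile(r"^([A-Z][\w .'-]{0,30}):\s*(.*)$")

def segment_turns(transcript: str):
    """Split a transcript into (speaker, text) turns based on name prefixes."""
    turns = []
    speaker, lines = None, []
    for line in transcript.splitlines():
        m = TURN_PREFIX.match(line.strip())
        if m:  # a delimiter line: flush the previous turn, start a new one
            if speaker is not None:
                turns.append((speaker, " ".join(lines)))
            speaker, lines = m.group(1), [m.group(2)]
        elif speaker is not None:
            lines.append(line.strip())  # continuation of the current turn
    if speaker is not None:
        turns.append((speaker, " ".join(lines)))
    return turns

example = """HOLMES: You see, but you do not observe.
WATSON: I hear you say so.
HOLMES: Quite so."""
print(segment_turns(example))
# [('HOLMES', 'You see, but you do not observe.'),
#  ('WATSON', 'I hear you say so.'), ('HOLMES', 'Quite so.')]
```

A real model has to cope with far messier cases (nested quotes, unnamed narrators, delimiters embedded in dialog), which is exactly why it would be useful to know what heuristics it actually learns.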

Getting speaker changes wrong can result in disastrous errors, like what seems to have happened with Bing’s chatbot, so there’s a pretty strong incentive in training to get it right. We don’t know exactly how it went wrong at Bing, but we can assume language models have mostly learned how to do this, even if they sometimes make mistakes. Screwing up delimiter parsing often results in security holes, so it would be useful to know exactly how language models do it.

Between documents

At this point we’ve distinguished between speakers, but we don’t know who they are. You could imagine the transcript being labelled with “speaker 1”, “speaker 2”, and so on.

It would be helpful to “de-anonymize” these speakers and bring in more context from the training dataset. For example, if you know that one of the speakers is Sherlock Holmes, you can use what you’ve learned from the training set about how Sherlock Holmes normally talks. Maybe you predict words like “elementary” and “Watson” more often than usual.
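As a cartoon of what “predict words like ‘elementary’ more often” could mean, here is a toy sketch that blends a generic next-word distribution with a speaker-specific one once a speaker has been identified. The counts and mixing weight are invented for illustration; a real language model does nothing this explicit.

```python
# Toy illustration: once a speaker is identified, blend a generic next-word
# distribution with one estimated from that speaker's known dialog.
generic = {"the": 0.05, "elementary": 0.0001, "watson": 0.0002, "yes": 0.01}
holmes  = {"the": 0.04, "elementary": 0.004,  "watson": 0.006,  "yes": 0.008}

def blended(word: str, speaker_dist: dict, weight: float = 0.5) -> float:
    """Probability of `word` as a mixture of generic and speaker-specific stats."""
    return (1 - weight) * generic.get(word, 0.0) + weight * speaker_dist.get(word, 0.0)

for w in ("elementary", "watson"):
    print(w, generic[w], "->", blended(w, holmes))
# "elementary" and "watson" become noticeably more probable once the
# current speaker is matched to Sherlock Holmes.
```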

We know that language models are able to do this to some extent since they do imitate famous authors when asked, but how good are they at figuring it out? If the transcript says that one of the speakers is Sherlock Holmes then that’s a dead giveaway, but maybe there are other ways?

It would be interesting to know how speakers are represented in the language model. It’s likely that in many cases it doesn’t really recognize speakers as people at all, perhaps instead representing them as a point in a space of possible writing styles. This would be helpful in the common case where it doesn’t recognize an author, or where the speaker only appears in the current document. If you can match such a speaker with other speakers who are similar in opinions and writing style, you can still predict the next word better.
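One way to picture “a point in a space of possible writing styles” is a style vector with nearest-neighbor matching. The sketch below is purely illustrative: the three-dimensional style vectors and the speaker names are invented, and nothing says a transformer organizes its internal representations this way.

```python
import math

# Invented 3-d "style vectors" (say: formality, verbosity, archaism) for
# speakers seen in training; a new speaker in the current document is
# matched to whichever known style it most resembles.
known_speakers = {
    "sherlock_holmes": [0.9, 0.7, 0.8],
    "casual_blogger":  [0.1, 0.4, 0.1],
    "legal_brief":     [0.95, 0.9, 0.3],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest_style(new_speaker_vec):
    """Return the known speaker whose style vector is closest (cosine similarity)."""
    return max(known_speakers, key=lambda name: cosine(known_speakers[name], new_speaker_vec))

print(nearest_style([0.85, 0.75, 0.7]))  # -> "sherlock_holmes"
```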

Learning about speakers from text other than their own dialog

Besides speaker dialog, there are other ways to learn things about a speaker. “Show, don’t tell” is good advice for writers (and bot trainers), but sometimes narrators do tell us things. If the narrator says that a speaker started getting angry, angry words are more probable.

If you ask a language model questions about Sherlock Holmes, some of the things it can tell you about him probably come from his Wikipedia page. Are these “facts” connected to speaker identification and (therefore) dialog generation? Can we show a link?

Anthropomorphism: everyone does it, now with bots

When people attribute human emotions to an animal based on its behavior, we call that anthropomorphism. When we attribute human characteristics to the author of some text based on the text itself, it seems more justifiable, but isn’t it quite similar?

We can say, roughly, that language models should be doing something sort of like anthropomorphism during training, because they need to somehow learn things about authors from text. With more training, they are likely to get better at it. Also, generated text should imply authors who are distinct from the computer program that actually generated it. (Otherwise, the generated text wouldn’t be much like the training documents, which do have authors.)

It should be no surprise that people anthropomorphize text when that’s what the bot is designed and trained to do. This is similar to how children are taught to believe in Santa Claus by being taken to visit people dressed as Santa. (Note that AI assistant personalities are fictional characters too, the implied authors of some text.)

When interacting with a chatbot, one should always try to keep in mind that character attributes aren’t bot attributes. Telling a bot to imitate Einstein doesn’t mean it gets smarter or knows more physics, though it will try harder to pretend it does than when imitating a character who isn’t supposed to know physics. This is like giving your wizard high intelligence in a role playing game; it doesn’t make you smarter, but maybe it’s justification to show off a bit.

So it should be no surprise that a bot will often sound over-confident. We can expect this to happen whenever it’s asked to imitate someone who should know things that it doesn’t know. The level of confidence expressed in the text is a character attribute; it has nothing to do with the probability scores the model computes during text generation.
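One concrete way to see that sounding confident and probability scores are different quantities: using the publicly available gpt2 model via the Hugging Face transformers library, you can score how probable any stretch of text is to the model, independently of how confident its wording sounds. This is only a sketch of the distinction, not a claim about how any particular chatbot works; the example sentences are made up.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def mean_token_logprob(text: str) -> float:
    """Average log-probability the model assigns to the text, token by token."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Predictions at position i are for token i+1, so shift by one.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    chosen = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return chosen.mean().item()

confident = "The answer is definitely 42, without any doubt."
hedged = "I think the answer might be 42, but I am not sure."
# The wording of the first sentence is far more confident, but the model's
# probability scores for the two texts are a separate, unrelated measurement.
print(mean_token_logprob(confident), mean_token_logprob(hedged))
```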

Research questions

  • How often do language models make mistakes at detecting speaker changes?

  • What heuristics do they learn for speaker changes? Can heuristics override delimiters?

  • To what extent can language models learn to represent famous speakers during training? What do these representations consist of?

  • At generation time, to what extent are chat transcript speakers matched against the speakers the model learned about from its training text?

  • How do various kinds of narrative information describing speakers affect dialog generation?

  • Is it possible to create a chatbot where we can control how confident-sounding certain characters are?

  • Could a chatbot generate text for a character that is well-calibrated to the bot’s accuracy, meaning that it maintains an internal estimate of how likely it is to get a particular answer right and adjusts the generated text’s confidence level accordingly?

  • What would be involved in making these fictional characters somehow more “real”? Do we even want to do that?
