Yeah! That’s related to what Beth says in a later paragraph:
“I think this happens in part because the model has seen documents with missing text, where things were e.g. in an embedded image, or stripped out by the data processing, or whatever.”
And I think it’s a reasonable task for the model to do. I also think what you said is an uncontroversial, relatively standard explanation for why the model exhibits this behavior.
In modern LM parlance, “hallucination” doesn’t need to be something humans get right, nor something that is unreasonable for the AI to get wrong. The specific reason this is considered a hallucination is that people often want to use LMs for text-based question answering or summarization, and making up content is pretty undesirable for those tasks.
Thanks for clarifying!
So, in that case:
What exactly is a hallucination?
Are hallucinations sometimes desirable?
I don’t think there’s an agreed-upon definition of hallucination, but if I had to come up with one, it’s “making inferences that aren’t supported by the prompt, when the prompt doesn’t call for them”.
The boundary around “hallucination” is fuzzy because language models constantly have to make inferences that aren’t “in the text” from a human perspective, a bunch of which are desirable. E.g. the language model should know facts about the world, or be able to tell realistic stories when prompted.