Why do language models hallucinate? https://openai.com/index/why-language-models-hallucinate/
There is a new paper from OpenAI. It is mostly basic stuff. I have a question though, and thought this would be a good place to ask. Have any model training runs experimented with outputting subjective probabilities for their answers? And have labs started applying reasoning traces to the pretraining tasks as well?
One could make the models “wager”, and then reward them in line with the wager. The way this is typically done is based on the Kelly criterion and uses logarithmic scoring. The logarithmic scoring rule awards you points based on the logarithm of the probability you assigned to the correct answer.
For example, for two possibilities A and B, if you report probability p for answer A, then your score is log(p) if A turns out to be correct and log(1 − p) if B turns out to be correct. Usually a constant is added to make the scores positive; for example, the score could be 1 + log₂(p). The key feature of this rule is that you maximize your expected score by reporting your true belief. If you truly believe there is an 80% chance that A is correct, your expected score is maximized by setting p = 0.8.
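Just to illustrate (this is my own toy example, not something from the paper), here is a small Python sketch of the shifted log score for a two-option question; the grid search shows the expected score peaking at the true belief of 0.8:

```python
# Toy sketch of the shifted logarithmic scoring rule for a binary question.
import numpy as np

def log_score(p_reported: float, a_is_correct: bool) -> float:
    """Score in bits, shifted by +1 so a coin-flip report (p = 0.5) scores 0."""
    p = p_reported if a_is_correct else 1.0 - p_reported
    return 1.0 + np.log2(p)

def expected_score(p_reported: float, p_true: float) -> float:
    """Expected score when the true probability that A is correct is p_true."""
    return p_true * log_score(p_reported, True) + (1 - p_true) * log_score(p_reported, False)

# With a true belief of 80%, the expectation is maximized at p_reported = 0.8.
grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmax([expected_score(p, 0.8) for p in grid])]
print(best)  # ~0.80
```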
I recall seeing, somewhere on the internet a while back, a decision theory course where the exam required students to report their confidence in each answer, and they were awarded points accordingly.
What about doing the following: get rid of the distinction between post-training and pre-training. Make predicting text an RL task and allow reasoning. At the end of the reasoning chain, output a subjective probability and reward it in accordance with the logarithmic scoring rule.
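To make that concrete, here is a rough sketch of what the reward at the end of a reasoning chain might look like; the candidate-answer dictionary, the normalization, and the epsilon floor are my own assumptions, not anything a lab has described:

```python
# Hypothetical reward for the proposal above: the model ends its reasoning
# chain by reporting a probability for each candidate answer, and the RL
# reward is the log of the probability it put on the ground truth.
import math

def log_score_reward(reported: dict[str, float], ground_truth: str, eps: float = 1e-9) -> float:
    total = sum(reported.values())
    p_correct = reported.get(ground_truth, 0.0) / total  # normalize, just in case
    return math.log(max(p_correct, eps))  # eps keeps the reward finite when p = 0

# e.g. the model reasons about "What is the capital of Australia?" and reports:
print(log_score_reward({"Canberra": 0.7, "Sydney": 0.3}, "Canberra"))  # ≈ -0.357
```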
Don’t feel obligated to respond to this comment…
This is pretty funny and entertaining. And I want to make it even more fun! You don’t necessarily need to worry about tracking an infinite number of echoes. Let’s assume that you can track any echo to within λ<1 accuracy. Even if you know someone very well, you can’t read minds. So, for the sake of argument, say λ<0.5.
Then, sweeping a bunch of stuff under the rug, a simple mathematical way to model the culture would be a power series:
f = f₀ + Aλ + Bλ² + Cλ³ + …
where A, B, C, … are your predictions for how your conversation partner will respond at that particular echo. A, B, C, … are not going to be real numbers, they will be some distribution/outer product of distributions, but the point is that because λ<1 this series should converge. Cultures where λ is higher will be more “nonlinear.”
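As a toy check (replacing the distributions A, B, C, … with worst-case magnitudes of 1, which is my own simplification), the series is dominated by a geometric series, so for λ<1 it converges and the later echoes contribute very little:

```python
# Toy numerical check that the echo series converges for λ < 1.
# Every echo term is capped at magnitude 1, so the sum is bounded by a
# geometric series and the tail shrinks quickly.
lam = 0.5
partial_sums = []
total = 0.0
for k in range(1, 21):
    total += 1.0 * lam**k          # worst-case contribution of the k-th echo
    partial_sums.append(total)

print(partial_sums[-1])                    # ≈ 1.0 = λ/(1−λ) for λ = 0.5
print(partial_sums[-1] - partial_sums[4])  # tail beyond 5 echoes is < λ**5 ≈ 0.03
```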