gwern comments on Are language models good at making predictions?

gwern 11 Nov 2023 3:15 UTC
3 points
They don’t cite the de-calibration result from the GPT-4 paper, but the distribution of GPT-4’s ratings here looks like it’s been tuned to be mealy-mouthed: humped at 60%, so it agrees with whatever you say but then can’t even do so enthusiastically https://arxiv.org/pdf/2310.13014.pdf#page=6 .