justinpombrio comments on Are language models good at making predictions?

justinpombrio 7 Nov 2023 17:07 UTC
4 points
2
Yeah, exactly. For example, if humans had a convention of rounding probabilities to the nearest 10% when writing them, then baseline GPT-4 would follow that convention, and that would put a cap on how well calibrated it could be. Humans are badly calibrated (right?) and baseline GPT-4 is mimicking humans, so why is it well calibrated? Being well calibrated on real-world predictions doesn’t follow from its token stream being well calibrated relative to text.
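
To make the rounding example concrete, here's a rough sketch (my own toy setup, not anything measured on GPT-4). It assumes an idealized forecaster who knows the true probability of every event, with true probabilities spread evenly over [0, 1], and it uses the expected Brier score as a stand-in for "how good the stated probabilities are". The names `expected_brier` and `round_to_tenth` are made up for illustration.

```python
def expected_brier(p: float, q: float) -> float:
    """Expected Brier score when the event has true probability p
    and the forecaster states probability q (lower is better)."""
    # With probability p the outcome is 1 (error (1 - q)^2),
    # otherwise the outcome is 0 (error q^2).
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


def round_to_tenth(p: float) -> float:
    """Hypothetical writing convention: state probabilities to the nearest 10%."""
    return round(p * 10) / 10


# Assumption: true probabilities are spread evenly over (0, 1).
true_probs = [(i + 0.5) / 1000 for i in range(1000)]

# Forecaster who states the true probability exactly.
exact = sum(expected_brier(p, p) for p in true_probs) / len(true_probs)

# Same forecaster, forced to round to the nearest 10% before writing.
rounded = sum(
    expected_brier(p, round_to_tenth(p)) for p in true_probs
) / len(true_probs)

# The gap is the average squared rounding error: a floor that no amount
# of underlying knowledge can get below while following the convention.
penalty = rounded - exact
print(f"exact: {exact:.6f}  rounded: {rounded:.6f}  penalty: {penalty:.6f}")
```

The penalty is small under these assumptions, but it's strictly positive even though the forecaster knows everything, which is the sense in which a writing convention caps the achievable score. A model that only mimics the convention inherits the same cap.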