johnswentworth comments on [RFC] Possible ways to expand on “Discovering Latent Knowledge in Language Models Without Supervision”.

johnswentworth 26 Jan 2023 6:09 UTC
23 points
22
If I were doing this project, the first thing I’d want to check is whether the DLK paper is actually robustly measuring truth/falsehood, rather than something else which happens to correlate with truth/falsehood for the particular data generation/representation methods used. My strong default prior, for ML papers in general, is “you are not measuring what you think you are measuring”.