Bogdan Ionut Cirstea comments on How “Discovering Latent Knowledge in Language Models Without Supervision” Fits Into a Broader Alignment Scheme

Bogdan Ionut Cirstea 26 Jan 2023 19:11 UTC
3 points
1
It might be useful to look at “Language models show human-like content effects on reasoning”, which empirically tests for human-like incoherences and biases in LMs performing logical reasoning tasks (Twitter summary thread; video presentation).