nostalgebraist comments on larger language models may disappoint you [or, an eternally unfinished draft]

nostalgebraist 9 Dec 2021 1:24 UTC
It will still only provide a lower bound, yes, but only in the trivial sense that presence is easier to demonstrate than absence.
All experiments that try to assess a capability suffer from this type of directional error, even prototypical cases like “giving someone a free-response math test.”
- They know the material, yet they fail the test: easy to imagine (say, if they are preoccupied by some unexpected life event)
- They don’t know the material, yet they ace the test: requires an astronomically unlikely coincidence
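To make the asymmetry concrete, here is a minimal Bayes'-rule sketch. The two error rates are hypothetical numbers picked only for illustration, and `posterior_knows` is a made-up helper, not anything from the original discussion. Under these assumptions, a pass is near-conclusive evidence of the capability, while a fail only moves the estimate partway. That is the sense in which a test gives a lower bound.

```python
# Hypothetical error rates, chosen only for illustration (not from the comment).
P_FAIL_GIVEN_KNOWS = 0.05      # knows the material but fails (e.g. a bad day)
P_PASS_GIVEN_NOT_KNOWS = 1e-9  # aces a free-response test by pure coincidence


def posterior_knows(passed: bool, prior: float = 0.5) -> float:
    """Bayes' rule: P(knows | test outcome) under the assumed error rates."""
    if passed:
        like_knows = 1 - P_FAIL_GIVEN_KNOWS
        like_not = P_PASS_GIVEN_NOT_KNOWS
    else:
        like_knows = P_FAIL_GIVEN_KNOWS
        like_not = 1 - P_PASS_GIVEN_NOT_KNOWS
    numerator = like_knows * prior
    return numerator / (numerator + like_not * (1 - prior))


print(f"P(knows | pass) = {posterior_knows(True):.9f}")
print(f"P(knows | fail) = {posterior_knows(False):.4f}")
```

With a 50/50 prior, passing pushes the posterior to within about one part in a billion of certainty. Failing still leaves a probability of a few percent that the person knew the material all along.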
The distinction I mean to draw is not that there is no directional error, but that the RL/SL tasks have the right structure: there is an optimization procedure that is “leaving money on the table” if the capability is present yet ends up unused.