It will still only provide a lower bound, yes, but only in the trivial sense that presence is easier to demonstrate than absence.
All experiments that try to assess a capability suffer from this type of directional error, even prototypical cases like “giving someone a free-response math test.” Compare the two ways such a test can mislead:
They know the material, yet they fail the test: easy to imagine (say, if they are preoccupied by some unexpected life event).
They don’t know the material, yet they ace the test: requires an astronomically unlikely coincidence.
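To put rough numbers on that asymmetry, here is a minimal sketch. Every figure in it is an illustrative assumption rather than a measurement: a 10-question test, a 1-in-1000 chance of stumbling onto a correct free-response answer without knowing the material, independent questions, and a 5% chance that someone who does know the material has a bad enough day to fail.

```python
# Illustrative assumptions only; none of these numbers come from data.
NUM_QUESTIONS = 10
P_LUCKY_ANSWER = 1e-3  # assumed chance of guessing one free-response answer
P_BAD_DAY = 0.05       # assumed chance a knowledgeable person fails anyway

# False positive: acing the test without knowing the material requires a
# lucky guess on every question (questions treated as independent).
p_false_positive = P_LUCKY_ANSWER ** NUM_QUESTIONS

# False negative: knowing the material but failing the test.
p_false_negative = P_BAD_DAY

print(f"P(ace  | no knowledge) ~ {p_false_positive:.0e}")
print(f"P(fail | knowledge)    ~ {p_false_negative:.0e}")
```

Under these assumptions, a pass is overwhelming evidence that the capability is present, while a fail is only weak evidence that it is absent. That is the sense in which the test gives a lower bound.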
The distinction I mean to draw is not that there is no directional error, but that the RL/SL tasks have the right structure: there is an optimization procedure that is “leaving money on the table” if the capability is present yet ends up unused.