(I don’t think Dan Hendrycks deliberately released a benchmark with errors, but it would be hilarious, especially given his gripes about GPQA label noise.)
I also very much don’t think errors were on purpose. Abstaining from capability measurement doesn’t seem likely to do much else besides save ones’ own time to do something more useful; I just hope people who are excited to do alignment work don’t end up wasting their time on it. Errors seem likely to end up not particularly slowing things down—someone else is going to make a benchmark, but let a capability person do it so you can spend your time doing something more useful, like figuring out how to algorithmically generate arbitrary amounts of guaranteed-aligned training data from interacting with humans that has some sort of probabilistic guarantee of representing behavior that is in fact good rather than simply easy to generate. (Edit: which is probably done through a somewhat complex theoretical breakthrough about how to incentivize aligned-by-construction exploration or some such thing)
...🤔 …well played, Dan, well played.
(I don’t think Dan Hendrycks deliberately released a benchmark with errors, but it would be hilarious, especially given his gripes about GPQA label noise.)
I also very much don’t think errors were on purpose. Abstaining from capability measurement doesn’t seem likely to do much else besides save ones’ own time to do something more useful; I just hope people who are excited to do alignment work don’t end up wasting their time on it. Errors seem likely to end up not particularly slowing things down—someone else is going to make a benchmark, but let a capability person do it so you can spend your time doing something more useful, like figuring out how to algorithmically generate arbitrary amounts of guaranteed-aligned training data from interacting with humans that has some sort of probabilistic guarantee of representing behavior that is in fact good rather than simply easy to generate. (Edit: which is probably done through a somewhat complex theoretical breakthrough about how to incentivize aligned-by-construction exploration or some such thing)