I think this post overstates its case, but it makes important under-discussed points!
Here’s another argument against evals that resonates with me:
One big reason why people do evals is that they see evals as completely cooperative and non-adversarial, whereas regulation and advocacy could cause conflict and polarization. If all you’re doing is explaining facts, no one should regret interacting with you.
I think that’s not quite right in the case of AI wakeup and AI regulation. In retrospect, essentially all ambitious or powerful people will wish they had made AGI their top priority at some point in the now-past, such as 2023. So if, in your honest and cooperative communication, you cause people to be less afraid and more complacent about AI, they will likely regret that interaction in retrospect! At some point later, they will realize they were grossly underrating AI, lacked sufficient context to reason effectively about it, and will probably wish that someone had pushed them harder to prioritize AI safety. In that sense, it’s actually more cooperative to selectively share information that increases wakeup and willingness to take costly safety actions, rather than unbiased information. Of course this consequentialist reasoning is not a solid basis for decision making, but it does make me believe that telling someone who isn’t intimately familiar with AI “actually Y model can’t cause catastrophe Z yet” is not really a mutually beneficial interaction.
Unless the eval results require action that AI developers won’t like