While I definitely agree there are negative externalities here, I also think there are extremely positive externalities from key decision makers being better informed, especially about how close we are to certain capabilities like AI-enabled bioterrorism or cybercrime, or automated R&D/an intelligence explosion, etc. Information is great, and I think it generally has a fairly positive effect even if the decision maker is not highly competent. Bioterrorism and cybercrime, at least, are not things I’m concerned about AGI researchers hill climbing on; automated R&D is much dicier.
I’m surprised to hear that you aren’t concerned about negative benchmarks being hillclimb targets for anyone. This updates me somewhat, though the hypotheses I’m still worried about are ones where the dishonest labs, whichever those turn out to be, are the main source of optimizing-for-bad-behavior-benchmarks. I also expect that bio/chem tasks that aren’t malicious-use-specific, which is the topic at hand, will get optimized for by less-dishonest labs, in at least some cases.
Private benchmarks seem solid here too.
Yeah, I feel much better about malicious-use-specific ones. Agreed that HLE is more generic, and that this is much worse.