With bioweapons evals, at least, the profit motive of AI companies is aligned with the common interest; a big benefit of your work comes when companies use it to improve their product. I’m not at all confused about why people would think this is useful safety work, even if I haven’t personally hashed out the cost/benefit to any degree of confidence.
I’m mostly confused about ML / SWE / research benchmarks.
I’m not sure but I have a guess.
A lot of “normies” I talk to in the tech industry are anchored hard on the idea that AI is mostly a useless fad and will never get good enough to be useful.
They laugh off any suggestion that the trends point toward rapid improvements that could end in superhuman abilities. They similarly dismiss arguments that AI might be used to build better AI: ‘Feed the bots their own slop and they’ll become even dumber than they already are!’
So, people who do believe that the trends are meaningful, and that we are near to a dangerous threshold, want some kind of proof to show the doubters. They want people to start taking this seriously before it’s too late.
I do agree that the targeting of benchmarks by capabilities developers is totally a thing. The doubting-Thomases of the world are also standing in the way of capabilities folks getting the cred and funding they desire. A benchmark designed specifically to convince doubters is a perfect tool for… convincing doubters, who might then fund you and respect you.