I do admit, it’s not obvious that developing this is helping.
Holly Elmore: I can honestly see no AI Safety benefit to this at this point in time. Once, ppl believed eval results would shock lawmakers into action or give Safety credibility w/o building societal consensus, but, I repeat, THERE IS NO SCIENTIFIC RESULT THAT WILL DO THE ADVOCACY WORK FOR US.
People simply know too little about frontier AI and there is simply too little precedent for AI risks in our laws and society for scientific findings in this area to speak for themselves. They have to come with recommendations and policies and enforcement attached.
Jim Babcock: Evals aren’t just for advocacy. They’re also for experts to use for situational awareness.
So I told him it sounded like he was just feeding evals to capabilities labs and he started crying.
I’m becoming increasingly skeptical of benchmarks like this as net useful things, because I despair of our ability to use them for useful situational awareness. The problem is: They don’t convince policymakers. At all. We’re learning that. So there’s no if-then action plan here. There’s no way to convince people that success on this eval should cause them to react.
I think the main value add at this point is to lower bound the capabilities for when AI safety can be automated and to upper bound capabilities for AI safety cases (very broadly), but yes the governance value of evals has declined, and there’s no plausible way for evals to help with governance in short timelines.
This is more broadly downstream of how governance has become mostly worthless in short timelines, due to basically all the major powers showing e/acc tendencies towards AI (though notably without endorsing human extinction), so technical solutions to alignment are more valuable than they once were.
The only governance is what the labs choose.
In particular, we cannot assume any level of international coordination going forward, and must assume that the post-World War II order that held up international cooperation to prevent x-risks was an anomaly, not something enduring.
Re vibe shifts:
I especially appreciate Wildeford’s #1 point, that the vibes have shifted and will shift again. How many major ‘vibe shifts’ have there been in AI? Seems like at least ChatGPT, GPT-4, CAIS statement, o1 and now DeepSeek with a side of Trump, or maybe it’s the other way around. You could also consider several others.
Whereas politics has admittedly only had ‘vibe shifts’ in, let’s say, 2020, 2021 and then in 2024. So that’s only 3 of the last 5 years (how many happened in 2020-21 overall is an interesting debate). But even with only 3 that still seems like a lot, and history is accelerating rapidly. None of the three even involved AI.
It would not surprise me if the current vibe in AI is different as soon as two months from now even if essentially nothing not already announced happens, where we spend a few days on Grok 3, then OpenAI drops the full o3 and GPT-4.5, and a lot more people both get excited and also start actually worrying about their terms of employment.
I do not expect this for the next 4 years, and conditional on short timelines (which here means within 5 years), the vibe shift will be too late to matter, IMO.
A big reason for this is I expect politics and the news to focus on the least relevant stuff by a mile, and to push AI stuff way down until they are replaced, but at that point the AI takeoff is likely in full swing, so we are either doomed to extinction or we survive in a new order.