Different people took different paths, but most of the ecosystem’s resources went into the latter kind of plan, with the two central pillars being to start and invest in evaluation companies like METR and Apollo (to develop evaluations and capability measurements that could provide the ifs), and to work at companies and within governments to develop commitments (the thens) based on the evaluations.
I think this sort of overstates the proportion of effort that went into that kind of work. There was also a lot of work that aimed to develop techniques that reduce or improve understanding of misalignment risk (e.g. Redwood’s stuff).
METR and Apollo and the broader “evals” agenda became the most popular and highest-prestige thing to work on for people in AI safety.
IMO both METR and Apollo substantially pivoted away from the strategy you’re describing here at least a year ago.
I think this sort of overstates the proportion of effort that went into that kind of work. There was also a lot of work that aimed to develop techniques that reduce or improve understanding of misalignment risk (e.g. Redwood’s stuff).
I think “most” is roughly accurate (like IDK, my sense is around 60% of talent + funding was reallocated to plans of that kind). I agree that other people kept doing different things!
I do think there aren’t that many places that do work around reducing or understanding misalignment risk, especially outside of the labs (which I am excluding here).
IMO both METR and Apollo substantially pivoted away from the strategy you’re describing here at least a year ago.
I am honestly confused what METR’s current theory of impact is.
It seems most effort is going into things like the time horizon evaluations, but it’s not super clear how this translates into the world getting better (though I am generally of the school that helping people understand what is going on will make things better, even if you can’t specify exactly how, so I do think it’s good).
I have appreciated METR staff being more public and calling directly for regulations/awareness of the risks, but things still haven’t come together for me in a coherent way. Insofar as METR “pivoted”, I am not quite sure what it has pivoted to.