FWIW, my interpretation of what we should be learning is pretty different here.
I would broadly say that political will for anything in the “slow down AI as needed to make it safe” category has been well short of what many people (such as myself) hoped for. Because of this, some of the core founding hopes of the RSP project look untenable now (although I don’t consider the matter totally settled); but to me it feels like an even bigger update away from “pause now” movements.
I don’t understand why you say this: “the appetite for conditional risk regulation has been substantially less than the appetite for direct risk regulation (or compute thresholding, which doesn’t require any complicated eval infrastructure).” I have seen roughly no appetite for “compute thresholding” if that means something like “limiting the size of training runs” (I have seen “compute thresholding” only in the sense of “reporting requirements triggered by compute thresholds”). I don’t know what you mean by “direct risk regulation”, but if it means regulation aimed at slowing down AI immediately, I have likewise seen much less (roughly no) appetite/momentum for that, and more for regulation built around things like evals and if-then commitments.
Separately, with the benefit of hindsight, I think a global AI pause in 2023 would have been bad on the merits compared to, say, a pause around when the original RSP implied a pause should happen. The earlier pause, compared to the later one, would have meant losing a lot of opportunities for meaningful alignment research, and more broadly for the world to learn important things relevant to AI safety, while offering almost no marginal reduction in catastrophic risk AFAICT.
You may have a view like “2023 was the right time to pause, because it was politically tractable then, but postponing it ensured it would not remain politically tractable.” That would be a very different read of the political situation than mine.
> This is a huge deal! This was, as far as I can tell, the single decision that most affected talent allocation in the whole AI safety community. METR and Apollo and the broader “evals” agenda became the most popular and highest-prestige thing to work on for people in AI safety.
This seems off to me. First, the emphasis on evals predated the idea of if-then commitments, and I think it attracted more resources at pretty much every point in time; evals have a variety of potential benefits that don’t rely on if-then commitments. Second, I don’t think most people who work on AI safety work on either of these.
You’re welcome to this view, but it isn’t mine. Other major projects I’ve worked on have involved fundamental pivots that left them almost unrecognizable from the original pitch, and if you’d asked me at the time, I would’ve said I expected something similar for RSPs. On priors, a completely new kind of risk management framework should be expected to change dramatically. I never would have agreed to “The policies we’re coming up with today will only change in the details.”
I’ve seen many claims along these lines, and my guess is that there is at least some truth to them, but I am somewhat surprised by how little I’ve seen in the way of tangible specifics re: who promised what, and who made important plans and decisions based on it. I have mostly just seen the quote from Evan, which he says is a mischaracterization. I genuinely can’t recall encountering this sort of thing firsthand (though I’ve only been at Anthropic for about a year). Again, I’m not saying such things didn’t happen, but I haven’t seen enough specifics to be affirmatively convinced by claims like this, or to have a clear sense of who was responsible, who was harmed, etc.
I don’t think that feeling unhealthy pressures implies a governance failure. For example, people regularly feel pressure to avoid admitting they were wrong; I don’t think this is any particular person’s fault, or that it calls a particular governing structure into question. My statement here was about psychological pressures, not pressures imposed by, e.g., executives.
I think the situation today is worse on the dimensions being discussed here (e.g., political will), although better on some other dimensions (IMO, the technical picture looks at least somewhat better).
I don’t accept the framing that stricter commitments are necessarily less “reckless” while more flexible frameworks are more “reckless.” I think the new policy better positions us to reduce risk worldwide. If pausing were the only or clearly best path to risk reduction, I think it would make more sense to associate greater flexibility with greater recklessness. Perhaps you believe that’s the case; I don’t.
I don’t believe a race is inevitable, and I think there may be things Anthropic can do to make a race less likely. But those things are, and will continue to be, highly contingent on specific circumstances, and I also believe there are non-slowdown-oriented actions that can reduce risk significantly (potentially more than slowdown-oriented actions, depending on the situation).