I think at the very least you need to intervene to stop that feedback loop.
There’s probably at least some disagreement here. I think even if you let takeoff proceed at the default rate, with only a small fraction of effort (e.g. 5%) explicitly spent on reasonably targeted alignment work at each point (as in, 5% beyond what is purely commercially expedient), you have a reasonable chance of avoiding AI takeover (maybe a 50% chance of misaligned AI takeover?). Some of this is due to the possibility of takeoff being relatively slower and more compute-constrained (which you might think is very unlikely?). I also think there is a decent chance that a higher fraction gets spent on safety after handing off to AIs, or after getting advice from highly capable AIs, even if this doesn’t happen beforehand.
It again seems extremely unlikely that they would succeed at aligning a runaway intelligence explosion.
I don’t feel so confident: these AIs might have a decent amount of subjective time and total cognitive labor per unit of capability increase as the intelligence explosion continues, enough to keep things on track. Intuitively, capabilities might be more compute-bottlenecked than alignment, so alignment should pull ahead if we can start with actually aligned (and wise) AIs (which is not easy to achieve, to be clear!).
A lead time of >1 year does seem pretty unlikely at this point. My guess would be like 25% likely? So already this isn’t going to work in 75% of worlds.
I agree; around 25% seems right.
This might allow us to get to something like safe ASI on the scale of single-digit years, but man, this just seems like such an insane risk to take that I really hope we instead use the AI systems to coordinate a longer pause, which seems like a much easier task.
I agree that coordinating a longer pause looks pretty good, but I’m not so sure about the relative feasibility given only the use of AIs that are somewhat more capable than top human experts (regardless of whether these AIs are running things). I think it might be much harder to buy 10 years of time than 2 years given the constraints at the time (including limited political will), and I’m not so sure that aligning somewhat more powerful AIs will be harder. (And then these somewhat more powerful AIs can align even more powerful AIs, and this either bottoms out in a scalable solution to alignment or in powerful enough capabilities that they actually can buy more time.)
One general note: I do think that “buying time along the way” (either before handing off to AIs or after) is quite helpful for making the situation go well. However, I can also imagine worlds where things go fine and we didn’t buy much/any time (especially if takeoff is naturally on the slower side).