I think the catastrophe through chaos story is the most likely outcome, conditional on catastrophe happening.
The big disagreement might ultimately be about timelines. I’ve updated towards longer timelines, such that world-shakingly powerful AI probably arrives in the 2030s or 2040s, not this decade. I still put about 35-40% credence in the timeline in the post being correct, but more credence in at least one new paradigm shift happening before world-shaking AI arrives.
The other one is probably that I’m more optimistic about turning aligned TAI into aligned ASI, because I’m reasonably confident that the alignment problem is easy overall, and I’m much more optimistic about automating alignment than a lot of other people.
What made you update towards longer timelines? My understanding was that most people updated toward shorter timelines based on o3 and reasoning models more broadly.
A big one has to do with DeepSeek’s R1 possibly breaking moats, which would essentially kill industry profits if it happens:
https://www.lesswrong.com/posts/ynsjJWTAMhTogLHm6/?commentId=a2y2dta4x38LqKLDX
The other issue has to do with o1/o3 being potentially more supervised than advertised:
https://www.lesswrong.com/posts/HiTjDZyWdLEGCDzqu/?commentId=gfEFSWENkmqjzim3n#gfEFSWENkmqjzim3n
Finally, Vladimir Nesov has an interesting comment on how Stargate is actually evidence for longer timelines:
https://www.lesswrong.com/posts/fdCaCDfstHxyPmB9h/vladimir_nesov-s-shortform#W5twe6SPqe5Y7oGQf
Technical note: I’m focusing on existential catastrophes, not ordinary catastrophes. The difference is that in an existential catastrophe no humans have power anymore, as opposed to only a few humans having power, so this mostly excludes scenarios like these:
https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from
https://www.lesswrong.com/posts/2ujT9renJwdrcBqcE/the-benevolence-of-the-butcher