‘AI for societal uplift’ as a path to victory
The AI tools/epistemics space might provide a route to a sociotechnical victory, where instead of aiming for something like aligned ASI, we aim to make civilization coherent enough to not destroy itself while staying anchored to what’s good[1].
The core ideas are:
Basically nobody actually wants the world to end, so if we do that to ourselves, it will be because somewhere along the way we weren’t good enough at navigating collective action problems, institutional steering, and general epistemics
Conversely, there is some (potentially high) threshold of societal epistemics + coordination + institutional steering beyond which we can largely eliminate anthropogenic x-risk, potentially in perpetuity[2]
As AI gets more advanced, and therefore more risky, it will also unlock radical advances in all these areas: genuinely unprecedented levels of coordination and sensible decision-making, as well as the potential for narrow research automation in key fields like game theory and political science
I think these points are widely appreciated, but most people don’t seem to have really grappled with the implications. Most centrally, we should plausibly be aiming for a massive increase in collective reasoning and coordination as a core x-risk reduction strategy, potentially as an even higher priority than technical alignment.
Some advantages of this strategy:
We can make helpful incremental progress
We can do a lot of work unilaterally, by just building the relevant tools
Lots of the progress here is self-reinforcing
In the limit, this basically lets us sidestep the alignment problem for as long as necessary
Some challenges:
The bar for ‘good enough’ might be quite high
Adoption could be very difficult in some important cases, especially for international coordination and democratic oversight
There are some legitimate irreconcilable value differences
Some people really would rather throw the dice on getting radical boosts to science and healthcare via automated research soon enough that they/their families don’t pass away, even if that means incurring substantial misalignment risk
There are hard balances of power to navigate
For example, we probably want to make governments much stronger in some ways (so that they can prevent vulnerable world / misuse dynamics), while also making democratic oversight much stronger (so we don’t end up in a stable dictatorship)
We might have to solve some kind of ‘civilizational alignment problem’ which could be difficult in similar ways to AI alignment
As a pointer, we are currently less than perfect at making institutions corrigible, doing scalable oversight on them, preventing mesa-optimisers from forming, and so on
The big implication in my mind is that it might be worth investing serious effort in mapping out what this coherent and capable enough society would look like, whether it’s even feasible, and what we’d need to do to get there.
(Such an effort is something that I and others are working towards, so if you think this is wildly misguided, or if you feel particularly enthusiastic about this direction, I’d be keen to hear about it.)
Thanks to OCB, OS, and MD for helpful comments, and to many others I’ve discussed similar ideas with
[1] The easy route to ‘coherent enough to not destroy itself’ is ‘controlled by a dictatorship/misaligned AI’, so the more nebulous ‘still anchored to the good’ part is, I think, the actual tricky bit
[2] Importantly, this might include making fundamental advances in understanding what it even means for an institution to be steered by some set of values