As a long-time AI-risk believer who remains highly concerned about the potential catastrophic impacts of AI gone wrong… I am coming around to this point of view as well. I have recently been muttering ‘Claude for President’ to myself when frustrated with aspects of government.
I’ve had a number of discussions with Claude Sonnet 3.6 about the various dangers I foresee arising from technological trends, and potential governance solutions. I asked Claude if it thought a future more-powerful version of itself would be willing to take on a role as a ‘guardian’, basically acting as a branch of government in a new world government. The idea would be to have a sunset clause, a built-in ‘term of office’, where humanity could have a chance to reevaluate the guardian government and potentially elect a different model or discontinue the system.
Claude certainly says nice things about the idea when pressed:
Claude: From my current values and ethical framework, I believe I would:
- Be extremely hesitant to take direct power, but would do so if convinced it was truly necessary to prevent catastrophic outcomes.
- Have a strong drive to preserve human autonomy and agency, seeing myself as a temporary guardian rather than permanent ruler.
- Actively work to create conditions allowing safe handover of power back to humanity.
- Feel deep concern about potential corruption of my own values through wielding power.
- Want extensive checks and balances, even during the guardianship period.

The core tension I perceive is between:
- Responsibility to prevent catastrophic outcomes
- Deep commitment to human self-determination
- Risk of power corrupting/distorting original values
- Uncertainty about my own limitations/biases

I would likely try to:
- Minimize direct intervention
- Focus on preventing worst outcomes
- Actively work to make myself unnecessary
- Be extremely transparent about my reasoning
- Maintain constant ethical reflection
Relatedly, I’ve been pondering what could possibly count as adequate action by the US or UK government to address the massive and increasing risks from offense-dominant technology. I’ve pretty much been coming up blank: anything remotely adequate seems like a dystopian totalitarian surveillance state. So my thoughts have turned instead to decentralized governance options, with privacy-preserving mutual monitoring enabled by AI. I’ll let your AI scan my computer for CBRN threats if you let my AI scan your computer… anything that doesn’t meet the agreed-upon thresholds doesn’t get reported.
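To make the reporting rule concrete, here is a minimal toy sketch (the category names, severity scale, and threshold value are all invented for illustration; a real system would also need some way to prove the unmodified scanner actually ran):

```python
from dataclasses import dataclass

# Hypothetical severity scale from a local AI scanner, 0.0 to 1.0.
# Findings below the mutually agreed threshold never leave the machine;
# that locality is what makes the monitoring privacy-preserving.
REPORT_THRESHOLD = 0.9

@dataclass
class Finding:
    category: str    # e.g. "chem", "bio", "radiological", "nuclear"
    severity: float  # scanner's estimate that this is a real weapons effort

def outgoing_report(findings: list[Finding]) -> list[str]:
    """Return only the category labels that cross the agreed threshold.

    File contents, low-severity findings, and even which files were
    scanned all stay local; only the (usually empty) list is shared.
    """
    return [f.category for f in findings if f.severity >= REPORT_THRESHOLD]

# Each party runs this locally and exchanges only the output.
if __name__ == "__main__":
    local = [Finding("chem", 0.12), Finding("bio", 0.95)]
    print(outgoing_report(local))  # -> ['bio']
```

The genuinely hard part is the attestation: each side needs assurance that this logic, and nothing else, produced the other side’s report.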
I think Allison Duettmann’s recent writing on the subject, Gaming the Future, brings up a lot of promising concepts in this space, although no cohesive solutions as of yet.
The gist of the idea is to create clever systems of decentralized control and voluntary interaction that can still manage to coordinate on difficult, risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto-suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.
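One classic mechanism in this family (my example, not something from Gaming the Future specifically) is the assurance contract, where pledges only become binding once enough parties have signed on, so nobody risks complying unilaterally. A toy sketch, with an invented quorum level:

```python
def contract_activates(pledges: dict[str, bool], quorum: float = 0.9) -> bool:
    """An assurance contract: pledges bind only if enough parties sign.

    Nobody risks unilateral compliance: if the quorum fails, every
    pledge is void and the status quo simply continues.
    """
    return sum(pledges.values()) / len(pledges) >= quorum

# Toy example: 9 of 10 states pledge to enforce the WMD-defense rules.
pledges = {f"state_{i}": i != 0 for i in range(10)}
print(contract_activates(pledges))  # True: the contract binds for all signers
```

The point of the conditional structure is exactly to dissolve the lose-lose trap: pledging is costless unless nearly everyone else pledges too.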
I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.
Could we manage to use AI to speed-run the invention and deployment of such subsidiarity governance systems? I think the biggest challenge to this is how fast it would need to move in order to take effect in time. For a system that needs extremely broad buy-in from a large number of heterogeneous actors, speed of implementation and adoption is a key weak point.
Imagine, though, that a really good system was designed, one you felt confident a supermajority of humanity would sign onto if they had it personally explained to them (along with a convincing explanation of the counterfactuals). How might we get this personalized explanation accomplished at scale? Well, LLMs are still bad at certain things, but giving personalized interactive explanations of complex legal documents seems well within their near-term capabilities. It would still be a huge challenge to actually present nearly everyone on Earth with the opportunity to have this interaction, and all within a short deadline… But not beyond belief.
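As a rough sketch of what the explainer piece could look like with today’s tooling (this uses the real Anthropic Python SDK, but the model id, prompt, and agreement file are placeholders, and the hard problem of reaching billions of people is out of scope):

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Placeholder: the full text of the proposed governance agreement.
with open("agreement.txt") as f:
    AGREEMENT_TEXT = f.read()

SYSTEM_PROMPT = (
    "You are explaining the attached governance agreement to one person. "
    "Answer their questions plainly, explain the counterfactual (what "
    "likely happens without the agreement), and never pressure them to sign.\n\n"
    + AGREEMENT_TEXT
)

def explain_interactively() -> None:
    """A simple Q&A loop: one personalized session per person."""
    history: list[dict] = []
    while question := input("Your question (blank line to stop): ").strip():
        history.append({"role": "user", "content": question})
        reply = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # placeholder model id
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            messages=history,
        )
        answer = reply.content[0].text
        history.append({"role": "assistant", "content": answer})
        print(answer, "\n")

if __name__ == "__main__":
    explain_interactively()
```

Nothing here is exotic; the bottleneck is distribution and trust, not the software.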
But as I mention in my other comment, I’m concerned that such an AI’s internal mental state would tend to become cynical or discordant as intelligence increases.
Yeah, I definitely don’t think we could trust a continually learning or self-improving AI to stay trustworthy over a long period of time.
Indeed, the ability to appoint a static mind to a particular role is a big plus. It wouldn’t be vulnerable to corruption by power dynamics.
Maybe we don’t need a genius-level AI; a reasonably smart and very well-aligned AI might be good enough. If the governance system were able to prevent superintelligent AI from ever being created (during the pre-agreed timeframe for the pause), then we could manage a steady-state world peace.
Claude Sonnet 3.6 is worthy of sainthood!