I’ve taken a crack at #4, but it is more about thinking through how ‘hundreds of millions of AIs’ might be deployed in a world that looks, economically and geopolitically, something like today’s (the argument in the OP is for 2036, so this seems a reasonable assumption). It is presented as a flowchart, which is more succinct than my earlier longish post.
The Compleat Cybornaut
Trajectories to 2036
Good catch, thank you: fixed & clarified!
Analysing a 2036 Takeover Scenario
I noticed that footnotes don’t seem to come over when I copy-paste from Google Docs (where I originally wrote the post), hence I have to put them in individually (using the LW Docs editor). Is there a way of just importing them? Or is the best workflow to just write the post in LW Docs?
Perhaps this is too much commentary (on Rao’s post), but given (I believe) he’s pretty widely followed/respected in the tech commentariat, and has posted/tweeted on AI alignment before, I’ve tried to respond to his specific points in a separate LW post. Have tried to incorporate comments below, but please suggest anything I’ve missed. Also if anyone thinks this isn’t an awful idea, I’m happy to see if a pub like Noema (who have run a few relevant things e.g. Gary Marcus, Yann LeCun, etc.) would be interested in putting out an (appropriately edited) response—to try to set out the position on why alignment is an issue, in publishing venues where policymakers/opinion makers might pick it up (who might be reading Rao’s blog but are perhaps not looking at LW/AF). Apologies for any conceptual or factual errors, my first LW post :-)
Responding to ‘Beyond Hyperanthropomorphism’
A fool was tasked with designing a deity. The result was awesomely powerful but impoverished—they say it had no ideas on what to do. After much cajoling, it was taught to copy the fool’s actions. This mimicry it pursued, with all its omnipotence.
The fool was happy and grew rich.
And so things went, ’til the land cracked, the air blackened, and azure seas became as sulfurous sepulchres.
As the end grew near, our fool ruefully mouthed something from a slim old book: ‘Thou hast made death thy vocation, in that there is nothing contemptible.’
Thanks! I’d love to know which points you were uncomfortable with...
Here’s my submission, it might work better as bullet points on a page.
AI will transform human societies over the next 10-20 years. Its impact will be comparable to electricity or nuclear weapons. As electricity did, AI could improve the world dramatically; or, like nuclear weapons, it could end it forever. Like inequality, climate change, nuclear weapons, or engineered pandemics, AI Existential Risk is a wicked problem. It calls upon every policymaker to become a statesperson: to rise above the short-term, narrow interests of party, class, or nation, to actually make a contribution to humankind as a whole. Why? Here are 10 reasons.
(1) Current AI problems, like racial and gender bias, are like canaries in a coal-mine. They portend even worse future failures.
(2) Scientists do not understand how current AI actually works. Engineers know why bridges collapse, or why Chernobyl failed; there is no similar understanding of why AI models misbehave.
(3) Future AI will be dramatically more powerful than today’s. In the last decade, the pace of development has exploded, with current AI performing at super-human level on games (like chess or Go). Massive language models (like GPT-3) can write really good college essays while deepfakes of politicians are already a thing.
(4) These very powerful AIs might develop their own goals, which is a problem if they are connected to electrical grids, hospitals, social media networks, or nuclear weapons systems.
(5) The competitive dynamics are dangerous: the US-China strategic rivalry implies neither side has an incentive to go slowly or be careful. Domestically, tech companies are in an intense race to develop & deploy AI across all aspects of the economy.
(6) The current US lead in AI might be unsustainable. As an analogy, think of nuclear weapons: in the 1940s, the US hoped it would keep its atomic monopoly. Today there are 9 nuclear powers, with 12,705 weapons.
(7) Accidents happen: again, from the nuclear case, there have been over 100 accidents and proliferation incidents involving nuclear power/weapons.
(8) AI could proliferate virally across globally connected networks, making it more dangerous than nuclear weapons (which are visible, trackable, and less useful than powerful AI).
(9) Even today’s moderately-capable AIs, if used effectively, can entrench totalitarianism, manipulate democratic societies or enable repressive security states.
(10) There will be a point of no return after which we may not be able to recover as a species.
So what is to be done? First, negotiate a global, temporary moratorium on certain types of AI research. Second, enforce this moratorium through intrusive domestic regulation and international surveillance. Lastly, avoid historical policy errors, such as those made on climate change and in response to the terrorist threat post-9/11: politicians must ensure that the military-industrial complex does not ‘weaponise’ AI.
In response to Roman’s very good points (I have only skimmed the linked articles for now), these are my thoughts:
I agree that human values are very hard to aggregate (or even to define precisely); we use politics/economy (of collectives ranging from the family up to the nation) as a way of doing that aggregation, but that is obviously a work in progress, and perhaps slipping backwards. In any case, (as Roman says) humans are (much of the time) misaligned with each other and their collectives, in ways little and large, and sometimes that is for good or bad reasons. By ‘good reason’ I mean that sometimes ‘misalignment’ might literally be that human agents & collectives have local (geographical/temporal) realities they have to optimise for (to achieve their goals), which might conflict with goals/interests of their broader collectives: this is the essence of governing a large country, and is why many countries are federated. I’m sure these problems are formalised in preference/values literature, so I’m using my naive terms for now…
Anyway, this post’s working assumption/intuition is that ‘single AI-single human’ alignment (or corrigibility, or identity fusion, or ‘delegation’, to use Andrew Critch’s term) is ‘easier’ to think about or achieve than ‘multiple AI-multiple human’ alignment. That is why we consciously focused on the former and temporarily ignored the latter. I don’t know if that assumption is valid, and I haven’t thought about (i.e. I have no opinion on) whether the ideas in Roman’s linked ‘science of ethics’ post would change anything, but I am interested in it!