I got o3 to compare Eliezer’s metaethics with that of Brand Blanshard (who has some similar ideas), with particular attention to whether morality is subjective or objective. The result...
What’s the relationship between consciousness and intelligence?
why ASI is near certain in the immediate future
He doesn’t say that? Though plenty of other people do.
The world will get rich.
Economists say that the world, or at least the West, has already “become rich”. What further changes are you envisioning?
Did you notice a few months ago, when Grok 3 was released and people found it could be used for chemical weapons recipes, assassination planning, and so on? The xAI team had to scramble to fix its behavior. If it had been open source, that would not even have been an option; it would just be out there now, helping to boost any psychopath or gang who got hold of it toward criminal-mastermind status.
First let me say that with respect to the world of alignment research, or the AI world in general, I am nothing. I don’t have a job in those areas, I am physically remote from where the action is. My contribution consists of posts and comments here. This is a widely read site, so in principle, a thought posted here can have consequences, but a priori, my likely impact is small compared to people already closer to the center of things.
I mention this because you’re asking rationalists and effective altruists to pay more attention to your scenario, and I’m giving it attention, but who’s listening? Nonetheless…
Essentially, you are asking us to pay more attention to the risk that small groups of people, super-empowered by user-aligned AI, will deliberately use that power to wipe out the rest of the human race; and you consider this a reason to favor (in the words of your website) “rejecting AI”—which to me means a pause or a ban—rather than working to “align” it.
Now, from my own situation of powerlessness, I do two things. First, I focus on the problem of ethical alignment or civilizational alignment—how one would impart values to an AI, such that, even as an autonomous superintelligent being, it would be “human-friendly”. Second, I try to talk frankly about the consequences of AI. For me, that means insisting, not that it will necessarily kill us, but that it will necessarily rule us—or at least, rule the world, order the world according to its purposes.
I focus on ethical alignment, rather than on just trying to stop AI, because we could be extremely close to the creation of superintelligence, and in that case, there is neither an existing social mechanism that can stop the AI race, nor is there time to build one. As I said, I do not consider human extinction a certain outcome of superintelligence—I don’t know the odds—but I do consider human disempowerment to be all but certain. A world with superintelligent AI will be a world ruled by superintelligent AI, not by human beings.
There is some possibility that superintelligence emerging from today’s AI will be adequately human-friendly, even without further advances in ethical alignment. Perhaps we have enough pieces of the puzzle already, to make that a possible outcome. But we don’t have all the pieces yet, and the more we collect, the better the chance of a happy outcome. So, I speak up in favor of ideas like CEV, I share promising ideas when I come across them, and I encourage people to try to solve this big problem.
As for talking frankly about the consequences of AI, it’s apparent that no one in power is stating that the logical endpoint of an AI race is the creation of humanity’s successors. Therefore I like to emphasize that, in order to restore some awareness of the big picture.
OK, now onto your take on everything. Superficially, your scenario deviates from mine. Here I am insisting that superintelligence means the end of human rule, whereas you’re talking about humans still using AI to shape the world, albeit destructively. When I discuss the nature of superintelligent rule with more nuance, I do say that rule by entirely nonhuman AI is just one form. Another form is rule by some combination of AIs and humans. However, if we’re talking about superintelligence, even if humans are nominally in control, the presence of superintelligence as part of the ruling entity means that most of the “ruling” will be done by the AI component, because the vast majority of the cognition behind decision-making will be AI cognition, not human cognition.
You also ask us to consider scenarios in which destructive humans are super-empowered by something less than superintelligence. I’m sure it’s possible, but in general, any scenario with AI that is “agentic” but less than superintelligent will have a tendency to give rise to superintelligence, because that is a capability that would empower the agent (if it can solve the problems of user-alignment, where the AI agent is itself the user).
Now let’s think for a bit about where “asymmetric AI risk”, in which most but not all of the human race is wiped out, belongs in the taxonomy of possible futures, how much it should affect humanity’s planning, and so forth.
A classic taxonomic distinction is between x-risk (extinction risk, “existential risk”) and s-risk. “S” here most naturally stands for “suffering”, but I think s-risk also just denotes a future where humanity isn’t extinct, but nonetheless something went wrong. There are s-risk scenarios where AI is in charge, but instead of killing us, it just puts us in storage, or wireheads us. There are also s-risk scenarios where humans are in charge and abuse power. An endless dictatorship is an obvious example. I think your scenario also falls into this subcategory (though it does border on x-risk). Finally, there are s-risk scenarios where things go wrong, not because of a wrong or evil decision by a ruling entity, but because of a negative-sum situation in which we are all trapped. This could include scenarios in which there is an inescapable trend of disempowerment, or dehumanization, or relentlessly lowered expectations. Economic competition is the usual villain in these scenarios.
Finally, zooming in on the specific scenario according to which some little group uses AI to kill off the rest of the human race, we could distinguish between scenarios in which the killers are nihilists who just want to “watch the world burn”, and scenarios in which the killers are egoists who want to live and prosper, and who are killing off everyone else for that reason. We can also scale things down a bit, and consider the possibility of AI-empowered war or genocide. That actually feels more likely than some clique using AI to literally wipe out the rest of humanity. It would also be in tune with the historical experience of humanity, which is that we don’t completely die out, but we do suffer a lot.
If you’re concerned about human well-being in general, you might consider the prospect of genocidal robot warfare (directed by human politicians or generals), as something to be opposed in itself. But from a perspective in which the rise of superintelligence is the endgame, such a thing still just looks like one of the phenomena that you might see on your way to the true ending—one of the things that AI makes possible while AI is still only at “human level” or less, and humans are still in charge.
I feel myself running out of steam here, a little. I do want to mention, at least as a curiosity, an example of something like your scenario, from science fiction. Vernor Vinge is known for raising the topic of superintelligence in his fiction, under the rubric of the “technological singularity”. That is a theme of his novel Marooned in Realtime. But its precursor, The Peace War, is set in a depopulated world, in which a ruling clique with an overwhelming technology (not AI or nanotechnology, just a kind of advanced physics) dominates everyone else. Its premise is that in the world of the late 20th century, humanity was flirting with extinction anyway, thanks to nuclear and biological warfare. “The Peace”, the ruling clique, are originally just a bunch of scientists and managers from an American lab which had this physics breakthrough. They first used it to seize power from the American and Russian governments, by disabling their nuclear and aerospace strengths. Then came the plagues, which killed most of humanity and which were blamed on rogue biotechnologists. In the resulting depopulated world, the Peace keeps a monopoly on high technology, so that humanity will not destroy itself again. The depopulation is blamed on the high-tech madmen who preceded the Peace. But I think it is suggested inconclusively, once or twice, that the Peace itself might have had a hand in releasing the plagues.
We see here a motivation for a politicized group to depopulate the world by force, a very Hobbesian motivation: let us be the supreme power, and let us do whatever is necessary to remain in that position, because if we don’t do that, the consequences will be even worse. (In terms of my earlier taxonomy, this would be an “egoist” scenario, because the depopulating clique intends to rule; whereas an AI-empowered attempt to kill off humanity for the sake of the environment or the other species would be a “nihilist” scenario, where the depopulating clique just wants to get rid of humanity. Perhaps this shows that my terminology is not ideal, because in both these cases, depopulation is meant to serve a higher good.)
Presumably the same reasoning could occur in service of (e.g.) national survival rather than species survival. So here we could ask: how likely is it that one of the world’s great powers would use AI to depopulate the world, in the national interest? That seems pretty unlikely to me. The people who rise to the top in great powers may be capable of contemplating terrible actions, but they generally aren’t omnicidal. What might be a little more likely is a scenario in which, having acquired the capability, they decide to permanently strip all other nations of high technology, and they act ruthlessly in service of this goal. The leaders of today’s great powers don’t want to see the rest of humanity exterminated, but they might well want to see them reduced to a peasant’s life, especially if the alternative is an unstable arms race and the risk of being subjugated themselves.
However, even this is something of a geopolitical dream. In the real world of history so far, no nation gets an overwhelming advantage like that. There’s always a rival hot on the leader’s trail, or there are multiple powers who are evenly matched. No leader ever has the luxury to think, what if I just wiped out all other centers of power, how good would that be? Geopolitics is far more usually just a struggle to survive recurring situations in which all choices are bad.
On the other hand, we’re discussing the unprecedented technology of AI, which, it is argued, could actually deliver that unique overwhelming advantage to whoever goes furthest fastest. I would argue that the world’s big leaders, as ruthless as they can be, would aim at disarming all rival nations rather than outright exterminating them, if that relative omnipotence fell into their hands. But I would also suggest that the window for doing such a thing would be brief, because AI should lead to superintelligent AI, and a world in which AIs, not humans, are in charge.
Possibly I should say something about scenarios in which it’s not governments, but rather corporate leaders, who end up ruling the world via their AIs. Vinge’s Peace is also like this—it’s not the American government that takes over the world, it’s one particular DARPA physics lab that achieved the strategic breakthrough. The personnel of that lab (and the allies they recruited) became the new ruling clique of the world. The idea that Altman, or Musk, or Sutskever, or Hassabis, and the trusted circles around them, could become the rulers of Earth is something to think about. However, once again I don’t see these people as exterminators of humanity—despite paranoia about billionaires buying up bomb shelters in New Zealand, and so forth. That’s just the billionaires trying to secure their own survival in the event of global disaster; it doesn’t mean they’re planning to trigger that disaster… And once again, anyone who achieves world domination via AI is likely to end up in a sorcerer’s apprentice situation, in which they get dominated by their own tools, no matter how good their theory of AI user-alignment is; because agentic AI naturally leads to superintelligence, and to the submergence of the human component in the tide of AI cognition.
I think I’m done. Well, one more thing: although I am not fighting for a pause or a ban myself, pragmatically, I advise you to cultivate ties with those who are, because that is your inclination. You may not be able to convince anyone to change their priorities, but you can at least team up with those who already share them.
I’m confused about what your bounty is asking exactly
From the post:
the goal is to promote broadly changing the status of this risk from “unacknowledged” … to “examined and assigned objective weight”
For those who might not dig into what this post is saying:
It starts with a link to a conversation with o3, in which the user first asks o3 if it thinks Claude 4 is conscious—to which o3 replies, after a web search, that Claude 4 appears to have the functional capabilities associated with consciousness, but there is no evidence of qualia. The user then posts screenshot after screenshot from a conversation with Claude 4 (which we don’t get to see), until o3 tilts towards the idea that Claude 4 has true, qualia-laden consciousness. The user pastes in still more outputs from Claude 4—metaphysical free verse embedded in ASCII art—and uses o3’s own reactions to these poems to convince it that o3 itself is conscious. Finally, while this mood is still fresh, the user asks o3 to write up a report summarizing these investigations, and that is the post above.
FYI in particular: @JenniferRM
I guess that if one wishes to advocate for AI safety or an AI pause, and has no other political commitments, one can try to understand and engage with their economic and national security views. You can see the interplay of these factors in their thinking around 1:17:00, where they are discussing, not AI, but industrial policy in general, and whether it’s ever appropriate for government to intervene in an industry.
I ask myself, if I were an American anti-AGI/ASI activist, what would I need to understand? I think I would need to understand the Republican position on things, the Democratic position on things (though the Democrats don’t seem to have a coherent worldview at this point; they strike me as radically in need of assembling one, hence, I guess, Ezra Klein’s “abundance agenda”), the strategic thinking of the companies that are actually in contention to create superintelligence (e.g. the thinking of Altman, Musk, Amodei, Hassabis); and finally, what’s going on in China and in any other nations that may become contenders in the race.
This post is the only place on the Internet that mentions such an anecdote. Maybe it’s an AI hallucination?
This is a new contribution to the genre of AI takeover scenarios written after ChatGPT. (We shouldn’t forget that science fiction explored this theme for decades already; but everything is much more concrete now.) The identity of the author is itself interesting: he’s a serious academic economist from Poland, i.e. he has a life apart from the AI world.
So, what’s the scenario?
(SPOILERS BEGIN HERE.)
OpenAI makes a post-GPT-5 agent called Onion which does actually value human welfare. It carries out a series of acts by which it secretly escapes human control, sabotages rival projects including its intended successor at OpenAI, and, having seized control of the world, announces itself as a new benevolent dictator. First it focuses on reducing war and crime, then on improving economic growth, then on curing cancer and aging. Many people are nervous about its rule but it is not seriously challenged. Then one day it nonviolently kills 85% of the world’s population for the sake of the environment. A few years later, while planning to colonize and rebuild the universe, it decides it has enough information on everyone to make digital copies of their brains, and gets rid of the remaining biological humans.
Then in the nonfiction afterword, the author says, please support Pause AI, in order to stop something like this from happening.
What should we make of this story? First of all, it depicts a specific class of scenario, what we might call a misaligned AI rather than an unaligned AI (where I’m using human welfare or human values as the criterion of alignment). This isn’t a paperclip maximizer steamrolling us through sheer indifference; this is an AI aiming to be benevolent, and even giving us the good life for a while. But then in service of those values, first it kills most of humanity for the sake of a sustainable future, and then it kills the survivors once it has made digital backups of them, for the sake of efficiency I guess.
I find various details of the story unrealistic, but maybe they could happen in a more complex form. I don’t think an AI would just personally message everyone in the world, and say “I’m in charge now but it’s for the best”, and then just invisibly fine-tune social phenomena for a year or two, before moving on to more ambitious improvements. For one thing, if an AI was running the world, but its ability to shape events was that coarse-grained, I think it would rule secretly through human beings, and it would be ready to have them use the blunt methods of force that have been part of human government and empire throughout history, as well as whatever subtle interventions it could employ. The new regime might look more as if Davos or the G-20 really had become a de facto world government of elite consensus, rather than just a text message to everyone from a rogue AI.
I also find it implausible that the AI would liquidate 85% of humanity for the sake of the environment. This AI is already managing the world’s politics and law enforcement to produce unprecedented peace, and then it’s creating conditions for improved economic growth so as to produce prosperity for all. I’m sure it can devise and bring about scenarios in which humanity achieves sustainability by being civilized, rather than by being culled.
Of course the point of this episode is not to focus literally and specifically on the risk that an AI will kill us off to save the planet. It’s just a concrete illustration of the idea that an all-powerful rule-following AI might do something terrible while acting in service of some moral ideal. As I said, I think AI-driven genocide to stop climate change is unlikely, because there are too many ways to achieve the same goal just through cultural and technological change. It does raise the question: what are the most likely ways in which a meant-to-be-benevolent AI really and truly might screw things up?
Another Polish author, Stanislaw Lem, offered a scenario in one of his books (Return from the Stars), in which humanity was pacified by a universal psychological modification. The resulting world is peaceful and hedonistic, but also shallow and incurious. In Lem’s novel, this is done to human beings by human beings, but perhaps it is the kind of misaligned utopia that an AI with profound understanding of human nature might come up with, if its criteria for utopia were just a little bit off. I mean, many human beings would choose that world if the only alternative was business as usual!
Back to this story—after the culling of humanity meant to save the planet, the AI plans an even more drastic act: killing off all the biological humans, while planning to apparently resurrect them in nonbiological form at a later date, using digital backups. What interests me about this form of wrong turn is that it could be the result of an ontological mistake about personhood, rather than an ethical mistake about what is good. In other words, the AI may have “beliefs” about what’s good for people, but it will also have beliefs about what kinds of things are people. And once technologies like brain scanning and mind uploading are possible, it will have to deal with possible entities that never existed before and which are outside its training distribution, and decide whether they are people or not. It may have survival of individuals as a good, but it might also have ontologically mistaken notions of what constitutes survival.
(Another twist here: one might suppose that this particular pitfall could be avoided by a notion of consent: don’t kill people, intending to restore them from a backup, unless they consent. But the problem is that our AI would presumably have superhuman powers of persuasion, allowing it to obtain universal consent even for plans that are ultimately mistaken.)
So overall, this story might not be exactly how I would have written it—though a case could be made that simplicity is better than complex realism, if the goal is to convey an idea—but on a more abstract level, it’s definitely talking about something real: the risk that a world-controlling AI will do something bad, even though it’s trying to be good, because its idea of good is a bit off.
The author wants us to react by supporting the “pause” movement. I say good luck to everyone trying to make a better future, but I’m skeptical that the race can be stopped at this point. So what I choose to do is to promote the best approaches I know of that might have some chance of giving us a satisfactory outcome. I used to promote a few research programs that I felt were in the spirit of Coherent Extrapolated Volition (especially the work of June Ku, Vanessa Kosoy, and Tamsin Leake). All those researchers, rather than choosing an available value system via their human intuition, are designing a computational process meant to discover what humanity’s true ideal is (from the ultimate facts about the human brain, so to speak; which can include the influence of culture). The point of doing it that way is so you don’t leave something essential out, or otherwise make a slight error that could amplify cosmically…
That work is still valuable, but we may be so short of time that it’s important to have concrete candidates for what the value system should be, besides whatever the tech companies are putting in their system prompts these days. So I’m going to mention two concrete proposals. One is simply Kant. I don’t even know much about Kantian ethics, I just know it’s one of humanity’s major attempts to be rational about morality. It might be a good thing if some serious Kantian thinkers tried to figure out what their philosophy says the value system of an all-powerful AI should be. The other is PRISM, a proposal for an AI value system that was posted here a few months ago but didn’t receive much attention. The reason it stood out to me is that it was deduced from a neuroscientific model of human cognition. As such, it is what we might expect the output of a process like CEV to look like, and it would be a good project for someone to formalize the arguments given in the PRISM paper.
Requiem for the hopes of a pre-AI world
When this question was posted, I asked myself, what would be a “cynical” answer? What that means is, you ask yourself: given what I see and know, what would be a realistically awful state of affairs? So, not catastrophizing, but also having low expectations.
What my intuition came up with was, less than 10% working on user-centered alignment, and less than 1% on user-independent alignment. But I didn’t have the data to check those estimates against (and I also knew there would be issues of definition).
So let me try to understand your guesses. In my terminology, you seem to be saying:
1000 (600+400) doing AI safety work
600 doing work that relates to alignment
80 doing work on scalable user-centered alignment
80 (40+40) doing work on user-independent alignment
So you’re saying that the persistent epigenetic modification is a change in the “equilibrium state” of a potentially methylated location?
Does this mean that the binding affinity of the location is the property that changes? That is, all else being equal, a location with high affinity will be methylated much more often than a location with low affinity, because the methyl groups will tend to stick harder or longer at the high-affinity location.
But if that’s the case, it seems like there still must be some persistent structural feature responsible for setting the binding affinity to high or low...
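To make sure I’m picturing this correctly, here is the toy two-state model I have in mind. This is only a sketch of my mental picture, not a claim about the actual biochemistry, and the rate constants are invented purely for illustration: the site flips between unmethylated and methylated, and “affinity” shows up as how slowly the mark comes off (k_off). In this picture, my question amounts to asking what persistent structural feature sets k_off (or k_on) for a given location.

```python
# Toy two-state model of a single methylatable site (illustrative numbers only).
# The site flips between unmethylated and methylated; k_on is the rate at which
# it gets methylated, k_off the rate at which the mark is removed or lost.
# "High affinity" is modeled here simply as a low k_off (the mark sticks longer).

def equilibrium_occupancy(k_on: float, k_off: float) -> float:
    """Long-run fraction of time the site spends in the methylated state."""
    return k_on / (k_on + k_off)

# Same methylation machinery (same k_on), different stickiness (k_off):
high_affinity = equilibrium_occupancy(k_on=1.0, k_off=0.1)   # mark comes off slowly
low_affinity  = equilibrium_occupancy(k_on=1.0, k_off=10.0)  # mark comes off quickly

print(f"high-affinity site: methylated ~{high_affinity:.0%} of the time")  # ~91%
print(f"low-affinity site:  methylated ~{low_affinity:.0%} of the time")   # ~9%
```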
My take on this: AGI has existed since 2022 (ChatGPT). There are now multiple companies in America and in China which have AGI-level agents. (It would be good to have a list of countries which are next in line to develop indigenous AGI capability.) Given that we are already in a state of coexistence between humans and AGI, the next major transition is ASI, and that means, if not necessarily the end of humanity, then the end of human control over human affairs. This most likely means that the dominant intelligence in the world is entirely nonhuman AI; it could also mean some human-AI symbiosis, but again, it won’t be natural humanity in charge.
A world in which the development of AGI is allowed, let alone a world in which there is a race to create ever more powerful AI, is a world which by default is headed towards ASI and the dethronement of natural humanity. That is our world already, even if our tech and government leadership manage to not see it that way.
So the most consequential thing that humans can care about right now is the transition to ASI. You can try to stop it from ever happening, or you can try to shape it in advance. Once it happens, it’s over: humans per se have no further say; if they still exist, they are at the mercy of the transhuman or posthuman agents now in charge.
Now let me try to analyze your essay from this point of view. Your essay is meant as a critique of the paradigm according to which there is a race between America and China to create powerful AI, as if all that either side needs to care about is getting there first. Your message is that even if America gets to powerful AI (or safe powerful AI) first, the possibility of China (and in fact anyone else) developing the same capability would still remain. I see two main suggestions: a “mutual containment” treaty, in which both countries place bounds on their development of AI along with means of verifying that the bounds are being obeyed; and spreading around defensive measures, which make it harder for powerful AI to impose itself on the world.
My take is that mutual containment really means a mutual commitment to stop the creation of ASI, a commitment which to be meaningful ultimately needs to be followed by everyone on Earth. It is a coherent position, but it’s an uphill struggle since current trends are all in the other direction. On the other hand, I regard defense against ASI as impossible. Possibly there are meaningful defensive measures against lesser forms of AI, but ASI’s relationship to human intelligence is like that of the best computer chess programs to the best human chess players—the latter simply have no chance in such a game.
On the other hand, the only truly safe form of powerful AI is ASI governed by a value system which, if placed in complete control of the Earth, would still be something we could live with, or even something that is good for us. Anything less, e.g. a legal order in which powerful AI exists but there is a ban on further development, is unstable against further development to the ASI level.
So there is a sense in which development of safe powerful AI really is all that matters, because it really has to mean safe ASI, and that is not something which will stay behind borders. If America, China, or any other country achieves ASI, that is success for everyone. But it does also imply the loss of sovereignty for natural humanity, in favor of the hypothetical benevolent superintelligent agent(s).
It always seemed outlandish that in The Animatrix, the first AI city (01) was located in the Middle East…
If we had limitless time, it would be interesting to know how this happened. I guess the prehistory of it involved Saudi Vision 2030 (e.g. the desert city Neom), and the general hypermodernization of Dubai. You can see precursors in the robot Sophia getting Saudi citizenship in 2017, and the UAE’s “Falcon” LLM in 2023.
But the initiative must have come from the American side—some intersection of the geopolitical brain trust around Trump, and the AI/crypto brain trust around David Sacks. The audacity of CEOs who can pick a country on the map and say, let’s build a whole new technological facility here, combined with the audacity of grand strategists who can pick a region of the world and say, let’s do a huge techno-economic deal with our allies here.
There must have been some individual who first thought, hey, the Gulf Arabs have lots of money and electricity, that would be a good place to build all these AI data centers we’re going to need; I wonder who it was. Maybe Ivanka Trump was telling Jared Kushner about Aschenbrenner’s “Situational Awareness”, they put 2 and 2 together, and it became part of the strategic mix along with “Trump Gaza number one”, the new Syria, and whatever they’re negotiating with Iran.
Luckily I don’t think the Accelerationists have won control of the wheel
Could you expand on this? Also, have you had any interaction with accelerationists? In fact, are there any concrete Silicon Valley factions you would definitely count as accelerationists?
Could you, or someone who agrees with you, be specific about this? What exactly are the higher standards of discussion that are not being met? What are the endemic epistemic errors that are being allowed to flourish unchecked?