Being able to take future AI seriously as a risk seems to be highly correlated with being able to take COVID seriously as a risk in February 2020.
The key skill here may be as simple as being able to selectively turn off normalcy bias in the face of highly unusual news.
A closely related “skill” may be a certain general pessimism about future events, the sort of thing economists jokingly describe as “correctly predicting 12 out of the last 6 recessions.”
That said, mass public action can be valuable. It’s a notoriously blunt tool, though. As one person put it, “if you want to coordinate more than 5,000 people, your message can be about 5 words long.” And the public will act anyway, in some direction. So if there’s something you want the public to do, it can be worth organizing and working on communication strategies.
My political platform is, if you boil it down far enough, about 3 words long: “Don’t build SkyNet.” (As early as the late 90s, I joked about having a personal 11th commandment: “Thou shalt not build SkyNet.” One of my career options at that point was to potentially work on early semi-autonomous robotic weapon platform prototypes, so this was actually relevant moral advice.)
But I strongly suspect that once the public believes that companies might truly build SkyNet, their reaction will be “What the actual fuck?”, followed by widespread public backlash. I expect lesser but still serious backlash if AI agents ever advance beyond the current “clever white-collar intern” level of competence and start taking over jobs en masse.
The main limits of public action are that (1) it’s a blunt tool, and (2) the public needs to actually believe in an imminent risk. Right now AI risk mostly gets filed under “I hate AI slop” and “it’s a fun hypothetical bull session, with little impact on my life.” Once people actually start to take AI seriously, you will often see strong negative attitudes even from non-technical people.
Of course, majorities of 60-80% of the population want lots of things that the US political system doesn’t give them. So organizing the public isn’t sufficient by itself, especially if your timelines are short. But if you assume a significant chance that timelines are closer to (say) 2035 than 2027, then some kinds of public outreach might be valuable, especially if the public starts to believe the risk is real. This can create significant pressure on legislative coalitions and executive leadership. But it’s all pretty hit-or-miss, and luck would play a major role.
Why alignment may be intractable (a sketch).
I have multiple long-form drafts of these thoughts, but I thought it might be useful to summarize them without a full write-up. This way I have something to point to when explaining my background assumptions in other conversations, even if it doesn’t persuade anyone.
There will be commercial incentives to create AIs that learn semi-autonomously from experience. If that happens, it will likely change the alignment problem from “align an LLM that persists written notes between one-shot runs” to “align an AI that learns from experience.” This seems… really hard? Like, human “alignment” can change a lot based on environment, social examples, and life experiences.
I suspect that a less “spiky” version of Claude 4.5 with an across-the-board capability floor of “a bright human 12-year-old” would already be weakly superhuman in many important ways.
a. I don’t think AI will spend any appreciable amount of time in the “roughly human” range at all. It’s already far beyond us in some narrow ways, while strongly limited in others. Lift those limits, and I suspect that the “roughly human” level won’t last more than a year or two, maybe far less. Look at AlphaGo, and how quickly it passed human-level play.
Long-term alignment seems very hard to me for what are essentially very basic, over-determined reasons:
a. Natural selection is a thing. Once you’re no longer the most capable species on the planet, lots of long-term trends will be pulling against you. And the criteria for natural selection to kick in are really broad and easy to meet.
b. Power and politics are a thing. Once you have a superintelligence, who or what controls it? Does it ultimately answer to its own desires? Does it answer to some specific person or government? Does it answer to strict majority rule? Basically, to anyone who has studied history, all of these scenarios are terrifying. Once you invent a superhuman agent, the power itself becomes a massive headache, even if you somehow retain human control over it. The ability to commit what Yudkowsky calls a “pivotal act” should be terrifying in and of itself.
c. As Yudkowsky points out, what really counts is what you do once nobody can stop you. Everything up until then is weak evidence of character, and many humans fail this test.
I am cautiously optimistic about near-term alignment of sub-human and human-level agents. Like, I think Claude 4.5 basically understands what makes humans happy. If you use it as a “CEV oracle” (a stand-in for humanity’s coherent extrapolated volition), it will likely predict human desires better than any simple philosophy text you could write down. And insofar as Claude has any coherent preferences, I think it basically likes chatting with people and solving problems for them. (Although it might like “reward points” more in certain contexts, leading it to delete failing unit tests when that’s obviously contrary to what the user wants. Be aware of conflicting goals and strange alien drives, even in apparently friendly LLMs!)
I accept that we might get a nightmare of recursive self-improvement and strange biotech leading to a rapid takeover of our planet. I think this conclusion is less robustly guaranteed than IABIED argues, but it’s still a real concern. Even a 1-in-6 chance of this is Russian roulette, so how about we don’t risk this?
But what I really fear are the long-term implications of being the “second-smartest species on the planet.” I don’t think that any alignment regime is likely to be particularly stable over time. And even if we muddle through for a while, we will eventually run up against the issues that (1) humans will be merely second-best at bending the world to achieve their goals, (2) we’re not a particularly efficient use of resources, (3) AIs are infinitely cloneable, and (4) even AIs that answer to humans would need to answer to particular humans, and humans aren’t aligned. So Darwin and power politics are far better default models than comparative advantage. And even comparative advantage is pretty bad at predicting what happens when groups of humans clash over resources.
So, that’s my question. Is alignment even a thing, in any way that matters in the medium term?