What do you think of the Meaning Alignment Institute’s (MAI) “democratic fine-tuning (DFT)” work on eliciting moral graphs from populations?
Interesting! I will need to read through this in more detail, to get an idea of their approach. I’m glad someone is trying to do something in this space.
My objections to other approaches to democratic governance tend to break down roughly as follows:
I fear that democratic governance of superintelligence is about as likely to succeed as chimpanzees coming up with elaborate schemes to democratically manage Homo sapiens for the benefit of chimps. No matter how careful and clever the chimps are, they’re going to fail. They don’t even understand 99% of what’s going on, so how could they hope to manage it?
We will not, in practice, actually attempt any such governance scheme. The Chinese labs won’t, because China doesn’t even believe in Western notions of democracy and human rights. OpenAI has recently gutted its existing non-profit governance structure in order to reduce the risk of anyone attempting to govern it. Anthropic, out of all the labs, just might try. But the US government is currently trying to break Anthropic and bring them to heel by threatening to designate them as a supply chain risk (like Huawei) unless they agree to support “all legal uses,” potentially including things like fully autonomous killbots and domestic surveillance. The “supply chain risk” designation, as I understand it, would mean that no Anthropic customer would be allowed to do business with the US government. Perhaps I’ve misunderstood this specific situation, but in the end, Anthropic is subject to the people with the guns. And the people with the guns do not necessarily want democratic oversight. So in practice, no, the billionaires and politicians will almost certainly not agree to some clever democratic governance system.
Even if we could somehow control superintelligence and if we could somehow place it under democratic control, I don’t especially trust democratic control. Why? Well, I’m bi, my friends are trans, and I’m old enough to remember the 1980s. Had someone proposed a plan like, “LGBT+ people are mentally ill, and we can cure them by nonconsensually rewriting their minds,” it’s entirely possible that the public might have voted for that.
Finally, democracy is inherently unstable. About 20-25% of people appear to be “authoritarian followers”, which means they’re pretty happy to vote for a strongman. This number increases in times of fear and crisis. (It went up after 9/11, for example.) And another big chunk of the population can be moved by propaganda, or barely understand anything at all about politics. So historically, a number of 20th century democratic nations voted in the leaders who destroyed their democracy. This can be fixed; Germany is a democracy again today. But I expect democratic governance of superintelligence would be subject to similar risks, and in the case of superintelligence, you may not be able to fix your mistakes.
So a plan like MAI’s is critically dependent on a number of assumptions:
We can control superintelligence.
We have sufficiently good democratic control over the rich and the powerful to make sure they don’t wind up controlling superintelligence.
If the people do succeed in getting democratic control over superintelligence, they won’t vote it away, and they won’t democratically decide to do horrible things to unpopular minorities.
So from my perspective, MAI’s plan is a “hail Mary” plan. But we’re pretty deep in “hail Mary” territory, so I’m not opposed to placing bets on what look like unlikely outcomes.
Similarly, as far as I can tell, Dario Amodei’s current plan for Anthropic is “build superintelligence as fast as we can, do our very best to make it like humans, and expect to totally lose all human control within 5-20 years.” Personally, I feel like this is the least horrible version of the worst idea in human history. Like, obviously, no, we should not do this. But if we’re going to do this, Anthropic is at least thinking about the real issues. They know that humans are likely to lose control, but they’re basically hoping we can wind up as beloved house pets.
I still think the best plan is “just don’t build something vastly smarter than us with the ability to learn,[1] pursue goals and replicate.” One obvious objection to my plan is that we’re probably going to go right ahead and build superintelligence anyway. Which is why I am sympathetic to long-shot plans that might have an outside chance of working.
But I still prefer “just don’t build superintelligence.” Or, failing that, delay it. Emotionally, I’m treating it sort of like a diagnosis of terminal cancer for me and everyone I love. Even a remission of several years would be of immense value. And delay also gives some of the hail Mary plans a slightly better chance of working, or of the public realizing that maybe they don’t want to be “beloved house pets” of minds no human can possibly understand.
Learning is essentially a form of self-modification. Combined with differential replication of more successful entities, this gives you natural selection.