The Possessed Machines (summary)

Link post

The Possessed Machines is one of the most important AI microsites. It was published anonymously by an ex-lab employee, and does not seem to have spread very far, likely at least partly due to this anonymity (e.g. there is no LessWrong discussion at the time I’m posting this). This post is my attempt to fix that.

(The piece was likely substantially human-directed but laundered through an AI due to anonymity or laziness. Thanks to Malcolm MacLeod for reminding me to mention this in the comments. See here for Pangram-on-X analysis claiming 67.5% AI. The prose is not its strength.)

I do not agree with everything in the piece, but I think cultural critiques of the “AGI uniparty” are vastly undersupplied and incredibly important in modeling & fixing the current trajectory.

The piece is a long but worthwhile analysis of some of the cultural and psychological failures of the AGI industry. The frame is Dostoevsky’s Demons (alternatively translated The Possessed), a novel about ruin in a small provincial town. The author argues it’s best read as a detailed description of earnest people causing a catastrophe by following tracks, laid down by the surrounding culture, that have become corrupted:

What I know is that Dostoevsky, looking at his own time, saw something true about how intelligent societies destroy themselves. He saw that the destruction comes from the best as well as the worst, from the idealists as well as the cynics, from the people who believe they are saving humanity as well as those who want to burn it down.

The piece is rich in good shorthands for important concepts, many taken from Dostoevsky, which I try to summarize below.

First: how to generalize from fictional evidence, correctly

The author argues for literature as a source of limited but valuable insight into questions of culture and moral intuition:

Literature cannot tell us what to do. It cannot provide policy prescriptions or technical solutions. It cannot predict the future or settle empirical questions. The person who reads Dostoevsky looking for an alignment technique will be disappointed.

What literature can do is reshape perception. It can make visible patterns that were invisible, make felt truths that were merely known, make urgent realities that were abstract. It can serve as a kind of training data for moral intuition—presenting scenarios that expand the range of situations one has “experienced” and therefore the range of situations one can respond to wisely.

[...]

Dostoevsky’s particular value is that he was obsessed with exactly the questions that matter most for AI development. What happens when intelligence develops faster than wisdom? What happens when the capacity for reasoning outstrips the capacity for feeling? What happens when small groups of smart people convince themselves they have discovered truths so important that normal constraints no longer apply?

Stavroginism: the human orthogonality thesis

Stavrogin is a character for whom moral considerations have become a parlor game. He can analyze everything and follow the threads of moral logic, but he is not moved or compelled by them at any level beyond curiosity.

The Stavrogin type can contemplate human extinction as calmly as they contemplate next quarter’s revenue projections. This is not because they have thought more deeply about the question; it is because they lack the normal human response to existential horror. Their equanimity is not wisdom; it is damage.

[...]

They have looked at the abyss so long that they no longer see it. Their equanimity is not strength; it is the absence of appropriate emotional response.

Kirillovan reasoning: reasoning to suicide

Closely related is Kirillov. Whereas Stavrogin is the detached, curious observer of long chains of off-the-rails moral reasoning, Kirillov is the true believer.

Yudkowsky has a useful concept he calls “the bottom line”—the idea that in any motivated reasoning process, the conclusion is written first, and the arguments are found afterward. [...]

But there is an opposite failure mode that Yudkowsky’s framework does not adequately address: the person who follows arguments wherever they lead without any check on whether the conclusions make sense. This person is not engaging in motivated reasoning; they are engaging in unmotivated reasoning, deduction without sanity checks. Kirillov is the prototype.

[...]

Kirillov [...] has arrived at the conclusion that suicide is the ultimate act of human freedom, the assertion of human will against the universe that created it. He plans to kill himself as a kind of metaphysical demonstration, and he has agreed to leave a suicide note taking responsibility for crimes committed by Pyotr Stepanovich’s revolutionary cell.

The author compares Kirillov to people who accept Pascal’s-wager-type EV calculations about positive singularities. A better example might be the successionists, some of whom want humanity to collectively commit suicide as the ultimate act of human moral concern toward future AIs.

Shigalyovism: reasoning to despotism

Shigalyov rises to present his system for organizing society. “I have become entangled in my own data,” he begins, “and my conclusion directly contradicts the original idea from which I started. Starting from unlimited freedom, I end with unlimited despotism. I will add, however, that apart from my solution of the social formula, there is no other.”

[...]

One character asks whether this is not simply a fantasy. Shigalyov replies that it is the inevitable conclusion of any serious attempt to organize society rationally. All other solutions are impossible because they require human nature to be other than it is. Only by eliminating freedom for the many can freedom be preserved for the few, and only the few are capable of handling freedom without destroying themselves and others.

[...]

The company reacts with fascination, horror, and a certain amount of admiration. No one can quite refute the argument. And this is Dostoevsky’s point: the argument cannot be refuted on its own terms because its premises, once accepted, do indeed lead to its conclusions. The error is in the premises, but the premises are hidden behind such a mass of reasoning that they are difficult to locate.

If Stavrogin is the intellectually entranced x-risk spectator & speculator, and Kirillov is the self-destructive whacko, Shigalyov is the political theorist who has rederived absolute despotism and Platonic totalitarianism for the AGI era.

The AI safety community has developed its own versions of Shigalyovism [...] The concept of a “pivotal act” is perhaps the clearest example. [...] The canonical example is using an aligned AI to prevent all other AI development—establishing a kind of permanent monopoly on artificial intelligence.

This is Shigalyovism in digital form. It begins with the desire to protect humanity and ends with a proposal for a single point of failure controlling all future technological development. The reasoning is internally consistent: if unaligned AI would destroy humanity, and if many independent AI projects increase the probability of unaligned AI, then preventing independent AI development reduces existential risk. QED.

But the conclusion is monstrous. A world in which a single entity controls all AI development is a world without meaningful freedom, without the possibility of exit, without any check on the power of whoever controls that entity. It is Shigalyov’s one-tenth ruling over his nine-tenths, with the moral framework of “preventing extinction” replacing the moral framework of “achieving paradise.”

Hollowed institutions

Dostoevsky’s point is not that the revolutionaries are powerful but that the institutions they attack are weak. The provincial society of Demons has no genuine principles, no deep roots, no capacity for self-defense. It exists through inertia and convention. When those conventions are challenged, it collapses almost immediately.

[...]

I have watched equivalent dynamics in AI governance. I have sat in meetings where everyone present knew that a proposed deployment was risky, where no one was willing to be the person who stopped it. The social costs of objection were immediate and certain; the costs of acquiescence were diffuse and probabilistic. Every time, acquiescence won.

Dostoevsky understood that civilizations do not collapse because they are attacked by overwhelming external force. They collapse because their internal coherence decays to the point where even modest pressure can break them. The revolutionaries in Demons are not impressive people; they are provincial mediocrities. They succeed because the society they attack is even more mediocre.

Possession

The possession Dostoevsky describes is not primarily a matter of ideas entering minds from outside. It is a matter of capacities being developed without the corresponding wisdom to use them, of intelligence outrunning conscience, of means being cultivated without attention to ends.

The characters in Demons are not possessed by socialism or liberalism or nihilism as external forces. They are possessed by their own cleverness—by the intoxicating experience of reasoning without limit, of following thoughts wherever they lead, of treating everything as a puzzle to be solved rather than a reality to be encountered.

The AGI uniparty

The AI research community is not a collection of separate tribes; it is a single social organism that happens to be distributed across multiple corporate hosts.

Consider the actual topology. Researcher A at OpenAI dated Researcher B at Anthropic; they met at a house party in the Mission thrown by Researcher C, who left DeepMind last year and now runs a small alignment nonprofit. Researcher D at Google and Researcher E at Meta were roommates in graduate school and still share a group house with three other ML researchers who work at various startups. The safety lead at one major lab and the policy director at another were in the same MIRI summer program in 2017. The CEO of one frontier lab and the chief scientist of another served on the same nonprofit board.

This is not corruption in any conventional sense. It is simply how small, specialized communities work.

[...]

The official story is that the AI labs are competitors. [...] But the social topology undermines this story. When researchers move fluidly between organizations, they carry knowledge, assumptions, and culture with them.

[...]

The result is a kind of uniparty—a shared culture that supersedes corporate affiliation. The uniparty has its own beliefs (that AGI is coming relatively soon, that the current paradigm will scale, that technical alignment work is tractable), its own values (intellectual rigor, effective altruism, cosmopolitan liberalism), its own taboos (excessive pessimism, appeals to regulation, anything that smacks of Luddism). These shared beliefs, values, and taboos operate across organizational boundaries, creating a remarkable homogeneity of outlook among people who are nominally competitors.

[...]

The AI uniparty’s shared premises include: that intelligence is the key variable in the future of civilization; that artificial intelligence will soon exceed human intelligence; that the people currently working on AI are therefore the most important people in history; that their technical and intellectual capabilities qualify them to make decisions for humanity. These premises are rarely stated explicitly, but they structure everything. They explain why the community can tolerate such high levels of risk—because the alternative (letting “less capable” people control the development) seems even worse.

[...]

One cannot believe that AI development should stop entirely. One cannot believe that the risks are so severe that no level of benefit justifies them. One cannot believe that the people currently working on AI are not the right people to be making these decisions. One cannot believe that traditional political processes might be better equipped to govern AI development than the informal governance of the research community.

These positions are not explicitly forbidden. They are simply unthinkable—they would mark one as an outsider, as someone who does not understand, as someone who is not part of the conversation. The boundary is maintained not through coercion but through the subtler mechanisms of social belonging: the raised eyebrow, the awkward silence, the failure to be invited to the next dinner party.

The liberal father as creator of the nihilist son

Liberal Stepan Trofimovich’s son Pyotr Stepanovich is the chief nihilist character in Demons. The author of The Possessed Machines argues that this sort of thing, EA-style altruism curdling into either outright nihilism or power-hunger, is a core cultural mechanic. I think they are directionally right, but I don’t follow their main example, which argues that “technology ethics frameworks that are supposed to govern AI—fairness, accountability, transparency, the whole FAccT constellation—are the Stepan Trofimovich liberalism of our moment”, and that “the serious people [...] have moved past these frameworks” because they are obsolete. My read of the intellectual history is that AGI-related concerns and galaxy-brained arguments about the future of galaxies preceded that cluster of more prosaic AI concerns; the two are different branches on the intellectual tree, rather than successors of each other.

Handcuffed Shatov

Ivan Shatov is a former atheist who has returned to a mystical Russian Orthodoxy, a believer who cannot quite manage belief. He was once a member of Pyotr’s revolutionary circle and now repudiates it, but the circle will not let him go. He is murdered by his former comrades for the crime of wanting to leave.

Shatov represents something important: the person who has come to doubt the project but cannot escape it. Every major AI lab has its Shatovs—researchers who have grown increasingly uncomfortable with the direction of their work but feel trapped by career incentives, social ties, stock options, and the genuine difficulty of imagining alternative paths. Some of them have left. Many more have stayed, hoping to “push from the inside,” rationalizing their continued participation.

Dostoevsky shows us what happens to the Shatovs. They do not reform the movement from within. They are destroyed by it.

The solution is fundamentally spiritual

The ideological debate between liberals and radicals cannot be resolved through more ideology. The social dynamics of provincial conspiracy cannot be fixed through better coordination mechanisms. The psychological deformations of the intelligentsia cannot be healed through more intelligence. Something else is needed—something that operates at a different level, that addresses the human situation rather than any particular doctrine.

I am not a religious person, and I am not advocating for religious solutions to AI risk. But I think Dostoevsky is pointing toward something important: the limits of political and technical approaches to problems that are fundamentally spiritual in nature.

The word “spiritual” is likely to provoke allergic reactions in a rationalist context. Let me try to be precise about what I mean by it. The core problem with AI development is not that we lack good alignment techniques (though we do). It is not that the incentive structures are wrong (though they are). It is not that the governance mechanisms are inadequate (though they are). The core problem is that the people making the key decisions are, many of them, damaged in ways that disqualify them from making these decisions wisely.

This damage is not primarily intellectual. The people I am thinking of are intelligent, often extraordinarily so. It is something more like moral—a failure of the channels that connect knowledge to action, that make abstract truths feel binding, that generate appropriate emotional responses to contemplated harms.