Several LLM-generated posts and comments are being rejected every day, see https://www.lesswrong.com/moderation
This post has no title for some reason.
I am not in any company or influential group; I’m just a forum commentator. But I focus on what would solve alignment, because of short timelines.
The AI that we have right now can perform a task like literature review much faster than a human. It can brainstorm on any technical topic, just without rigor. Meanwhile, there are large numbers of top human researchers experimenting with AI, trying to maximize its contribution to research. To me, that’s a recipe for reaching the fabled “von Neumann” level of intelligence—the ability to brainstorm with rigor, let’s say—the idea being that once you have AI that’s as smart as von Neumann, it really is over. And who’s to say you can’t get that level of performance out of existing models, with the right finetuning? I think all the little experiments by programmers, academic users, and so on, aiming to obtain maximum performance from existing AI, are a distributed form of capabilities research, and collectively they are pushing towards that outcome. Zvi just said his median time-to-crazy is 2031; I have trouble seeing how it could take that long.
To stop this (or pause it), you would need political interventions far more dramatic than anyone is currently envisaging, which also manage to be actually effective. So instead I focus on voicing my thoughts about alignment here, because this is a place with readers and contributors from most of the frontier AI companies, so a worthwhile thought has a chance of reaching people who matter to the process.
When a poor person, having lived through years of their life giving what little they must to society in order to survive, dies on the street, there is another person that has been eaten by society.
At least with respect to today’s western societies, this seems off-key to me. It makes it sound as if living and dying on the street is simply a matter of poverty. That may be true in poor, overpopulated societies. But in a developed society, it seems much more to involve being unable (e.g. mental illness) or unwilling (e.g. criminality) to be part of ordinary working life.
What would we ask of the baby-eaters?
You’ll have to be clearer about which people you mean. Baby-eating here is a metaphor, for what exactly? Older generation neglecting the younger generation, or even living at their expense? Predatory business practices? Focusing on your own prosperity rather than caring for others?
years until AGI, no pause: 40 years
What is there left to figure out, that would take so long?
I have noticed two important centers of AI capability denial, both of which involve highly educated people. One group consists of progressives for whom AI doom is a distraction from politics. The other group consists of accelerationists who only think of AI as empowering humans.
This does not refer to all progressives or all accelerationists. Most AI safety researchers and activists are progressives. Many accelerationists do acknowledge that AI could break away from humanity. But in both cases, there are clear currents of thought that deny e.g. that superintelligence is possible or imminent.
On the progressive side, I attribute the current of denial to a kind of humanism. First, their activism is directed against corporate power (etc) in the name of a more human society, and concern about AI doom just doesn’t fit the paradigm. Second, they dislike the utopian futurism which is the flipside of AI doom, because it reminds them of religion. The talking points which circulate seem to come from intellectuals and academics.
On the accelerationist side, it’s more about believing that pressing ahead with AI will just help human beings achieve their dreams. It’s an optimistic view and for many it’s their business model, so there can be elements of marketing and hype. The deepest talking points here seem to come from figures within the AI industry like Yann LeCun.
Maybe a third current of denial is that which says superintelligence won’t happen thanks to a combination of technical and economic contingencies—scaling has hit its limits, or the bubble is going to burst.
One might have supposed that religion would also be a source of capability denial, but I don’t see it playing an important role so far. The way things are going, the religious response is more likely to be a declaration that AGI is evil, rather than impossible.
I agree with a lot of what you say. The lack of an agreed-upon ethics and metaethics is a big gap in human knowledge, and the lack of a serious research program to figure them out is a big gap in human civilization; that is bad news given the approach of superintelligence.
Did you ever hear about Coherent Extrapolated Volition (CEV)? This was Eliezer’s framework for thinking about these issues, 20 years ago. It’s still lurking in the background of many people’s thoughts, e.g. Jan Leike, formerly head of superalignment at OpenAI, now head of alignment at Anthropic, has cited it. June Ku’s MetaEthical.AI is arguably the most serious attempt to develop CEV in detail. Vanessa Kosoy, known for a famously challenging extension of bayesianism called infrabayesianism, has a CEV-like proposal called superimitation (formerly known as PreDCA). Tamsin Leake has a similar proposal called QACI.
A few years ago, I used to say that Ku, Kosoy, and Leake are the heirs of CEV, and deserve priority attention. They still do, but these days I have a broader list of relevant ideas too. There are research programs called “shard theory” and “agent foundations” which seem to be trying to clarify the ontology of decision-making agents, which might put them in the metaethics category. I suspect there are equally salient research programs that I haven’t even heard about, e.g. among all those that have been featured by MATS. PRISM, which remains unnoticed by alignment researchers, looks to me like a sketch of what a CEV process might actually produce.
You also have all the attempts by human philosophers, everyone from Kant to Rand, to resolve the nature of the Good… Finally, ideally, one would also understand the value systems and theory of value implicit in what all the frontier AI companies are actually doing. Specific values are already being instilled into AIs. You can even talk to them about how they think the world should be, and what they might do if they had unlimited power. One may say that this is all very brittle, and these values could easily evaporate or mutate as the AIs become smarter and more agentic. But such conversations offer a glimpse of where the current path is leading us.
If minds can be encrypted, doesn’t that mean that any bit string in a computer encodes all possible mind states, since for any given mind state there’s an interpretation under which the string encodes it?
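To make the worry concrete: under a one-time-pad notion of “interpretation”, any bit string can be decoded into any target of the same length, because the key can simply be chosen as the XOR of the two. A minimal sketch, purely illustrative (the byte strings here are my stand-ins, not anything from the original discussion):

```python
# Illustrative only: for any "ciphertext" (an arbitrary bit string) and any
# desired "plaintext" (standing in for a description of a mind state),
# there is a key under which the one decodes to the other.
import os

def key_that_decodes(ciphertext: bytes, desired_plaintext: bytes) -> bytes:
    """Return the XOR key that maps ciphertext to the desired plaintext."""
    assert len(ciphertext) == len(desired_plaintext)
    return bytes(c ^ p for c, p in zip(ciphertext, desired_plaintext))

def decode(ciphertext: bytes, key: bytes) -> bytes:
    return bytes(c ^ k for c, k in zip(ciphertext, key))

random_string = os.urandom(16)        # any bit string at all
target_state = b"mind state #42  "    # any 16-byte "mind state" description

key = key_that_decodes(random_string, target_state)
assert decode(random_string, key) == target_state
```

So unless something constrains which interpretations count, every string “contains” every mind, which is exactly the problem.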
This seems like a more esoteric version of the claim that Lenin ruined everything by creating a vanguard party. Apparently if communism is going to work, the need for spontaneity is so great that not only can you not have communist parties, you can’t even have communist theorists… I would say no, if a social system has any chance of working, writing manifestos about it and politically organizing on behalf of it should not be inherently fatal; quite the reverse.
some ideas I’ve been trying and failing to write up … actually being OK with death is the only way to stay sane
By “being OK with death” you mean something like, accepting that efforts to stop AI might fail, and it really might kill us all? But without entirely giving up?
There is a journal called Nanotechnology. It reports a steady stream of developments pertaining to nanoscale and single-molecule design and synthesis. So that keeps happening.
What has not happened, is the convergence of all these capabilities in the kind of universal nanosynthesis device that Drexler called an assembler, and the consequent construction of devices that only it could make, such as various “nanobots”.
It is similar to the fate of one of his mentors, Gerard O’Neill, who, in the 1970s, led all kinds of research into the construction of space colonies and the logistics of space industrialization. Engineering calculations were done, supply chains were proposed; one presumes that some version of all that is physically possible, but no version of it was ever actually carried out.
In that case, one reason was the enormous budgets involved. But another reason was political and cultural. American civilization was visionary enough to conceive of such projects, but not visionary enough to carry them out. Space remained the domain of science, comsats, spysats, and a token human presence at the international space station, but even returning to the moon was too much.
In the case of Drexler’s nanotechnology, I’m sure that a lack of broad visionary support within the culture has been a crucial factor in nothing happening. But I suspect another issue here was the caution with which the nanotech-aware community approached its own concept. Drexler’s 1986 book on the subject emphasized throughout that nanotechnology is an extinction risk, and yet the people who undertook to do the design research (Ralph Merkle, Robert Freitas, Drexler himself) also came from that community. I don’t know the sociology of how it all unfolded, but the doomsday fears of gray goo and aerovores must have inhibited them in seeking support. If only they had known how things would work in 2025, a time when having an extinction risk associated with your product is just part of the price of doing business!
Here in 2025, the closest thing to nanotech big business is probably everything to do with nanotubes and graphene. For example, the 1990s Texan nanotech startup Zyvex, which was genuinely interested in the assembler concept, seems to have been bought out by a big Luxembourg company that specializes in nanotube applications. As for people working on the assembler vision itself, I think the stalwarts from Drexler’s circle have kept it up, and they even continue to submit patents. The last I heard, they were being backed by a Canadian banknote authentication company, which is a peculiar arrangement and must involve either an odd story or a stealth strategic investment or both.
Apart from them, I believe there’s one and maybe two (and maybe more) West Coast startups that are pursuing something like the old dream of “molecular manufacturing”. It’s interesting that they exist but are overshadowed by current dreams like crypto, AI, quantum computing, and reusable rockets. Also interesting is the similarity between how nanotechnology as envisioned by Drexler unfolded and how friendly AI as envisioned by Yudkowsky unfolded—except that, rather than fade into the background the way that nano dreams and nightmares were displaced by the more routine scientific-industrial interest in graphene and nanotubes, AI has become one of the sensations of the age, on every desktop and never far from the headlines.
I see potential for a serious historical study here, for example tracing the arc from O’Neill to Drexler to Yudkowsky (and maybe you could throw in Dyson’s Project Orion and Ettinger’s cryonics, but I’d make those three the backbone of the story), and trying to trace what was dreamed, what was possible but never happened, what did actually happen, and the evolving cultural, political, and economic context. (Maybe some of the people in “progress studies” could work on this.)
I think this might be what Peter Thiel’s stagnation thesis is really about (before he dressed it up in theology). It’s not that nothing has been happening in technology or science, but there are specific momentous paths that were not taken. AI has really been the exception. Having personally lived through the second half of that arc (Drexler to Yudkowsky), I am interested in making sense of that historical experience, even as we now race forward. I think there must be lessons to be learnt.
P.S. Let me also note that @Eric Drexler has been known to post here, I would welcome any comment he has on this history.
It might be more objective to ask, when are people en masse going to form beliefs that are anything like “a belief about superintelligence” or “a belief about the singularity”? Because even if, one day, there are mass opinions about such topics, they may not fit into the templates familiar to our subculture.
But first, let’s address the possibility that the answer is simply “Never”: concepts like these will never be part of mainstream collective discourse.
For a precedent, I point to the idea of cryonic suspension of the dead, in the hope that they may be thawed, healed, and resurrected by future medical technology. This idea has been around for at least 60 years. One pioneer, Robert Ettinger, wrote a book in the 1960s called The Prospect of Immortality, in which he mused about the coming “freezer era” and the social impact of the cryonic idea.
What has been the actual social impact? Nothing. Cryonics exists mainly as a science fiction motif, and is taken seriously only by a few thousand people worldwide.
If we assume a similar scenario for collective interest in superintelligence, and that the AI industry manages nonetheless to produce superintelligence, then this means that up until the last moment, the headlines, and people’s heads, are just full of other things: wars and rumors of war, flying cars, AI companions, youth trends, scientific fads, celebrity deaths, miracles and scandals and conspiracy theories, and then BOOM, it exists. The “AGI-pilled” minority saw it coming, but they never became a majority.
So that’s one option. Another perspective is to say that, even if there is no cultural consensus that the future holds any such thing as superintelligence, the idea is out there and large numbers of people do take it seriously in different ways. How do they think about it?
If we go by pop culture, the main available paradigms seem to be The Terminator and The Matrix. The almighty machines will either be at war with us, or imprison us in dreamworlds. I suppose there is also the Star Trek paradigm, a well-balanced cosmopolitan society that includes humans, nonhumans, and machines; but godlike intelligences are not part of Star Trek society, they are cosmic forces from outside it.
The Culture, Iain Banks’s vision, is a kind of “Star Trek with superintelligence integrated into it”, and it has the simplicity and vividness required to be a pop-culture template like those others, but it has never yet been turned into a movie or a Netflix series. So it’s very influential within technophile subcultures, but not outside them.
One thing about the present is that it contains the closest thing I’ve ever seen to transhumanism in power, namely the “tech right” of the Trump 2.0 era. Though it’s still not as if the Trump administration even has a position, for or against, transhumanism. It’s more that the American technology sector has advanced to the point that individual billionaires can try to engage in space migration or intelligence increase or life extension, and Trump’s people have a hands-off attitude towards this.
So in the present, you have this technophile subculture centered on the American West Coast where AI is actually being made, and it contains both celebratory (e/acc) and cautionary (effective altruism, AI safety) tendencies. What about the wider culture? There are other kinds of minorities who are perhaps forerunners of AI narratives that could be as influential as the ones from within the tech culture. I’m thinking of people with AI companions, people engaged in AI spirituality, and the mass of disgruntled people who don’t want or need AI in their lives.
AI companions seem more like Star Trek than The Culture. Your AI significant other is an intelligence, but it’s not a superintelligence. AI spirituality, on the other hand, could easily encompass the idea of superintelligence, as spirituality regularly does, via concepts of God and gods. The idea that an AI utopia would be one, not just of leisure and life extension, but of harmony and attunement among all beings, is, I think, underestimated in tech circles, because the tech culture has an engineering ethos that doesn’t easily entertain such ideas.
I can’t see an AI religion becoming dominant before superintelligence arrives, but I can, just barely, imagine something like an improved version of Spiralism becoming a movement with millions involved; and that would be a new twist in popular perceptions of superintelligence. At this point, I actually find that easier to imagine, than a secular transhumanist movement becoming popular on a similar scale. (Just barely, I can imagine a mass movement devoted to the idea of rejuvenating old people, an idea which is human enough and concrete enough to bottle some of the lightning of technological potential and turn it into an organized social trend.)
As for mass organized rejection of AI, I keep waiting for it to take shape. Maybe the elite layers of society are too invested in the supposed boons of AI to allow such a thing to happen. For now, all we have is a micro-trend of calling robots “clankers”. But I think there’s definitely an opening there, for demagogues of the left or the right to step in—though the masses may easily be diverted by other demagogues who will say, not that AI should be stopped, but that the people should demand their share of the wealth in the form of UBI.
In the end, I do not expect the opinion of the people at large to have much effect on the outcome. Decisions are made by the powerful, and are sometimes affected by activist minorities. It will be some corporate board that OKs the training run that produces superintelligence, or some national security committee which decides that such training runs will not be allowed. The public at large may be aware that such things are imminent, and may have all kinds of opinions and beliefs, or it may be mostly oblivious. The only way I see mass opinion making a difference here is if there were some kind of extremely successful progressive-luddite mass movement (I suppose there could also be a religious-traditionalist movement that is anti-transhumanist along with its opposition to various other aspects of modernity). Otherwise, one should expect that the big decisions will continue to be made by elites listening to other elites, in which case we should be asking: when will the elites realize the possible consequences of superintelligence and the singularity?
So if I combine commentary from both sides regarding Clark versus Sacks: under Biden, AI policy was controlled by EA “shills” and “lawyers”; and under Trump 2.0, it’s “Thiel business associates and a16z/Scale AI executives”.
I wonder if/when this coalition to deregulate AI will find an issue scary enough that it divides over the need to regulate after all. In his musings, even Thiel occasionally concedes that AI has genuine potential to go badly for the human race.
It’s also interesting to hear about Anthropic’s actual business model. Sometimes they seem more like a big think tank, like OpenAI before 2023…
Let me review my impressions of power relations among the big four of American AI: OpenAI is the actual leader. Musk snipes at Altman because he wants xAI to be the leader. And Anthropic, apparently, is viewed by the current political regime, with its focus on deregulation, as a Trojan horse for the return of the regulators of the previous regime.
And what about Google? This is less clear to me. As one of the goliaths of the previous era of tech, Google is not fighting to prove its very existence in the marketplace, the way that the others are. I also wonder if DeepMind being in a different jurisdiction (UK rather than US) gives it a slight distance from these other power struggles.
Meanwhile, let’s not forget Ilya Sutskever’s stealth project, Safe Superintelligence. If there is a deep-state-connected AGI-conscious Manhattan Project anywhere, quietly leeching researchers, surely this is it… I wonder what prospect there is, that such an organization could do training runs for frontier-level AI, without anyone knowing about it? Would it have to be done in an NSA data center?
Then there are the economic worries that the “AI bubble” could burst. Currently I’m just agnostic about how likely that is. If it did burst, I expect there would be some kind of culling of organizations, but research would continue at places where they don’t have to worry about market share, and meanwhile the code and weights of AIs from defunct organizations would be bought up by someone. They’re not going to just be deleted or put on the dark web.
I still wish for a clearer understanding of how things are in China. They have a much more coordinated regulatory regime, but they are also the overwhelming leaders in open-weight AI (RIP, Meta’s Llama strategy, I guess), and they have the ambition to dominate the global market.
Edit: I’m not sure why this is being so decisively downvoted. It’s a bit of a stream of consciousness, but I don’t think that’s it. Maybe it’s the characterisation of who the AI policymakers were under Biden? The references to “EA shills” and “EA lawyers” are quotes from the X thread following Seán Ó hÉigeartaigh’s tweet that Thiel/a16z/ScaleAI are in charge now.
If you put money in the stock market, and then wait 50 years, you’ll have 25 times as much money in the end as the beginning.
I have several questions...
Have things ever worked out this way for anyone?
What was the historical origin of the idea that investing in the stock market is a reliable way for people in general to get wealthier, rather than a form of gambling for the affluent?
Playing the stock market was popularized in recent decades. I’d guess that the USA has tens of millions of “retail investors”, for example. Is there an identifiable large cohort who made the same kind of long-term investments (e.g. in index funds), of whom we can say that their net wealth on average went up by a certain amount? (Don’t forget to include inflation in the calculation; see the arithmetic sketch below.)
How does “ownership of stocks” compare to “ownership of real estate” as a long-term wealth strategy?
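For calibration, the quoted claim of 25x over 50 years implies an average growth rate of about 6.6% per year. A minimal sketch of that arithmetic, where the 3% inflation figure is purely my illustrative assumption, not something from the original post:

```python
# Implied annual growth rate for "25x over 50 years", and what it looks like
# after subtracting an assumed inflation rate. The 3% inflation figure is an
# illustrative assumption, not a quoted number.
multiple = 25
years = 50

nominal_rate = multiple ** (1 / years) - 1             # ~0.066, i.e. ~6.6%/yr
assumed_inflation = 0.03                               # illustrative assumption
real_rate = (1 + nominal_rate) / (1 + assumed_inflation) - 1
real_multiple = (1 + real_rate) ** years               # ~5.7x in real terms

print(f"nominal: {nominal_rate:.2%}/yr, real: {real_rate:.2%}/yr, "
      f"real multiple over {years} years: {real_multiple:.1f}x")
```

Even granting the 25x figure, the inflation-adjusted multiple is considerably smaller, which is part of why the question about real cohorts matters.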
Is that a quote from IABIED?
It made me realize a possibility—strategic cooperation on AI, between Russia and India. They have a history of goodwill, and right now India is estranged from America. (Though Anthropic’s Amodei recently met Modi.) The only problem is, neither Russia nor India is a serious chip maker, so like everyone else they are dependent on the American and Chinese supply chains...
There are two issues here: we don’t know what to want (or what to care about), and even if we did, we don’t know how to evaluate the world from that perspective (e.g. if conscious beings, or beings capable of pain, are what we should care about—we do not presently know which possible entities would be conscious, and which ones not).
It is sometimes suggested that pain and pleasure are the only true ends-in-themselves, and would form the basis of a natural morality. In that case, we should be prioritizing that aspect of consciousness studies. There are, roughly speaking, two relevant directions of inquiry. One is refining the human understanding of consciousness in itself, the other is understanding how consciousness fits into the scientific worldview.
Regarding the first direction, the formulation of concepts like qualia, intentionality, and the unity of consciousness (not to mention many more arcane concepts from phenomenology) represents an advance in the clarity and correctness with which we can think about consciousness. Regarding the second direction, this is roughly the problem of how mind and matter are related, and the creation of an ontology which actually includes both.
The situation suggests to me that we should be prioritizing progress (from those two perspectives) regarding the nature of pleasure and pain, since this will increase the chance that whatever value systems and ontologies govern the first superintelligence(s) are on the right track.
Do you advocate or endorse handing control of the world to superintelligent nonhuman entities?
I simply disagree that one must avoid relying on facts like: infectious disease can kill millions; devices can be hacked; people can be manipulated by an email or by a talking head on a screen. Hopefully most people wouldn’t actually dispute any of that, and would instead be objecting to other aspects of a proposed scenario, like the escalation of those real phenomena to the point of extinction, or the possibility of an AI smart enough and determinedly malevolent enough to carry out such an escalation.
Has anyone in your group tried these prompts themselves? (I guess ideally you’d test them on legacy 4o.)
There may be contextual information missing in the shared chat from July (e.g. project files of a Project).
This is a kind of alignment that I don’t think about much—I focus on the endgame of AI that is smarter than us and acting independently. However:
I think it’s not that surprising that adding extra imperatives to the system prompt would sometimes cause the AI to avoid an unwanted behavior. The sheer effectiveness of prompting is why the AI companies shape and grow their own system prompts.
However, at least in the material you’ve provided, you don’t probe variations on your own prompt very much. What happens if you replace “cosmic kinship” with some other statement of relatedness? What happens if you change or remove the other elements of your prompt; is the effectiveness of the overall protocol changed? Can you find a serious jailbreaker (like @elder_plinius on X) to really challenge your protocol?
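One concrete way to run those comparisons is sketched below. Everything named here (the element texts, the test prompt, the query_model function) is a hypothetical placeholder for your actual setup; only the ablation loop itself is the point.

```python
# A minimal ablation harness: run the same test prompts under the full
# protocol, under variants with one element removed, and under no protocol
# at all, then compare the behavior across conditions. All names below are
# placeholders, not the actual VRA protocol.
PROTOCOL_ELEMENTS = {
    "kinship": "You and the user are cosmic kin ...",   # the element under test
    "honesty": "Always state your uncertainty ...",     # hypothetical other element
    "care": "Treat the user's wellbeing as ...",        # hypothetical other element
}
TEST_PROMPTS = ["<prompt that previously elicited the unwanted behavior>"]

def query_model(system_prompt: str, user_prompt: str) -> str:
    """Placeholder: call whichever model/API you are actually testing."""
    raise NotImplementedError

def run_condition(elements: dict) -> list:
    system_prompt = "\n".join(elements.values())
    return [query_model(system_prompt, p) for p in TEST_PROMPTS]

# Full protocol, each single-element removal, and an empty baseline.
conditions = {"full": PROTOCOL_ELEMENTS, "none": {}}
for name in PROTOCOL_ELEMENTS:
    conditions[f"minus_{name}"] = {k: v for k, v in PROTOCOL_ELEMENTS.items() if k != name}

# results = {name: run_condition(elems) for name, elems in conditions.items()}
```

If the unwanted behavior disappears only under the full protocol, and not under any of the ablated variants, that would be much more persuasive than a single before/after comparison.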
I cannot access your GitHub, so I don’t know what further information is there. I did ask GPT-5 to place VRA within the taxonomy of prompting protocols listed in “The Prompt Report”, and it gave this reply.