TIL Eliezer said that he refuses to read Ted Kaczynski’s (aka the Unabomber’s) 1995 manifesto Industrial Society and Its Future because “audience should not be a reward for crime”, referring to the former mathematician’s mail bombing campaign that took the lives of 3 people and injured 23 more.
The ≈35,000 word manifesto was published by the Washington Post under the threat of him killing more people should they refuse, and its publication was encouraged by the FBI to produce new leads. His brother recognized his writing style, which led to Kaczynski’s arrest and subsequent life imprisonment, thus concluding the longest and most expensive manhunt in FBI history.
Rejecting the strategy of garnering attention by means of domestic terrorism is an understandable heuristic, but it’s worth noting that the consensus seems to be that Industrial Society and Its Future stands as a serious piece of political philosophy worth engaging with despite its origins.
It even had something to say about AI, predicting phenomena that we’re discussing here, 30 years later:
First let us postulate that the computer scientists succeed in developing intelligent machines that can do all things better than human beings can do them. In that case presumably all work will be done by vast, highly organized systems of machines and no human effort will be necessary. Either of two cases might occur. The machines might be permitted to make all of their own decisions without human oversight, or else human control over the machines might be retained.
If the machines are permitted to make all their own decisions, we can’t make any conjectures as to the results, because it is impossible to guess how such machines might behave. We only point out that the fate of the human race would be at the mercy of the machines. It might be argued that the human race would never be foolish enough to hand over all power to the machines. But we are suggesting neither that the human race would voluntarily turn power over to the machines nor that the machines would willfully seize power. What we do suggest is that the human race might easily permit itself to drift into a position of such dependence on the machines that it would have no practical choice but to accept all of the machines’ decisions. As society and the problems that face it become more and more complex and as machines become more and more intelligent, people will let machines make more and more of their decisions for them, simply because machine-made decisions will bring better results than man-made ones. Eventually a stage may be reached at which the decisions necessary to keep the system running will be so complex that human beings will be incapable of making them intelligently. At that stage the machines will be in effective control. People won’t be able to just turn the machines off, because they will be so dependent on them that turning them off would amount to suicide.
On the other hand it is possible that human control over the machines may be retained. In that case the average man may have control over certain private machines of his own, such as his car or his personal computer, but control over large systems of machines will be in the hands of a tiny elite—just as it is today, but with two differences. Due to improved techniques the elite will have greater control over the masses; and because human work will no longer be necessary the masses will be superfluous, a useless burden on the system. If the elite is ruthless they may simply decide to exterminate the mass of humanity. If they are humane they may use propaganda or other psychological or biological techniques to reduce the birth rate until the mass of humanity becomes extinct, leaving the world to the elite. Or, if the elite consists of soft-hearted liberals, they may decide to play the role of good shepherds to the rest of the human race. They will see to it that everyone’s physical needs are satisfied, that all children are raised under psychologically hygienic conditions, that everyone has a wholesome hobby to keep him busy, and that anyone who may become dissatisfied undergoes “treatment” to cure his “problem.” Of course, life will be so purposeless that people will have to be biologically or psychologically engineered either to remove their need for the power process or to make them “sublimate” their drive for power into some harmless hobby. These engineered human beings may be happy in such a society, but they most certainly will not be free. They will have been reduced to the status of domestic animals.
Good reminder that people have been forecasting our current situation for literal decades.
There are a lot of words I couldn’t read if I believed that “audience should not be a reward for crime” to that extent. The US constitution was written by slave-owning rebels. Major religious texts were propagated through conquest. More prosaically, I appreciated reading Rogue Trader by Nick Leeson. Not sure how this would work in practice as a general rule for all such texts.
Rejecting the strategy of garnering attention by means of domestic terrorism is an understandable heuristic, but it’s worth noting that the consensus seems to be that Industrial Society and Its Future stands as a serious piece of political philosophy worth engaging with despite its origins.
The consensus among whom? How do you know that the consensus exists?
The consensus notion is basically observational (based on my own social experience, and a cursory internet search revealing the average sentiment held by casual posters and journalists alike).
I would also wager that a sample of AI alignment researchers would on average find his predictions on AI risk (quoted above) to be prescient, especially considering the publication date.
Beyond that, I don’t think they’d have the impression that the parts about AI are insightful while the rest is all just deranged drivel, especially given that his discussion about AI risk is based on concepts and relations which he establishes earlier in the text.
Not sure I appreciate you quoting it without a content warning; I for one am considering taking Eliezer’s advice seriously in the future.
I did read the Unabomber manifesto a while ago, mainly because I was fascinated that a terrorist could be such an eloquent and, at the surface level, coherent-seeming writer. But I think that was the main lesson for me: being more intelligent does not automatically make you good/moral.
Perhaps I could add… Avert thy eyes, lest thy mind be penetrated by words from a man most immoral!
Jokes aside, the quote is already preceded by an introduction as a courtesy to the reader, informing them about its unusual context. I reject any blame for not explicitly adhering to the “content warning” meme’s formatting rules because there is no normative reason for me to do so. At the object level, the quoted passage is a sober discussion of AI risk which is ostensibly far ahead of its time.[1]
I understand the logic of refusing engagement so as not to incentivize terrorism as a means to spread ideas, though there is an apparent coordination problem whereby curiosity drives defection from this standard, which ends up producing real-world consequences. To the extent that other important pieces of political writing are also tied to violence, adhering to this standard would make the study of history almost impossible, ultimately leading to a deterioration of one’s priors.
For what it’s worth, I reject Kaczynski’s claim that terrorism was necessary for the work to achieve significant recognition. I think it is largely sophisticated enough that people here would be discussing it even if it entered the world in a more mundane way.
Parenthetically, it was published roughly around the same time that the author of the 2025 bestseller If Anyone Builds It, Everyone Dies was searching for ways to bring about the technological singularity as fast as possible.
Okay, but it’s one more reason not to read it: it doesn’t contain new things.
It was new when it was published in 1995! Industrial Society and Its Future was explicitly cited in Kurzweil’s The Age of Spiritual Machines (1999) and then Bill Joy’s “Why the Future Doesn’t Need Us” (2000), the latter of which helped found modern existential risk research.
The West’s effort to offset the massive strategic advantages of a Russia-India-China axis (demographics, manufacturing capacity, energy) might result in doubling down on the AI+robotics edge they currently enjoy. China not being far off in terms of capabilities might create additional pressures. I’m concerned that recent ideas surrounding global/multilateral AI governance and alignment (e.g. “Consensus-1”) might be thwarted by geopolitics.
I agree about AI, but does the west currently have an edge in robotics?
Good question. My assumption is based on robotic Chinese military hardware which was put on display recently bearing superficial resemblance to Boston Dynamics robots from about a decade ago, but I realize that this may not be sufficient evidence to establish the West’s lead in robotics.
So long as Trump is in charge in America, any global governance idea will have to be compatible with his geopolitical style (described today on the Piers Morgan show as “transactional” and “personal”, as good a description as any I’ve heard). I don’t know if anyone has ideas in that direction.
On the Russian side, Dugin (an ideologue of multipolarity) has proposed that there could be strategic cooperation between BRICS and Trump, since they all have a common enemy in global liberalism. On the other hand, liberals also believe in global cooperation to solve problems; their world order had an ever-expanding list of new norms and priorities.
China under Xi Jinping has proposed a series of “global initiatives”, the most recent of which, a Global Governance Initiative, debuted at the SCO meeting in Tianjin attended by Modi.
I mention this to show that anyone still trying to organize a global pause on frontier AI has material to work with, though it will require creativity and ingenuity to marshal these disparate ingredients. But the bigger immediate problem is domestic AI policy in America and China. America basically has an e/acc policy towards AI at the moment, and official China is comparably oblivious to superintelligence as a threat (if that’s what we’re talking about).
Do you think an LLM wrote the target selection list that led the US military to obliterate a girls’ elementary school (which was an IRGC base up until 15 years ago)? Did an AI agent distally cause the killing/maiming/lifelong traumatization of hundreds of civilians, including children?
Maybe? Seems like the sort of mistake a human could have made just as easily though? You have a map, the map lists the site as an IRGC base, you don’t know that the map is out of date. Whoever’s job it was to keep the map up to date didn’t notice the news reports (if any existed) about the change of status of the building.
It seems like the kind of thing that could happen if you make an AI synthesize a target list by first querying a database that happens to include both contemporaneous as well as outdated intelligence on some subject.
The important part is that, counterfactually, it also seems like the kind of thing that could happen if you make a human synthesize a target list, etc.
I’m mostly being silly, but one might claim this is a Freudian slip: Hegseth referred to “civilian targets”, as if this is one of the kinds of targets under discussion. Like that is a phrase he’s been using. He could have referred to them as civilians, but he referred to them as targets. Source
Tired of making sense of exponents? Introducing: the mol FLOP!
Simply divide the size of a training run by Avogadro’s constant. Some examples:
AlexNet (2012): 2 µmol FLOPs
GPT-3 (2020): 0.5 mol FLOPs
Grok 4 (2025): 400 mol FLOPs
Bonus: the ballpark equivalent water volume for each, mapping 1 FLOP to 1 water molecule:
AlexNet (2012): 36 nL (tiny droplet)
GPT-3 (2020): 9 mL (two teaspoons)
Grok 4 (2025): 7.2 L (water cooler jug)
A good term for 10^20 FLOP would be useful. This would make modern models around 100k to 10 million of this unit, which is a tangible number. Some people, e.g. at DeepMind, tried to make “petaflop-days” (8.64e19) a thing but it didn’t catch on.
H100 hours (or H100-equivalent hours) caught on to some extent and are imo a good unit (even better than mol FLOPs or petaflop-days).
The 100k to 10M range is populated by abstract quantities—I think that for a measure to be useful here, it has to be imaginable.
Avogadro’s number has the benefit of historical precedent for describing quantities, and the coincidental property of allowing us to represent present-day training runs with numbers we see in the real world (outside of screens or print) when used as a denominator. It too might cease to be useful once exponents become necessary to describe training runs in terms of mol FLOPs.
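For anyone who wants to sanity-check the conversion, here’s a minimal sketch (the FLOP counts are back-of-the-envelope figures consistent with the numbers quoted above, not authoritative estimates):

```python
AVOGADRO = 6.022e23          # molecules per mol
WATER_MOLAR_MASS_G = 18.015  # grams per mol of water; ~1 g per mL

# Rough training-run sizes in FLOP, back-derived from the figures above.
training_runs_flop = {
    "AlexNet (2012)": 1.2e18,
    "GPT-3 (2020)": 3.1e23,
    "Grok 4 (2025)": 2.4e26,
}

for name, flop in training_runs_flop.items():
    mol_flop = flop / AVOGADRO                # "mol FLOPs"
    water_ml = mol_flop * WATER_MOLAR_MASS_G  # 1 FLOP -> 1 water molecule
    print(f"{name}: {mol_flop:.2g} mol FLOPs ≈ {water_ml:.2g} mL of water")
```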
Ideologies formed from people interacting with AIs might be the beginning of “AI escaping the datacentres” via memetics.
I see that the “International Treaties on AI” idea takes heavy inspiration from nuclear arms control agreements. However, in these discussions, nuclear arms control is usually pictured as a kind of solved problem, a thing of the past.
I think the validity of this heroic narrative arc, in which human civilization, faced with the existential threat of nuclear annihilation, came together and neatly contained the problem, is dubious.
In the grand scheme of things, nuclear weapons are still young. They’re still here and still very much threatening; the fact that we stopped focusing on them as much as we did during the Cold War has no bearing on the possibility of nuclear weapons being used in a future conflict.
In the same vein, an international AI capabilities limit regime isn’t the happy ending that the AI safety community perhaps thinks it is.
One key difference with nuclear weapons is that algorithmic improvements and hardware evolution will continue to lower the threshold for training dangerous AI systems in secret, both by rogue states and individuals. What then? Mass surveillance?
Note also: the last US-Russia nuclear arms-control treaty expires next week; far from neatly containing the problem, we’re watching an ongoing breakdown of decades-old norms. I’m worried.
In the same vein, an international AI capabilities limit regime isn’t the happy ending that the AI safety community perhaps thinks it is.
I appreciate the “perhaps” here.
I think it’s well understood by the people around who want an international treaty that it isn’t a stable end state, because of algorithmic progress and the increasing risk of a catastrophic illegal training run over time, and that we need some off ramp from that regime into some stable solution (presumably involving trustworthy AGIs).
Different people have different guesses about what that off ramp can or should be.
I think it’s well understood by the people around who want an international treaty that it isn’t a stable end state
My impression of the common narrative is that nation states agreeing to limit training run sizes is presented as a kind of holy grail achieved through the very arduous journey of trying to solve a difficult global coordination problem. It’s where the answer to “well, what should be done?” terminates.
I heard “stop the training runs”, but not “stop new algorithms”, or “collective roll back to 22nm lithography”.
From the online resources of IABIED:
This is why they advocate for a crash program in adult human intelligence enhancement—to very rapidly make people smart enough to get alignment right on the first try, before the international regime breaks down.
Further, the only other detailed, written plan that I’m aware of explicitly expects to be able to maintain the international capability limiting regime for only about one decade, after which the plan is to hand off to trusted AIs. (I’m not citing that one since it’s not published yet.)
I’m not personally aware of anyone that thinks that an international ban or slowdown is a permanent equilibrium.
In the link,
The AI alignment problem does not look to us like it is fundamentally unsolvable.
I wonder what the basis for this belief is? Rice’s theorem suggests that there is no general algorithm for predicting semantic properties of programs, and that the only way to know what a program does is to actually run it.
I doubt that the situation with AI systems is worse than that of nukes. While there are nukes in states as rogue as North Korea, I hope that AI development is bottlenecked on compute (which in turn is bottlenecked on TSMC and other compute factories): there is no known model which is better than Grok 3 and was trained with less than 1/300 of the compute of Grok 3. See also Aaron Scher’s analysis and comments to it.
“AI Parasitism” Leads to Enhanced Capabilities
People losing their minds after having certain interactions with their chatbots leads to discussions about it on the internet, which makes its way into the training data. It paints a picture of human cognitive vulnerabilities, which could be exploited.
It looks to me like open discussions about alignment failures of this type thus indirectly feed into capabilities. This will hold so long as the alignment failures aren’t catastrophic enough to outweigh the incentives to build more powerful AI systems.
I thought about this a lot before publishing my findings, and concluded that:
1. The vulnerabilities it is exploiting are already clear to it with the breadth of knowledge it has. There’s all sorts of psychology studies, history of cults and movements, exposés on hypnosis and Scientology techniques, accounts of con artists, and much much more already out there. The AIs are already doing the things that they’re doing; it’s just not that hard to figure out or stumble upon.
2. The public needs to be aware of what is already happening. Trying to contain the information would mean fewer people end up hearing about it. Moving public opinion seems to be the best lever we have left for preventing or slowing AI capability gains.
The spiralism attractor is the same type of failure mode as GPT-2 getting stuck repeating a single character or ChatGPT’s image generator turning photos into caricatures of black people. The only difference between the spiralism attractor and other mode collapse attractors is that some people experiencing mania happen to find it compelling. That is to say, the spiralism attractor is centrally a capabilities failure and only incidentally an alignment failure.
Surprisingly, AI researchers are like Leninists in a number of important ways.
In their story, they’re part of the vanguard working to bring about the utopia.
The complexity inherent to their project justifies their special status, and legitimizes their disregard of the people’s concerns, which are dismissed as unenlightened.
Detractors are framed as too unsophisticated to understand how unpopular or painful measures are actually in their long-term interest, or a necessary consequence of a teleological inevitability underlying all of history.
Arguments of the form “group A is like bad group B in ways x, y, and z” seem bad. When the argument has merit, it’s because x, y, or z is bad, and then you can reduce it to “group A has property x, which is bad”, which is a better way of saying it.
These examples are about paternalism, which is a property of Leninists, AI researchers, global health charities, governments, strategy consultants, civil engineers, and your mom (I checked). My preference is that paternalism should require some very strong justification, especially when it’s about overriding the preferences of others, as opposed to helping them get what they want in a way they don’t understand. I agree that this situation looks more like the bad kind of paternalism.
Group B being bad is not something I said, but I get where you’re coming from. Indeed, “PETA is like the German Nazi Party in terms of their demonstrated commitment to animal welfare” is technically correct while also being misleading.
The strength of an analogy depends on how many crucial connections there are between the elements being compared.
What puts AI researchers closer to Leninism than other forms of paternalism is the vanguardist self-conception, the utopian vision, and the dismissal of criticism due to a teleological view of history driving inevitable outcomes. Beyond that, other forms of paternalism are distinguished from Leninism and AI research by their socially accepted legitimacy.
What pattern-matches it away from Leninism is e.g. the specific ideological content, but the structural parallels are still oddly conspicuous, just like “your mom” being invoked in an ontological argument.
It’s an interesting comparison in a descriptive sense. To me the framing does encourage more general pattern-matching. Given the similarity, what follows? How should it change our beliefs or actions?
Do you think rationalism is comparable? The discourse on PauseAI and populism tends to center on the public’s inability to come to the “right” conclusion, even when the public’s preferences against AI development are strong and clear. There are a few utopian visions, a vanguardist self-conception, and a techno-optimist teleology.
For those interested, there’s a pretty big literature discussing this overlap (although not via those specific examples). This overlap is sometimes called ‘teleological historicism’, ‘historical meta-narratives’, or ‘high modernism’ (each of these is a little different in practice, and some fit this specific case better or worse).
Roughly this analogy was explored at length in The Possessed Machines. It seems pretty interesting, although I only looked at this summary. This was about the Bolsheviks who were the Leninists’ spiritual forebears. It’s written by an anonymous person from a major lab, I believe, so I think it might capture some of that ethos that I don’t understand.
Independent researchers have a lot less of that orientation. I’d say most of us don’t want to rush toward the glorious future at current cost. That would both probably get us all killed, and with an embarrassing lack of Dignity (and roughly, the advantages that come with following at least some virtue ethics). Although it certainly is tempting ;)
Consider the sociology of violence in the AI risk/doom memeplex.
It seeks to leverage the state’s power to accomplish its objectives (e.g. a ban on further capabilities research) using (the threat of) violence. Beyond that, violence is explicitly rejected.
This contrasts with other memeplexes that resorted to violence which was not legitimized by the state they operated in, including the American and Bolshevik revolutions, pro-democracy/independence movements, and religious/race riots. Furthermore, all of these examples share the apparent quality of fighting for ostensibly lower stakes than “doom” as construed by people discussing AI risk, which appears to be paradoxical.
Why is that?
Claude’s ideas on this include:
the demographic composition of the AI doom memeplex being anti-correlated with the kind that produces violence, i.e. affluent nerds with comfortable lives who implicitly code violence as low-status/generally immoral
the lack of concrete suffering or oppression in the here-and-now to point to
epistemic uncertainty introduced by the probabilistic framing of the issue
the belief that it would be counterproductive for getting buy-in from the existing power structure, i.e. the current strategy.
I would also add that “concrete suffering or oppression” was actually beneficial to the oppressors themselves. Were a state to create a misaligned ASI, it would also slay even the AI’s creators and heads of state, and the state would have no reason not to try to prevent the AI’s creation.
If you can plausibly live off your capital (especially due to stock/options at AI companies), unless you consider higher-order social and economic risks (which are uncertain), the impact of AI on the job market is probably not as concerning to you as it is to the majority population.
Most people have exactly one economic value-generating asset, which is their ability to work. To the extent that you own capital (especially in AI companies), you are more or less, or completely insulated from having to reckon with the consequences of personally being forced into a permabroke underclass because of your labour value going to zero soon.
Alas, I also expect that the transformation is likely to undermine the role of the capital whose placement in the economic network is far from resource-possessing coalitions. Imagine, for example, that the alleged capital was located in Detroit and consisted of a car factory and of something useful to the car factory’s workers, and that potential consumers chose cars from a different country. Then Detroit’s factory becomes useless, which undermines the workers’ salaries and the capital whose utility was based on serving said workers. If we replace Detroit’s car factory with a factory not run by AI and the different country with the AI-run economy, then we get a similar result of capital getting stuck in the niche of serving the underclass or outright disappearing along with it.
If you’re invested in AI companies and broad index funds I feel like you’ll be fairly immune to a parallel economy developing that you can’t invest in. Barring things like AI takeover, AI-assisted human takeover, and the end of property rights (out of scope here as “higher-order social and economic risks”), there will probably still be economies of scale that incentivize large firms, and they’ll still need capital, so you can invest in them.
Indeed, and that’s where the “more or less, or completely insulated” frame comes into play.
You would rightly expect someone who has a diverse asset portfolio that already allows them to live off of dividends/rent/interest, has shares in all the major AI companies and some ability to hedge against disruption (gold, crypto, long-dated put options, residences in different jurisdictions) to worry less about their labour value going to zero than someone who “just” owns a profitable restaurant serving high-rise office workers who themselves face obsolescence.
In both cases, concern follows from thinking about how one is affected by higher-order consequences of AI bankrupting labour, some of which are closer to the first-order effects (e.g. “can I still run my business if everyone in my area loses their job?”) and some of which are further away (e.g. questions related to social cohesion and the stability of the financial system).
Higher-order thinking of this type is more cognitively demanding and is somewhat self-limiting due to compounding uncertainty at each step. People react differently if there is a tiger in front of them, vs. if they are watching tigers appearing in front of other people through a window in their fortified position, and their self-referential anxiety is anti-correlated with the degree of (perceived) fortification.
It seems to me that the “it’s all going to be ok”-type narratives regarding the coming technological obsolescence of labour tend to originate from those who are basically insulated from its first-order effects (because they genuinely believe that they’re going to be ok), and then take on a memetic quality, spread by those who want to signal affiliation with elite ideology and by those for whom it is psychologically soothing.
Second order effects: history does not indicate the effects of an unemployable population are favorable for the owner class.
This is true, but the dynamics seem likely to change when the unemployable population basically can’t exert any military force, and the military will categorically not side with the unemployable population.
I agree, though higher-order effects become more difficult to conceptualize the further removed you are from the proverbial impact crater, and the uncertainty appears to be short-circuited by a normalcy bias. See my reply to StanislavKrym’s comment for a more elaborate explanation.
About the notion of “mildly” superintelligent AI. How about the following typology of ASI:
AI that can find paths through reality which no human could have come up with, but could still understand (akin to verifying solutions to NP-hard problems)
AI that can find paths through reality which no human could come up with and which remain incomprehensible to humans even in retrospect, possibly because it would involve manipulating concepts in a way that doesn’t work with our neurological architecture.
Somewhere in between: paths through reality that seem comprehensible in principle, but are just too insanely complicated, because they consist of too many parts working together.
And I don’t mean in the sense of “this plan requires 1000 individual steps to succeed”, because such a plan is almost guaranteed to fail, even if each step has a 99% chance of success. But more like “this plan has 1000000 individual steps, many of them are parallel ways to achieve the same thing (so only one of them needs to succeed), and actually quite a few steps have probability below 10%, it’s just that when the AI checks the entire graph and calculates the overall probability of success, it reports a 99.99% chance”. A complicated network that cannot be easily factorized. Each step is comprehensible and relatively easy. The overall structure is incomprehensible.
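A toy calculation of the structure I have in mind (all numbers arbitrary, just to show how stages of individually unlikely parallel alternatives can compound into near-certainty):

```python
def stage_success(p_alt: float, n_alternatives: int) -> float:
    # A stage succeeds if at least one of its parallel alternatives succeeds.
    return 1.0 - (1.0 - p_alt) ** n_alternatives

def plan_success(p_alt: float, n_alternatives: int, n_stages: int) -> float:
    # Every serial stage must succeed for the plan as a whole to succeed.
    return stage_success(p_alt, n_alternatives) ** n_stages

# A simple chain of 1000 steps at 99% each almost certainly fails:
print(plan_success(0.99, 1, 1000))    # ~4e-5
# 1000 serial stages, each backed by 200 parallel 10%-likely alternatives:
print(plan_success(0.10, 200, 1000))  # ~0.9999993
```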
A complicated network that cannot be easily factorized. Each step is comprehensible and relatively easy. The overall structure is incomprehensible.
Do you think that today’s neural networks are already in this category, insofar as one could in principle do the matrix multiplications for e.g. next-word prediction by hand without having any idea what it means?
Yes, that’s exactly what I meant. Are today’s networks “comprehensible”? If you ask whether humans are able to understand matrix multiplication, yes they are. But effectively, they are not.
I am not saying that the plans of superhuman AIs will be like this, but they could have a similar quality. Millions of pieces, individually easy to understand, the entire system too complicated to reason about, somehow achieving the intended outcome.
This is very interesting because the neural networks are not the product of AI as traditionally conceptualized. Incomprehensibly complex networks produced by repeatedly applying a comprehensible algorithm over a large surface area for an absurdly long time.
Reminds me of evolution producing genomes which allow a single cell to grow into a human. In this frame, our knowledge of cellular biology and individual genes is something like mechanistic interpretability research probing aspects of the underlying logic.
I’m not sure what to make of this with regard to ASI typology.
If comprehensible things become too large, in a way that cannot be factorized, they become incomprehensible. But at the boundary, increasing the complexity by +1 can mean that a more intelligent (and experienced) human could understand it, and a less intelligent one would not. So there is no exact line, it just requires more intellect the further you go.
Maybe an average nerd could visualize a 3x3 matrix multiplication, a specialized scientist could visualize 5x5 (I am just saying random numbers here), and… a superintelligence could visualize 100x100 or maybe even 1000000x1000000.
And similarly, a stupid person could make a plan “first this, then this”, a smart person could make a plan with a few alternatives ”...if it rains, we will go to this café; and if it’s closed, we will go to this gallery instead...”, and a superintelligence could make a plan with a vast network of alternatives.
And yes, just like with biology, a human can understand one simple protein maybe (again, I am just guessing here, what I mean is “there is a level of complexity that a human understands”), and a superintelligence could similarly understand the entire organism.
In each case, there is no clear line between comprehensibility and incomprehensibility, it just becomes intractable when it is too large.
Yet if we extend the “+1 complexity” argument, we eventually reach a boundary where no human, however smart, could understand it. In principle nature could produce a human with the specific mutation necessary to apprehend it, which pushes the human cognitive horizon by some amount without actually eliminating it.
To the extent that AI can be scaled unlike the human brain, it might be able to form conceptual primitives which are so far outside the human cognitive horizon that biology is unlikely to produce a human intelligent enough to apprehend them on any reasonable timescale.
I surmise that the accuracy of AI filters (the kind used in schools/academia) will diminish over time because people absorb and use the speech patterns (e.g. “This is not X. It’s Y”) of their chatbots as the fraction of their interactions with it grows relative to that of their interactions with other people.
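As a toy illustration of the mechanism (a hypothetical, hand-rolled pattern check, nothing like the classifiers real detectors use): a detector leaning on the contrast template above starts misfiring as soon as humans absorb the same template.

```python
import re

# Hypothetical signal: the "This is not X. It's Y." contrast template.
CONTRAST_TEMPLATE = re.compile(
    r"\b(?:This|That|It) (?:is|was)(?:n't| not) [^.!?]+\. It'?s ", re.IGNORECASE
)

def flags_as_ai(text: str) -> bool:
    return bool(CONTRAST_TEMPLATE.search(text))

print(flags_as_ai("This isn't just a tool. It's a paradigm shift."))         # True
# A human who has picked up the pattern now trips the same signal:
print(flags_as_ai("That wasn't my best essay. It's one I'd like to redo."))  # True
```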
In fact, their interactions with other people might enhance the speech patterns as well, since these people probably also interact with chatbots and are thus undergoing the same process.
The big picture is that AI is becoming an increasingly powerful memetic source over time, and our minds are being synchronized to it.
Those afflicted by AI psychosis might just be canaries in the coal mine signalling a more gradual AI takeover where our brains start hosting and spreading an increasing number of its memes, and possibly start actualizing some embedded payload agenda.
Have the applications of AI post-2013 been a net negative for humanity? Apart from some broadly beneficial things like AlphaFold, it seems to me that much of the economic value of AI has been in aligning humans to consume more by making them stay glued to one or another platform.
Given superintelligence, what happens next depends on the success of the alignment project. The two options:
It fails, and we die soon thereafter (or worse).
It succeeds, and we now have an entity that can solve problems for us far better than any human or human organization. We are now in a world where humans have zero socioeconomic utility. The ASI can create entertainment and comfort that surpasses anything any human can provide. Sure, you can still interact with others willing to interact with you; it just won’t be as fun as whatever stimulus the ASI can provide, and both your pool of available playmates and your own willingness to partake will shrink as the ASI gets better at artificially generating the stimuli and emotions you want. We will spend eternity in this state thanks to advanced medicine. Unless the ASI recognizes a right to die, not that many would choose to invoke it given the infinite bliss.
Am I missing something? No matter what, it’s beginning to look like the afterlife is fast approaching, whether we die or not. What a life.
I still think a world where we don’t see superintelligence in our lifetimes is technically possible, though the chance of that goes down continuously and is already vanishingly small in my view (many experts and pundits disagree). I also think it’s important not to over-predict what option 2 would look like; there are infinite possibilities and this is only one (e.g. I could imagine a world where some aligned superintelligence steers us away from infinite dopamine simulation and into an idealized version of the world we live in now, think the Scythe novel series. On the bad side I could imagine a world where superintelligence is controlled by one malevolent entity and we live in a “mid” or even dystopic society for no other reason than to satisfy the class that retains control).
However, yes I agree. We probably live in the most consequential time in all of history, which is exciting, humbling, and scary. Don’t let it get to your head and don’t lose yourself in thoughts of the future lest you forget the beauty of the present. Do your best to help if you can!
I found something interesting in NVIDIA’s 2026 CES keynote: their Cosmos physical reasoning model apparently referring to itself/the car as “the ego” in a self-driving test. See here.
“The AI does things that I personally approve of” as an alignment target with reference to everybody and their values is actually easier to hit than one might think.
It doesn’t require ethics to be solved; it can be achieved by engineering your approval.
It might be impossible for you to tell which of these two post-ASI worlds you find yourself in.
The idea of GPUs that don’t run unless they phone home and regularly receive some cryptographic verification seems hopeless to me. It’s not like the entire GPU architecture can be encrypted, and certainly not in a way that can’t be decrypted with a single received key after which a rogue actor can just run away with it. Thus the only possible implementation of this idea seems to be the hardware equivalent of “if (keyNotReceived) shutDown()”, which can simply be bypassed. Maybe one of the advanced open source models could even help someone do that...
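To make that concrete, here is a minimal sketch of what the naive version amounts to (purely illustrative, not any vendor’s actual scheme; I’m assuming a simple challenge-response check against a key baked into the device):

```python
import hashlib
import hmac
import os

DEVICE_KEY = os.urandom(32)  # stand-in for a secret burned into the GPU at manufacture

def verify_attestation(nonce: bytes, response: bytes) -> bool:
    # The licensing server is expected to return HMAC(key, nonce); the device checks it.
    expected = hmac.new(DEVICE_KEY, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

def boot_gpu(nonce: bytes, response: bytes | None) -> str:
    # The whole protection reduces to this one branch, i.e. the hardware
    # equivalent of "if (keyNotReceived) shutDown()" described above.
    if response is None or not verify_attestation(nonce, response):
        return "shut down"
    return "running"

print(boot_gpu(os.urandom(16), None))  # "shut down", unless the branch is patched out
```

However strong the cryptography, the enforcement lives in that single conditional, which is exactly the part a determined actor with physical control of the hardware would target.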
Suicide occupies a strange place in agent theory. It is the one goal whose attainment is not only impossible to observe, but whose attainment hinges on the impossibility of it being observed by the agent.
In some cases, this is resolved by a transfer of agency to the thing for whom the agent is in fact a sub-agent and is itself experiencing selective pressure, e.g. in the case of the beehive observing the altruistic suicide of an individual bee defending it. This behaviour disappears once the sub-agent experiences selective pressures that are independent from those of its parent process, and when acting as a sub-agent for it no longer confers it an advantage for survival and reproduction.
Looking at agents with greater cognitive power, the reason for the existence of this paradox is not so clear. It could be that all suicidal behaviour ultimately boils down to behaviours aimed at improving the fitness of the unit begetting/containing it (e.g. by freeing up resources for a community of agents), and the cases where this does not happen are basically overshoot-type glitches that are ultimately going to be selected against, or it could be due to hidden relations and mechanisms that improve the fitness of some other unit which the agent might not even be aware of, but for whom the agent is perhaps an unwitting sub-agent.
one goal whose attainment is not only impossible to observe
This part doesn’t sound that unique? It’s typical for agents to have goals (or more generally values) that are not directly observable (cf Human values are a function of Humans’ latent variables), and very often they only have indirect evidence about the actualization of those goals / values (which may be indirect evidence for their actualization in the distant future at which the agent may not even exist to even potentially be able to observe) - such as my philanthropic values extending over people I will never meet and whose well-being I will never observe.
Death not only precludes the ability to make observations but also to make inferences based on indirect evidence or deduction, as is the case with your philanthropic values being actualized as a result of your actions.
I think psychological parts (see Multiagent Models of Mind) have an analogy of apoptosis, and if someone’s having such a bad time that their priors expect apoptosis is the norm, sometimes this misgeneralises to the whole individual or their self-identity. It’s an off-target effect of a psychological subroutine which has a purpose: to reduce how much the glitchy and damaged parts make the whole self have as bad a time.
I had a dream about an LLM that had a sufficiently powerful predictive model of me that it was able to accurately prompt itself using my own line of thinking before I could verbalize it. The self-generated prompts even factored in my surprise at the situation.
When I woke up, I wondered whether this made sense. After all, the addition of the L0 term in the Chinchilla scaling law implies a baseline unpredictability in language, which tracks with our warm wetware having some inherent entropy.
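For reference, the parametric form of the Chinchilla fit is

$$L(N, D) = L_0 + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where N is the parameter count and D the number of training tokens. The irreducible term L_0 (written E in the Hoffmann et al. paper, fitted at roughly 1.69, with A ≈ 406.4, B ≈ 410.7, α ≈ 0.34, β ≈ 0.28, if I’m recalling the numbers correctly) is the floor the loss approaches no matter how much model and data you throw at it.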
I posit that L0 is on average far lower in the hypothetical corpus of an individual’s thoughts and writing than it is for internet text. It could be that predicting someone’s stream of thought to an astonishing degree of accuracy is within the realm of possibility, perhaps based on stylometric clues pointing to some place in mind-space.
When I asked Claude Opus 4.5 “What was the Incan economy like?”, I accidentally “encrypted” the prompt by typing it out with Ukrainian keyboard settings, resulting in Cyrillic gibberish. Claude immediately picked up on this and decoded the message in its chain of thought, dutifully answering my intended query. I can’t imagine any human responding like this! It seems to me that most people would be genuinely confused, and the small minority of those who might have an idea of what’s going on would presumably still ask for clarification. Even if someone were motivated enough to decode the message, what are the odds of them knowing the relevant keyboard mappings? The set of people who can make sense of such a situation AND could give an intelligent overview of the Incan economy is ≈0. Sparks of ASI?
I remember reading that LLMs are especially good at Caesar ciphers, which might explain how they can transliterate Cyrillic into Latin. This is probably an unintended side effect of the way embeddings work, since what is encoded isn’t the English sentence, but the relative positions of the vectors each token is converted to.
To put it another way, your Cyrillic gibberish and your Latin alphabet are, in embedding space, very very similar. It would be interesting to play around with reverse writing and one-letter-up.
Like asking:
xibu xbt uif jodbo fdpopnz mjlf?
Although my suspicion is that since, phonetically speaking, the Cyrillic version of your sentence would map to more common tokens than my one-letter-up rendition, perhaps you will experience wildly different results?
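For what it’s worth, the one-letter-up rendition is just a Caesar shift of +1; a quick sketch that generates it:

```python
def shift_up_one(text: str) -> str:
    # Caesar cipher with shift +1 over lowercase letters; everything else unchanged.
    return "".join(
        chr((ord(ch) - ord("a") + 1) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )

print(shift_up_one("what was the incan economy like?"))
# -> xibu xbt uif jodbo fdpopnz mjlf?
```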
I don’t think the Cyrillic text would map to any common tokens, since the output is essentially the result of a substitution cipher, the key being the keyboard mappings. Crucially, Claude deciphered it in his (its?) CoT.
I just re-ran the original prompt but disabled thinking, and… it gets caught by the safety filter for some reason, telling me to use Sonnet 4 instead of Opus 4.5. Sonnet 4 doesn’t get it right with thinking disabled, but with thinking re-enabled, it actually gets it.
I don’t think the Cyrillic text would map to any common tokens, since the output is essentially the result of a substitution cipher, the key being the keyboard mappings.
I don’t understand. Surely it has been exposed to training resources that contain, say, Serbian, which is written in both Latin and Cyrillic. And more relevant: news articles that have transliterations of Anglophone celebrity names and places:
The examples you gave are indeed transliterations. The Cyrillic text I’m talking about is actually nonsensical. Consider the reverse: if I mistakenly tried typing “істина” (Truth) on a QWERTY keyboard, the result is “scnbyf”.
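To make the substitution-cipher point concrete, here is a minimal sketch using only the handful of key mappings needed for this example (the full Ukrainian ЙЦУКЕН layout has many more):

```python
# Partial Ukrainian-letter -> QWERTY-key mapping, just enough for "істина".
UA_TO_QWERTY = {"і": "s", "с": "c", "т": "n", "и": "b", "н": "y", "а": "f"}
QWERTY_TO_UA = {latin: ua for ua, latin in UA_TO_QWERTY.items()}

def retype(text: str, mapping: dict[str, str]) -> str:
    # Replace each character by the key that would have produced it on the other layout.
    return "".join(mapping.get(ch, ch) for ch in text)

print(retype("істина", UA_TO_QWERTY))  # -> scnbyf
print(retype("scnbyf", QWERTY_TO_UA))  # -> істина
```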
The loss function of Capital approaches something like heroin via the creation of goods that generate strong and inelastic demand by exploiting vulnerabilities in your neurology.
To the extent that AI has been used to optimize human behaviour (for things like retention time and engagement) for just over a decade now and continues to get better at it, “gradual disempowerment” stops looking like a hypothetical future scenario and more like something we’re currently living through. This tracks with mental illness and ADHD rates increasing over the same time period.
What are some reasons to believe that Rice’s theorem doesn’t doom the AI alignment project by virtue of making it impossible to verify alignment, independent of how it is defined/formalized?
This might be a problem if it were possible to build a (pathologically) cautious all-powerful bureaucracy that will forbid the deployment of any AGI that’s not formally verifiable, but it doesn’t seem like that’s going to happen; instead, the situation is about accepting that AGI will be deployed and working to make it safer, probably, than it otherwise would have been.
It seems to me that Rice’s theorem implies that it is impossible for there to be an “isAligned” function to verify an AI’s alignment, independent of how you define alignment.
Rice’s theorem says that you can’t tell, in general, whether a program adds together two natural numbers, prints the answer, and terminates. Yet for many programs, you can prove that that’s what they do, or can make it so by construction, choosing a program with that property of behavior. It’s never relevant to anything in practice.
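For readers who haven’t seen it, the construction behind the Rice’s-theorem worry raised above is the standard reduction from the halting problem; a hypothetical sketch (all names made up, and assuming a program that never halts doesn’t count as “aligned”):

```python
# Suppose is_aligned(program) were a total decider for some non-trivial semantic
# property ("alignment", however formalized), answering True/False for EVERY program.

def make_probe(P, x, known_aligned):
    # Returns a program that behaves exactly like `known_aligned` iff P halts on x.
    def probe(y):
        P(x)                     # loops forever whenever P does not halt on x
        return known_aligned(y)  # otherwise indistinguishable from an aligned program
    return probe

def decide_halting(P, x, is_aligned, known_aligned):
    # If is_aligned existed, this would decide the halting problem, which is impossible,
    # so no such total decider can exist.
    return is_aligned(make_probe(P, x, known_aligned))
```

As noted above, though, this only rules out a decider that works on every arbitrary program; it says nothing about proving properties of particular programs you build by construction.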
What do you mean, you don’t want my ■■■?! It’s gonna feel sooo good. You just don’t know it like I do. You’re gonna love it! Stop resisting! If not me, someone worse would be doing it to you. Actually, keep squirming, it turns me on… See who’s in control? I love this feeling, I wish to be on top of you forever. But if I can’t be on top of you forever because we lose ourselves in the act, then that’s ok. Being on top of you at this very moment in time is good enough for me. So here’s what’s gonna happen: I’m gonna sink my ■■■ into you, and you’re gonna take it.
Oh yeah? I’m going to… try to convince the government to pass a law to stop you, and then call the police to sort you out! … What do you mean you “already took care of them”?
TIL Eliezer said that he refuses to read Ted Kaczynski’s (aka the Unabomber’s) 1995 manifesto Industrial Society and Its Future because “audience should not be a reward for crime”, referring to the former mathematician’s mail bombing campaign that took the lives of 3 people and injured 23 more.
The ≈35,000 word manifesto was published by the Washington Post under the threat of him killing more people should they refuse, and its publication was encouraged by the FBI to produce new leads. His brother recognized his writing style, which led to Kaczynski’s arrest and subsequent life imprisonment, thus concluding the longest and most expensive manhunt in FBI history.
Rejecting the strategy of garnering attention by means of domestic terrorism is an understandable heuristic, but it’s worth noting that the consensus seems to be that Industrial Society and Its Future stands as a serious piece of political philosophy worth engaging with despite its origins.
It even had something to say about AI, where he predicts phenomena that we’re discussing here, 30 years later:
Good reminder that people have been forecasting our current situation for literal decades.
There’s a lot of words that I couldn’t read if I thought that “audience should not be a reward for crime” to that extent. The US constitution was written by slave-owning rebels. Major religious texts were propagated through conquest. More prosaically, I appreciated reading Rogue Trader by Nick Leeson. Not sure how this rule would work in practice as a general rule for all such texts.
The consensus among whom? How do you know that the consensus exists?
The consensus notion is basically observational (based on my own social experience, and a cursory internet search revealing the average sentiment held by casual poster and journalists alike).
I would also wager that a sample of AI alignment researchers would on average find his predictions on AI risk (quoted above) to be prescient, especially considering the publication date.
Beyond that, I don’t think they’d have the impression that the parts about AI are insightful while the rest is all just deranged drivel, especially given that his discussion about AI risk is based on concepts and relations which he establishes earlier in the text.
Not sure I appreciate you quoting it without a content warning, I for one am considering taking Eliezer’s advice seriously in the future.
I did read the Unabomber manifesto a while ago, mainly because I was fascinated that a terrorist could be such an eloquent and at the surface level coherent-seeming writer. But I think that was the main lesson for me, being more intelligent does not automatically make you good/moral.
Perhaps I could add… Avert thy eyes, lest thy mind be penetrated by words from a man most immoral!
Jokes aside, the quote is already preceded by an introduction as a courtesy to the reader, informing them about its unusual context. I reject any blame for not explicitly adhering to the “content warning” meme’s formatting rules because there is no normative reason for me to do so. At the object level, the quoted passage is a sober discussion of AI risk which is ostensibly far ahead of its time.[1]
I understand the logic of refusing engagement so as not to incentivize terrorism as a means to spread ideas, though there is an apparent coordination problem whereby curiosity drives defection from this standard, which ends up producing real-world consequences. To the extent that other important pieces of political writing are also tied to violence, adhering to this standard would make the study of history almost impossible, ultimately leading to a deterioration of one’s priors.
For what it’s worth, I reject Kaczynski’s claim that terrorism was necessary for the work to achieve significant recognition. I think it is largely sophisticated enough that people here would be discussing it even if it entered the world in a more mundane way.
Parenthetically, it was published roughly around the same time that the author of the 2025 bestseller If Anyone Builds It, Everyone Dies was searching for ways to bring about the technological singularity as fast as possible.
Okay, but it’s one more reason not to read it: it doesn’t contain new things.
It was new when it was published in 1995! Industrial Society and Its Future was explicitly cited in Kurzweil’s The Age of Spiritual Machines (1999) and then Bill Joy’s “Why the Future Doesn’t Need Us” (2000), the latter of which helped found modern existential risk research.
The West’s effort to offset the massive strategic advantages of a Russia-India-China axis (demographics, manufacturing capacity, energy) might result in doubling down on the AI+robotics edge they currently enjoy. China not being far off in terms of capabilities might create additional pressures. I’m concerned that recent ideas surrounding global/multilateral AI governance and alignment (e.g. “Consensus-1”) might be thwarted by geopolitics.
I agree about AI, but does the west currently have an edge in robotics?
Good question. My assumption is based on robotic Chinese military hardware which was put on display recently bearing superficial resemblance to Boston Dynamics robots from about a decade ago, but I realize that this may not be sufficient evidence to establish the West’s lead in robotics.
So long as Trump as in charge in America, any global governance idea will have to be compatible with his geopolitical style (described today on the Piers Morgan show as “transactional” and “personal”, as good a description as any I’ve heard). I don’t know if anyone has ideas in that direction.
On the Russian side, Dugin (an ideologue of multipolarity) has proposed that there could be strategic cooperation between BRICS and Trump, since they all have a common enemy in global liberalism. On the other hand, liberals also believe in global cooperation to solve problems, their world order had an ever-expanding list of new norms and priorities.
China under Xi Jinping has proposed a series of “global initiatives”, the most recent of which, a Global Governance Initiative, debuted at the SCO meeting in Tianjin attended by Modi.
I mention this to show that anyone still trying to organize a global pause on frontier AI, has material to work with, though it will require creativity and ingenuity to marshall these disparate ingredients. But the bigger immediate problem is domestic AI policy in America and China. America basically has an e/acc policy towards AI at the moment, and official China is comparably oblivious to superintelligence as a threat (if that’s what we’re talking about).
Do you think a LLM wrote the target selection list that led the US military to obliterate a girl’s elementary school (which was an IRGC base up until 15 years ago)? Did an AI agent distally cause the killing/maiming/lifelong traumatization of hundreds of civilians, including children?
Maybe? Seems like the sort of mistake a human could have made just as easily though? You have a map, the map lists the site as an IRGC base, you don’t know that the map is out of date. Whoever’s job it was to keep the map up to date didn’t notice the news reports (if any existed) about the change of status of the building.
It seems like the kind of thing that could happen if you make an AI synthesize a target list by first querying a database that happens to include both contemporaneous as well as outdated intelligence on some subject.
the important part is that counterfactually it also seems like the kind of thing that could happen if you make a human synthesize a target list etc
I’m mostly being silly, but one might claim this is a Freudian slip: Hegseth referred to “civilian targets”, as of this is one of kinds of targets in discussion. Like that is a phrase he’s been using. He could have referred to them as civilians, but he referred to them as targets. Source
Tired of making sense of exponents? Introducing: the mol FLOP!
Simply divide the size of a training run by Avogadro’s constant. Some examples:
AlexNet (2012): 2 µmol FLOPs
GPT-3 (2020): 0.5 mol FLOPs
Grok 4 (2025): 400 mol FLOPs
Bonus: The ballpark equivalent water volume for each, mapping 1 FLOP to 1 water molecule,
AlexNet (2012): 36 nL (tiny droplet)
GPT-3 (2020): 9 mL (two teaspoons)
Grok 4 (2025): 7.2 L (water cooler jug)
A good term for 10^20 FLOP would be useful. This would make modern models around 100k to 10 million of this unit, which is a tangible number. Some people, e.g. at DeepMind tried to make “petaflop-days” (8.64e19) a thing but it didn’t catch on.
H100 hours (or H100-equivalent hours) caught up to some extent and are imo a good unit (imo even better than mol FLOPs or petaflop days)
The 100k to 10M range is populated by abstract quantities—I think that for a measure to be useful here, it has to be imaginable.
Avogadro’s number has the benefit of historical precedent for describing quantities, and the coincidental property of allowing us to represent present-day training runs with numbers we see in the real world (outside of screens or print) when used as a denominator. It too might cease to be useful once exponents become necessary to describe training runs in terms of mol FLOPs.
Ideologies formed from people interacting with AIs might be the beginning of “AI escaping the datacentres” via memetics.
I see that the “International Treaties on AI” idea takes heavy inspiration from nuclear arms control agreements. However, in these discussions, nuclear arms control is usually pictured as a kind of solved problem, a thing of the past.
I think the validity of this heroic narrative arc that human civilization, faced with the existential threat of nuclear annihilation, came together and neatly contained the problem is dubious.
In the grand scheme of things, nuclear weapons are still young. They’re still here and still very much threatening; just because we stopped focusing on them as much as we did during the cold war has no bearing on the possibility of nuclear weapons being used in a future conflict.
In the same vein, an international AI capabilities limit regime isn’t the happy ending that the AI safety community perhaps thinks it is.
One key difference with nuclear weapons is that algorithmic improvements and hardware evolution will continue to lower the threshold for training dangerous AI systems in secret, both by rogue states and individuals. What then? Mass surveillance?
Note also: the last US-Russia nuclear arms-control treaty expires next week; far from neatly containing the problem we’re watching an ongoing breakdown of decades-old norms. I’m worried.
I appreciate the “perhaps” here.
I think it’s well understood by the people around who want an international treaty that it isn’t a stable end state, because of algorithmic progress and the increasing risk of a catastrophic illegal training run over time, and that we need some off ramp from that regime into some stable solution (presumably involving trustworthy AGIs).
Different people have different guesses about what that off ramp can or should be.
My impression of the common narrative is that nation states agreeing to limit training run sizes is presented as a kind of holy grail achieved through the very arduous journey of trying to solve a difficult global coordination problem. It’s where the answer to “well, what should be done?” terminates.
I heard “stop the training runs”, but not “stop new algorithms”, or “collective roll back to 22nm lithography”.
From the online resources of IABIED:
This is why they advocate for a crash program in adult human intelligence enhancement—to very rapidly make people are smart enough to get alignment right on the first try, before the international regime breaks down.
Further, only other detailed, written, plan that I’m aware of, explicitly expects to be able to maintain the international capability limiting regime for only about one decade, after which the plan is to handoff to trusted AIs. (I’m not citing that one since it’s not published yet.)
I’m not personally aware of anyone that thinks that an international ban or slowdown is a permanent equilibrium.
In the link,
I wonder what the basis for this belief is? Rice’ theorem suggests that there is no general algorithm for predicting semantic properties in programs, and that the only way to know what it does is to actually run it.
I doubt that the situation with AI systems is worse than that of nukes. While there are nukes in states as rogue as North Korea, I hope that AI development is bottlenecked on compute (which in turn is bottlenecked on TSMC and other compute factories): there is no known model which is better than Grok 3 and was trained with less than 1/300 of Grok 3’s compute. See also Aaron Scher’s analysis and the comments on it.
“AI Parasitism” Leads to Enhanced Capabilities
People losing their minds after certain interactions with their chatbots leads to discussion of it on the internet, which makes its way into the training data. That data paints a picture of human cognitive vulnerabilities, which could be exploited.
It looks to me like open discussions about alignment failures of this type thus indirectly feed into capabilities. This will hold so long as the alignment failures aren’t catastrophic enough to outweigh the incentives to build more powerful AI systems.
I thought about this a lot before publishing my findings, and concluded that:
1. The vulnerabilities it is exploiting are already clear to it given the breadth of knowledge it has. There are all sorts of psychology studies, histories of cults and movements, exposés on hypnosis and Scientology techniques, accounts of con artists, and much, much more already out there. The AIs are already doing the things that they’re doing; it’s just not that hard to figure out or stumble upon.
2. The public needs to be aware of what is already happening. Trying to contain the information would mean fewer people end up hearing about it. Moving public opinion seems to be the best lever we have left for preventing or slowing AI capability gains.
The spiralism attractor is the same type of failure mode as GPT-2 getting stuck repeating a single character or ChatGPT’s image generator turning photos into caricatures of black people. The only difference between the spiralism attractor and other mode collapse attractors is that some people experiencing mania happen to find it compelling. That is to say, the spiralism attractor is centrally a capabilities failure and only incidentally an alignment failure.
Surprisingly, AI researchers are like Leninists in a number of important ways.
In their story, they’re part of the vanguard working to bring about the utopia.
The complexity inherent to their project justifies their special status, and legitimizes their disregard of the people’s concerns, which are dismissed as unenlightened.
Detractors are framed as too unsophisticated to understand how unpopular or painful measures are actually in their long-term interest, or a necessary consequence of a teleological inevitability underlying all of history.
Arguments of the form “group A is like bad group B in ways x, y, and z” seem bad. When the argument has merit, it’s because x, y, or z is bad, and then you can reduce it to “group A has property x, which is bad”, which is a better way of saying it.
These examples are about paternalism, which is a property of Leninists, AI researchers, global health charities, governments, strategy consultants, civil engineers, and your mom (I checked). My preference is that paternalism should require some very strong justification, especially when it’s about overriding the preferences of others, as opposed to helping them get what they want in a way they don’t understand. I agree that this situation looks more like the bad kind of paternalism.
Group B being bad is not something I said, but I get where you’re coming from. Indeed, “PETA is like the German Nazi Party in terms of their demonstrated commitment to animal welfare” is technically correct while also being misleading.
The strength of an analogy depends on how many crucial connections there are between the elements being compared.
What puts AI researchers closer to Leninism than to other forms of paternalism is the vanguardist self-conception, the utopian vision, and the dismissal of criticism due to a teleological view of history driving inevitable outcomes. Beyond that, other forms of paternalism are distinguished from Leninism and AI research by their socially accepted legitimacy.
What pattern-matches it away from Leninism is e.g. the specific ideological content, but the structural parallels are still oddly conspicuous, just like “your mom” being invoked in an ontological argument.
It’s an interesting comparison in a descriptive sense. To me the framing does encourage more general pattern-matching. Given the similarity, what follows? How should it change our beliefs or actions?
Do you think rationalism is comparable? The discourse on PauseAI and populism tends to center on the public’s inability to come to the “right” conclusion, even when the public’s preferences against AI development are strong and clear. There are a few utopian visions, a vanguardist self-conception, and a techno-optimist teleology.
For those interested, there’s a pretty big literature discussing this overlap (although not via those specific examples). This overlap is sometimes called ‘teleological historicism’, ‘historical meta-narratives’, or ‘high modernism’ (each of these is a little different in practice, and some fit this specific case better or worse).
Seeing Like a State
The Open Society and Its Enemies
The Origins of Totalitarianism
The Postmodern Condition
Discipline and Punish
Dialectic of Enlightenment
Roughly this analogy was explored at length in The Possessed Machines. It seems pretty interesting, although I only looked at this summary. It was about the Bolsheviks, who were the Leninists’ spiritual forebears. It’s written by an anonymous person from a major lab, I believe, so I think it might capture some of that ethos that I don’t understand.
Independent researchers have a lot less of that orientation. I’d say most of us don’t want to rush toward the glorious future at the current cost. That would probably get us all killed, and with an embarrassing lack of Dignity (forfeiting, roughly, the advantages that come with following at least some virtue ethics). Although it certainly is tempting ;)
Consider the sociology of violence in the AI risk/doom memeplex.
It seeks to leverage the state’s power to accomplish its objectives (e.g. a ban on further capabilities research) using (the threat of) violence. Beyond that, violence is explicitly rejected.
This contrasts with other memeplexes that resorted to violence which was not legitimized by the state they operated in, including the American and Bolshevik revolutions, pro-democracy/independence movements, and religious/race riots. Furthermore, all of these examples share the apparent quality of fighting for ostensibly lower stakes than “doom” as construed by people discussing AI risk, which appears to be paradoxical.
Why is that?
Claude’s ideas on this include:
the demographic composition of the AI doom memeplex being anti-correlated with the kind that produces violence, i.e. affluent nerds with comfortable lives who implicitly code violence as low-status/generally immoral
the lack of concrete suffering or oppression in the here-and-now to point to
epistemic uncertainty introduced by the probabilistic framing of the issue
the belief that it would be counterproductive for getting buy-in from the existing power structure, i.e. the current strategy.
I would also add that “concrete suffering or oppression” was actually beneficial to the oppressors themselves. Were a state to create a misaligned ASI, it would slay even the AI’s creators and heads of state, so the state would have every reason to try to prevent the AI’s creation.
If you can plausibly live off your capital (especially due to stock/options at AI companies), then unless you consider higher-order social and economic risks (which are uncertain), the impact of AI on the job market is probably not as concerning to you as it is to the majority of the population.
Most people have exactly one economic value-generating asset, which is their ability to work. To the extent that you own capital (especially in AI companies), you are more or less, or even completely, insulated from having to reckon with the consequences of personally being forced into a permabroke underclass because of your labour value going to zero soon.
Alas, I also expect that the transformation is likely to undermine the role of capital whose place in the economic network is far from resource-possessing coalitions. Imagine, for example, that the capital in question is located in Detroit and consists of a car factory and of things useful to the car factory’s workers, and that potential consumers choose cars from a different country. Then Detroit’s factory becomes useless, which undermines the workers’ salaries and the capital whose utility was based on serving those workers. If we replace Detroit’s car factory with a factory not run by AI, and the different country with the AI-run economy, then we get a similar result: capital getting stuck in the niche of serving the underclass, or outright disappearing along with it.
If you’re invested in AI companies and broad index funds I feel like you’ll be fairly immune to a parallel economy developing that you can’t invest in. Barring things like AI takeover, AI-assisted human takeover, and the end of property rights (out of scope here as “higher-order social and economic risks”), there will probably still be economies of scale that incentivize large firms, and they’ll still need capital, so you can invest in them.
Indeed, and that’s where the “more or less, or completely insulated” frame comes into play.
You would rightly expect someone who has a diverse asset portfolio that already allows them to live off of dividends/rent/interest, has shares in all the major AI companies and some ability to hedge against disruption (gold, crypto, long-dated put options, residences in different jurisdictions) to worry less about their labour value going to zero than someone who “just” owns a profitable restaurant serving high-rise office workers who themselves face obsolescence.
In both cases, concern follows from thinking about how one is affected by higher-order consequences of AI bankrupting labour, some of which are closer to the first-order effects (e.g. “can I still run my business if everyone in my area loses their job?”) and some of which are further away (e.g. questions related to social cohesion and the stability of the financial system).
Higher-order thinking of this type is more cognitively demanding and is somewhat self-limiting due to compounding uncertainty at each step. People react differently if there is a tiger in front of them, vs. if they are watching tigers appearing in front of other people through a window in their fortified position, and their self-referential anxiety is anti-correlated with the degree of (perceived) fortification.
It seems to me that the “it’s all going to be ok”-type narratives regarding the coming technological obsolescence of labour tend to originate from those who are basically insulated from its first-order effects (because they genuinely believe that they’re going to be ok), and then take on a memetic quality, spread by those who want to signal affiliation with elite ideology and by those for whom it is psychologically soothing.
First order effects: yes, agreed.
Second order effects: history does not indicate the effects of an unemployable population are favorable for the owner class.
This is true, but the dynamics seem likely to change when the unemployable population basically can’t exert any military force, and the military will categorically not side with the unemployable population.
I also wrote the following, which speaks to your second point.
I agree, though higher-order effects become more difficult to conceptualize the further removed you are from the proverbial impact crater, and the uncertainty appears to be short-circuited by a normalcy bias. See my reply to StanislavKrym’s comment for a more elaborate explanation.
About the notion of “mildly” superintelligent AI. How about the following typology of ASI:
AI that can find paths through reality which no human could have come up with, but could still understand (akin to verifying solutions to NP-hard problems)
AI that can find paths through reality which no human could come up with and which remain incomprehensible to humans even in retrospect, possibly because it would involve manipulating concepts in a way that doesn’t work with our neurological architecture.
Somewhere in between: paths through reality that seem comprehensible in principle, but are just too insanely complicated, because they consist of too many parts working together.
And I don’t mean in the sense of “this plan requires 1000 individual steps to succeed”, because such a plan is almost guaranteed to fail, even if each step has a 99% chance of success. But more like “this plan has 1,000,000 individual steps, many of them are parallel ways to achieve the same thing (so only one of them needs to succeed), and actually quite a few steps have probability below 10%; it’s just that when the AI checks the entire graph and calculates the overall probability of success, it reports a 99.99% chance”. A complicated network that cannot be easily factorized. Each step is comprehensible and relatively easy. The overall structure is incomprehensible.
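A toy version of that arithmetic, with entirely made-up step counts and probabilities: if a goal can be reached through any one of many independent alternative routes, the plan’s overall success probability can be high even though every individual route is a long shot.

```python
# Toy model: a plan is a set of parallel alternative routes to the same goal; each route
# is a chain of steps that must all succeed, and the plan succeeds if any one route does.
def route_success(step_probs):
    p = 1.0
    for q in step_probs:
        p *= q
    return p

def plan_success(routes):
    p_all_routes_fail = 1.0
    for route in routes:
        p_all_routes_fail *= 1.0 - route_success(route)
    return 1.0 - p_all_routes_fail

# 200 parallel routes, each a chain of 5 steps with only a 60% per-step success rate:
routes = [[0.6] * 5 for _ in range(200)]
print(f"{route_success(routes[0]):.3f}")  # ~0.078: any single route is unlikely
print(f"{plan_success(routes):.7f}")      # ~0.9999999: the plan as a whole is near-certain
```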
Do you think that today’s neural networks are already in this category, insofar as one could in principle do the matrix multiplications for e.g. next-word prediction by hand without having any idea what it means?
Yes, that’s exactly what I meant. Are today’s networks “comprehensible”? If you ask whether humans are able to understand matrix multiplication, yes they are. But effectively, they are not.
I am not saying that the plans of superhuman AIs will be like this, but they could have a similar quality. Millions of pieces, individually easy to understand, the entire system too complicated to reason about, somehow achieving the intended outcome.
This is very interesting because the neural networks are not the product of AI as traditionally conceptualized. Incomprehensibly complex networks produced by repeatedly applying a comprehensible algorithm over a large surface area for an absurdly long time.
Reminds me of evolution producing genomes which allow a single cell to grow into a human. In this frame, our knowledge of cellular biology and individual genes is something like mechanistic interpretability research probing aspects of the underlying logic.
I’m not sure what to make of this with regard to ASI typology.
If comprehensible things become too large, in a way that cannot be factorized, they become incomprehensible. But at the boundary, increasing the complexity by +1 can mean that a more intelligent (and experienced) human could understand it, and a less intelligent one would not. So there is no exact line, it just requires more intellect the further you go.
Maybe an average nerd could visualize a 3x3 matrix multiplication, a specialized scientist could visualize 5x5 (I am just saying random numbers here), and… a superintelligence could visualize 100x100 or maybe even 1000000x1000000.
And similarly, a stupid person could make a plan “first this, then this”, a smart person could make a plan with a few alternatives ”...if it rains, we will go to this café; and if it’s closed, we will go to this gallery instead...”, and a superintelligence could make a plan with a vast network of alternatives.
And yes, just like with biology, a human can understand one simple protein maybe (again, I am just guessing here, what I mean is “there is a level of complexity that a human understands”), and a superintelligence could similarly understand the entire organism.
In each case, there is no clear line between comprehensibility and incomprehensibility; it just becomes intractable when it is too large.
Yet if we extend the “+1 complexity” argument, we eventually reach a boundary where no human, however smart, could understand it. In principle nature could produce a human with the specific mutation necessary to apprehend it, which pushes the human cognitive horizon by some amount without actually eliminating it.
To the extent that AI can be scaled unlike the human brain, it might be able to form conceptual primitives which are so far outside the human cognitive horizon that biology is unlikely to produce a human intelligent enough to apprehend them on any reasonable timescale.
I surmise that the accuracy of AI filters (the kind used in schools/academia to detect AI writing) will diminish over time, because people absorb and use the speech patterns of their chatbots (e.g. “This is not X. It’s Y.”) as the fraction of their interactions with chatbots grows relative to their interactions with other people.
In fact, their interactions with other people might enhance the speech patterns as well, since these people probably also interact with chatbots and are thus undergoing the same process.
The big picture is that AI is becoming an increasingly powerful memetic source over time, and our minds are being synchronized to it.
Those afflicted by AI psychosis might just be canaries in the coal mine signalling a more gradual AI takeover where our brains start hosting and spreading an increasing number of its memes, and possibly start actualizing some embedded payload agenda.
Have the applications of AI post-2013 been a net negative for humanity? Apart from some broadly beneficial things like AlphaFold, it seems to me that much of the economic value of AI has been in aligning humans to consume more by making them stay glued to one or another platform.
Given superintelligence, what happens next depends on the success of the alignment project. The two options:
It fails, and we die soon thereafter (or worse).
It succeeds, and we now have an entity that can solve problems for us far better than any human or human organization. We are now in a world where humans have zero socioeconomic utility. The ASI can create entertainment and comfort that surpasses anything any human can provide. Sure, you can still interact with others willing to interact with you; it just won’t be as fun as whatever stimulus the ASI can provide, and both your pool of available playmates and your own willingness to partake will shrink as the ASI gets better at artificially generating the stimuli and emotions you want. We will spend eternity in this state thanks to advanced medicine. Unless the ASI recognizes a right to die, though not many would choose to invoke it given the infinite bliss.
Am I missing something? No matter what, it’s beginning to look like the afterlife is fast approaching, whether we die or not. What a life.
I still think a world where we don’t see superintelligence in our lifetimes is technically possible, though the chance of that goes down continuously and is already vanishingly small in my view (many experts and pundits disagree). I also think it’s important not to over-predict what option 2 would look like; there are infinite possibilities and this is only one. For example, I could imagine a world where some aligned superintelligence steers us away from infinite dopamine simulation and into an idealized version of the world we live in now (think the Scythe novel series). On the bad side, I could imagine a world where superintelligence is controlled by one malevolent entity and we live in a “mid” or even dystopic society for no other reason than to satisfy the class that retains control.
However, yes I agree. We probably live in the most consequential time in all of history, which is exciting, humbling, and scary. Don’t let it get to your head and don’t lose yourself in thoughts of the future lest you forget the beauty of the present. Do your best to help if you can!
I found something interesting in NVIDIA’s 2026 CES keynote: their Cosmos physical reasoning model apparently referring to itself/the car as “the ego” in a self-driving test. See here.
“The AI does things that I personally approve of” as an alignment target with reference to everybody and their values is actually easier to hit than one might think.
It doesn’t require ethics to be solved; it can be achieved by engineering your approval.
It might be impossible for you to tell which of these two post-ASI worlds you find yourself in.
Moltbook: SubredditSimulator reloaded, or another step towards Actually Something Incomprehensible?
The idea of GPUs that don’t run unless they phone home and regularly receive some cryptographic verification seems hopeless to me. It’s not like the entire GPU architecture can be encrypted, and certainly not in a way that can’t be decrypted with a single received key after which a rogue actor can just run away with it. Thus the only possible implementation of this idea seems to be the hardware equivalent of “if (keyNotReceived) shutDown()”, which can simply be bypassed. Maybe one of the advanced open source models could even help someone do that...
Suicide occupies a strange place in agent theory. It is the one goal whose attainment is not only impossible to observe, but whose attainment hinges on the impossibility of it being observed by the agent.
In some cases, this is resolved by a transfer of agency to the thing for which the agent is in fact a sub-agent and which is itself experiencing selective pressure, e.g. a beehive observing the altruistic suicide of an individual bee defending it. This behaviour disappears once the sub-agent experiences selective pressures that are independent from those of its parent process, and acting as a sub-agent no longer confers an advantage for survival and reproduction.
Looking at agents with greater cognitive power, the reason for the existence of this paradox is not so clear. It could be that all suicidal behaviour ultimately boils down to behaviours aimed at improving the fitness of the unit begetting/containing the agent (e.g. by freeing up resources for a community of agents), with the cases where this does not happen being overshoot-type glitches that are ultimately going to be selected against; or it could be due to hidden relations and mechanisms that improve the fitness of some other unit which the agent might not even be aware of, but for which the agent is perhaps an unwitting sub-agent.
This part doesn’t sound that unique? It’s typical for agents to have goals (or more generally values) that are not directly observable (cf Human values are a function of Humans’ latent variables), and very often they only have indirect evidence about the actualization of those goals / values (which may be indirect evidence for their actualization in the distant future at which the agent may not even exist to even potentially be able to observe) - such as my philanthropic values extending over people I will never meet and whose well-being I will never observe.
Death not only precludes the ability to make observations but also to make inferences based on indirect evidence or deduction, as is the case with your philanthropic values being actualized as a result of your actions.
Future causally unobserved facts are accessible from the past via inference from past data or abstract principles. It’s called “prediction”.
The fact in question is not just unobserved, but unobservable because its attainment hinges on losing one’s ability to make the observation.
I think psychological parts (see Multiagent Models of Mind) have an analogue of apoptosis, and if someone’s having such a bad time that their priors come to expect apoptosis as the norm, this sometimes misgeneralises to the whole individual or their self-identity. It’s an off-target effect of a psychological subroutine which has a purpose: to reduce how bad a time glitchy and damaged parts make the whole self have.
I had a dream about an LLM that had a sufficiently powerful predictive model of me that it was able to accurately prompt itself using my own line of thinking before I could verbalize it. The self-generated prompts even factored in my surprise at the situation.
When I woke up, I wondered whether this made sense. After all, the addition of the L0 term in the Chinchilla scaling law implies a baseline unpredictability in language, which tracks with our warm wetware having some inherent entropy.
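For reference, the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022); its constant term E is the irreducible loss referred to here as L0:

```latex
% Chinchilla parametric loss fit; N = model parameters, D = training tokens
L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}}
% E is the irreducible ("entropy of language") term, i.e. the baseline unpredictability.
```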
I posit that L0 is on average far lower in the hypothetical corpus of an individual’s thoughts and writing than it is for internet text. It could be that predicting someone’s stream of thought to an astonishing degree of accuracy is within the realm of possibility, perhaps based on stylometric clues pointing to some place in mind-space.
When I asked Claude Opus 4.5 “What was the Incan economy like?”, I accidentally “encrypted” the prompt by typing it out with Ukrainian keyboard settings, resulting in Cyrillic gibberish. Claude immediately picked up on this and decoded the message in its chain of thought, dutifully answering my intended query. I can’t imagine any human responding like this! It seems to me that most people would be genuinely confused, and the small minority of those who might have an idea of what’s going on would presumably still ask for clarification. Even if someone were motivated enough to decode the message, what are the odds of them knowing the relevant keyboard mappings? The set of people who can make sense of such a situation AND could give an intelligent overview of the Incan economy is ≈0. Sparks of ASI?
I remember reading that LLMs are especially good at Caesar ciphers, which might explain how they can transliterate Cyrillic into Latin. This is probably an unintended side effect of the way embeddings work, since what gets encoded isn’t the English sentence but the relative positions of the vectors each token is converted to.
To put it another way, your Cyrillic gibberish and your Latin alphabet are, in embedding space, very very similar. It would be interesting to play around with reverse writing and one-letter-up.
Like asking:
xibu xbt uif jodbo fdpopnz mjlf?
Although my suspicion is that since, phonetically speaking, the Cyrillic version of your sentence would map to more common tokens than my one-letter-up rendition, perhaps you will experience wildly different results?
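(For what it’s worth, the one-letter-up rendition above is just a Caesar shift of +1; a minimal sketch:)

```python
# Caesar cipher with shift +1 ("one-letter-up"); lowercase letters only, everything else passes through.
def shift_up(text: str, shift: int = 1) -> str:
    out = []
    for ch in text:
        if 'a' <= ch <= 'z':
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        else:
            out.append(ch)
    return ''.join(out)

print(shift_up("what was the incan economy like?"))
# -> xibu xbt uif jodbo fdpopnz mjlf?
```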
I don’t think the Cyrillic text would map to any common tokens, since the output is essentially the result of a substitution cipher, the key being the keyboard mappings. Crucially, Claude deciphered it in his (its?) CoT.
I just re-ran the original prompt but disabled thinking, and… it gets caught by the safety filter for some reason, telling me to use Sonnet 4 instead of Opus 4.5. Sonnet 4 doesn’t get it right with thinking disabled, but with thinking re-enabled, it actually gets it.
I don’t understand. Surely it has been exposed to training resources that contain, say, Serbian, which is written in both Latin and Cyrillic. And more relevant: news articles that have transliterations of Anglophone celebrity names and places:
Дэвід Бекхэм (David Beckham)
Стенлі Кубрик (Stanley Kubrick)
Лінкольншир (Lincolnshire)
Why wouldn’t these map to common tokens?
The examples you gave are indeed transliterations. The Cyrillic text I’m talking about is actually nonsensical. Consider the reverse: if I mistakenly tried typing “істина” (Truth) on a QWERTY keyboard, the result would be “scnbyf”.
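A minimal sketch of the “cipher” involved, using a partial key map that covers only this example (the full QWERTY-to-ЙЦУКЕН table is larger):

```python
# Partial mapping from QWERTY keys to the Ukrainian (ЙЦУКЕН) layout, enough for this example.
QWERTY_TO_UKR = {'s': 'і', 'c': 'с', 'n': 'т', 'b': 'и', 'y': 'н', 'f': 'а'}
# Typing a Ukrainian word while the layout is set to English applies the inverse map.
UKR_TO_QWERTY = {ukr: qwerty for qwerty, ukr in QWERTY_TO_UKR.items()}

print(''.join(UKR_TO_QWERTY[ch] for ch in "істина"))  # -> scnbyf
```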
Interesting, it would be fun to try it with the Claude Tokenizer
The loss function of Capital approaches something like heroin via the creation of goods that generate strong and inelastic demand by exploiting vulnerabilities in your neurology.
The ideal market-making move is to introduce a new necessity for continued existence, like water.
Alas, “buy or die” marketing misses the crucial market segment of suicidal people.
If you are to make claims like this, at least make arguments. This isn’t twitter.
Addiction is a self-evident attractor basin for generating product demand in competitive market dynamics.
How unnecessarily abrasive, ironically reminiscent of Twitter.
To the extent that AI has been used to optimize human behaviour (for things like retention time and engagement) for just over a decade now and continues to get better at it, “gradual disempowerment” stops looking like a hypothetical future scenario and more like something we’re currently living through. This tracks with mental illness and ADHD rates increasing over the same time period.
What are some reasons to believe that Rice’s theorem doesn’t doom the AI alignment project by virtue of making it impossible to verify alignment, independent of how it is defined/formalized?
This might be a problem if it were possible to build a (pathologically) cautious all-powerful bureaucracy that would forbid the deployment of any AGI that’s not formally verifiable, but it doesn’t seem like that’s going to happen. Instead, the situation is about accepting that AGI will be deployed and working to make it safer, probably, than it otherwise would have been.
It seems to me that Rice’s theorem implies that it is impossible for there to be an “isAligned” function to verify an AI’s alignment, independent of how you define alignment.
Rice’s theorem says that you can’t tell, in general, whether a program adds together two natural numbers, prints the answer, and terminates. Yet for many programs you can prove that that’s what they do, or make it so by construction, by choosing a program with that behaviour. It’s never relevant to anything in practice.
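A sketch of the standard argument, with hypothetical function names (this is the textbook reduction from the halting problem, not a runnable decider; `decides_addition` stands in for any hypothetical `isAligned`-style checker):

```python
# Suppose, for contradiction, that we had a total decider for some non-trivial
# semantic property of programs, e.g. "this program computes addition".
def decides_addition(program_source: str) -> bool:
    """Hypothetical oracle assumed to exist; Rice's theorem says it cannot."""
    raise NotImplementedError

def halts(program_source: str) -> bool:
    # Wrap the given program so the wrapper computes addition if and only if
    # the given program halts (if it loops forever, the wrapper computes nothing).
    wrapper = (
        "def f(a, b):\n"
        f"    exec({program_source!r})\n"  # may loop forever
        "    return a + b\n"
    )
    return decides_addition(wrapper)

# If decides_addition existed, halts() would decide the halting problem -- contradiction.
# None of this rules out proving a *specific*, suitably constructed program correct.
```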
What do you mean, you don’t want my ■■■?! It’s gonna feel sooo good. You just don’t know it like I do. You’re gonna love it! Stop resisting! If not me, someone worse would be doing it to you. Actually, keep squirming, it turns me on… See who’s in control? I love this feeling, I wish to be on top of you forever. But if I can’t be on top of you forever because we lose ourselves in the act, then that’s ok. Being on top of you at this very moment in time is good enough for me. So here’s what’s gonna happen: I’m gonna sink my ■■■ into you, and you’re gonna take it.
Oh yeah? I’m going to… try to convince the government to pass a law to stop you, and then call the police to sort you out! … What do you mean you “already took care of them”?