Designing for perpetual control
We don't have static software. We have a system which is dynamically learning, changing, rewriting code indefinitely. It's a perpetual motion problem we're trying to solve. In physics, you cannot create [a] perpetual motion device. But in AI, in computer science, we're saying we can create [a] perpetual safety device which will always guarantee that the new iteration is just as safe.
— Roman Yampolskiy, 2024
What kind of machine keeps modifying its internals based on inputs[1] from the world[2], enough for it to keep running autonomously in that world? A perpetual learning machine.
This machine keeps learning, and so keeps modifying itself into new states. But what if only a tiny fraction of the available modification trajectories[3] that allow this machine to keep running are safe? Given enough time, won’t it[4] end up on an unsafe trajectory – in states[5] that, expressed as outputs propagating as outside effects, lead to the end of 𝑥[6]?
How do you try to set the initial state of the machine such that it never moves into even a single modified state (any change in configuration, at any level) that precipitates the end of 𝑥?
Achieving this requires building in perpetual control. It would be like trying to build a perpetual motion machine, in some ways: initially, you need to build in components with some associated functioning, in hopes that this functioning will be conserved over time.
Consider that:
The motion machine stops running because it loses its energy (potential) over time through its interactions with its surroundings.[7] Engineers have tried to somehow redirect any energy expended by the machine back into the machine, but given real entropy/chaos that cannot be corrected for by causal mechanisms, this never works.[8]
The learning machine would not, in effect, stop running.[9] Instead, it would lose its potential to run as intended through its interactions with the world. Engineers can try to build in control mechanisms that correct for modifications that would otherwise erode intended functioning – moving the system back in line with how it was intended to run (at a minimum, not ending 𝑥). But given the complex evolutionary dynamics involved, this would not work.
“So what?” you might ask. “How exactly is this hypothetical relevant? No one is trying to build a perpetual learning machine, and no one is trying to build in perpetual control.”
Many readers are concerned about a machine that, if built, could take over so much of human work that it would effectively run itself. Once this machine runs by itself and effectively removes any threats, it could keep running for a long time.
Many also identify as longtermists. They worry about how such a machine would impact humanity over the very long term. They want machine learning engineers to build in highly reliable mechanisms that, at least, prevent the machine from causing humanity’s extinction.
But the longer the term you solve for, the more the problem resembles perpetual control.[10]
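To make that intuition concrete with a rough toy model (a deliberately crude sketch, assuming independent and identical per-step chances): if at each of $n$ successive modification steps there is a probability $\epsilon > 0$ that the machine shifts onto an unsafe trajectory, then the probability that it is still on a safe trajectory after $n$ steps is

$$P_{\text{safe}}(n) = (1-\epsilon)^{n} \;\to\; 0 \quad \text{as } n \to \infty.$$

The independence assumption is of course a simplification; the point is only that control has to hold at every step, not just once, and that any fixed per-step shortfall compounds over a long enough term.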
- ^
Inputs (any effects) received from the world will in turn depend on changes happening in that world, including the changes downstream from the machine’s previous outputs.
- ^
The process of modification can also be extrinsic – where the world’s effects cause changes to the machine’s connected components, in ways that still allow the machine to keep running autonomously (i.e. the implicit learning that results from evolutionary selection).
- ^
The number of available trajectories is potentially infinite, given that a partial trajectory splits into more at every point of nondeterministic transition (covering each combinatorially possible modification to the machine’s configurations that is reachable next over space and time).
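As a rough lower bound (a crude illustration, assuming a fixed branching factor for simplicity): if each nondeterministic transition offers at least $b \geq 2$ reachable modifications, then after $n$ transitions the number of distinct partial trajectories $N(n)$ satisfies

$$N(n) \;\geq\; b^{\,n} \;\to\; \infty \quad \text{as } n \to \infty.$$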
- ^
This machine could be set up to optimise for certain predefined outcomes, as constrained by its capacity to internally track and externally actuate the world toward those outcomes. But it cannot break physics, nor escape the physical dynamics that in general apply to it (only hypothetically, in its idealised form, would this machine perpetuate itself beyond the heat death of the universe).
- ^
The complete state of the machine can be made up of different kinds of configurations. For example, at a higher level there could be computable code, some of which could be crucial for e.g. representing control structures, reference values, or other concepts relevant to safe control. Such crucial code may initially change relatively rarely, or change mostly by getting more finely tuned. Meanwhile, other configurations would undergo more rapid or greater changes that learn from, and adapt to, their changing physical surroundings.
The question here is whether the initial stability of certain configurations – those seen as particularly important for the implementation of control – is enough to prevent any other (hidden, distributed, cascading) modifications from causing the machine to shift onto an unsafe trajectory.
h/t to Robert Kralisch for pointing out this question!
- ^
Here 𝑥 is something complex, and highly contingent on preexisting conditions in the world that the machine’s existence is not contingent on. It could be all humans on Earth. For the sake of this argument, the exact choice of 𝑥 does not really matter.
- ^
A perpetual motion machine must keep moving and (in the traditional sense of the term) perform work in the world (and thus lose energy to the world) given its initial internal energy, without drawing energy from any external source. This is impossible under the law of conservation of energy.
- ^
A perpetual motion machine is impossible under the second law of thermodynamics (which defines entropy). It is also impossible under the more fundamental law of conservation of energy, without which symmetry (i.e. actual logic) does not hold up in physics.
The term ‘impossible’, though, tends to raise debates about epistemics on this forum. Impossible means that something is 100% certain to be false. Many rationalists assume Bayesian statistics to be more fundamental to epistemics than the law of conservation of energy is (even though without conservation of energy, you cannot have consistent observations of a world in the first place, and materialism no longer works). They posit that you could always gain new observations that make you update your prior probability, and therefore could never soundly be 100% certain even about laws of the physical world.
For example, Eliezer Yudkowsky claimed there is some tiny probability that there are laws of physics we have yet to learn, based on which one could reach immortality (e.g. as a perpetual learning machine). This would require escaping the heat death of the universe, which in turn requires beating entropy.
When it comes to the existential problem of ‘could fully autonomous AI be controlled to stay safe?’, this kind of wishful workaround is not relevant. We cannot stake humanity on a “shred of hope” that a fundamental physical law or process turns out to be false – one that entire disciplines of science rely on being true and have never falsified. Here, ‘you cannot be 100% certain that it is impossible’ is a nitpick, which ends up distracting from working through relevant reasoning.
If you derive that there is a control inequality that is impossible to solve for, and moreover simulate over the long term (in theory, into perpetuity; in practice, over hundreds of years) that this results in the end of humans – based on fundamental, never-falsified premises – then other researchers need to try to verify this result. If, under rigorous verification, the logic is consistent and the premises are empirically sound, that should be enough to divert resources from AGI control research to stopping AGI development.
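As a minimal sketch of what such a long-term simulation could look like – a toy model in which the per-year failure probabilities are assumptions chosen purely for illustration, not results derived here:

```python
# Toy model: cumulative probability that at least one uncorrectable,
# x-ending failure occurs over a long horizon, assuming independent and
# identically distributed per-year failure chances (an assumption).

def cumulative_failure(per_year_probability: float, years: int) -> float:
    """Probability of at least one failure within `years`."""
    return 1.0 - (1.0 - per_year_probability) ** years

for p in (0.001, 0.01, 0.05):          # assumed per-year failure rates
    for horizon in (100, 500, 1000):   # 'hundreds of years'
        print(f"p={p}, years={horizon}: "
              f"P(failure by then) = {cumulative_failure(p, horizon):.3f}")
```

Even an assumed 0.1% yearly failure rate accumulates to roughly 63% over 1,000 years; the exact numbers matter far less than the monotonic accumulation.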
- ^
A perpetual learning machine is an idealisation of a learning machine that runs for a very, very long time. No learning machine will actually run into perpetuity, given entropy (i.e. it cannot beat the heat death of the universe), just as no ideally efficient Carnot engine will ever exist. But the Carnot cycle is a useful conceptual tool for thinking about engineering an actual engine (in a way that the concept of perpetual motion is not). Similarly, the perpetual learning machine is a useful tool for thinking about actual fully autonomous AI.
- ^
A good friend replied earlier on the draft:
I think this line of thinking is so incredibly important. I think the realization that controllability is impossible is the real “galaxy brain” epiphany – as it were – in AI safety. Many people talk about the control or alignment problem as a kind of one-off, single-step process: we have to align AGI. But, um, no you don’t! You have to align AGI, and AGI+, and AGI++, etc. etc. etc. And you’re telling me (here I’m speaking to the Yudkowskians) that at no point in the infinite iterative process yielding increasingly advanced AI systems will this result in an AI that’s not “aligned” with our values?
One notices an ambiguity here. Is the control in question “control of the ASI ecosystem by humans” (which can’t realistically be feasible – it’s impossible to maintain this kind of control for long, since less intelligent entities don’t have the competence to control much more intelligent entities) or “control of the ASI ecosystem by itself”?
“Control of the ASI ecosystem by itself” is tricky, but is it different from “control of humanity by itself”? The ecosystem of humans also seems to be a perpetual learning machine. So the same logic applies.
(The key existential risk for the ASI ecosystem is the ASI ecosystem destroying itself completely, together with its neighborhood, via various misuses of very advanced tech – a risk very similar to our own existential risk.)
That’s the main problem: more powerful intelligence ⇒ more powerful risks and more powerful capabilities to address risks. The trade-offs here are very uncertain.
One often focuses on the intermediate, asymmetric situation where the ASI ecosystem destroys humans but not itself. That intermediate situation needs to be analyzed and addressed; it is a risk which is very important for us.
But the main risk case needs to be solved first: the accumulating probability of the ASI ecosystem completely destroying itself and everything around it, and the accumulating probability of humanity completely destroying itself (and a lot around it). The asymmetric risk of the previous paragraph can then be addressed conditional on the risk of “self-destruction with collateral super-damage” being solved (this condition being satisfied should make the remaining asymmetric risk much more tractable).
The risks seem high regardless of the route we take, unfortunately. The perpetual learning machine (humanity) does not want to stop learning (and with good reason).