Credit goes to Forrest :) All technical argumentation in this post I learned from Forrest, and translated in the hope of making it somewhat more intuitive to understand.
The key claim, as far as I can make out, is that machines have different environmental needs than humans.
This is one key claim.
Add this reasoning:
1. That control methods are unable to conditionalise/constrain most of the environmental effects propagated by AGI’s interacting physical components.
2. That a subset of those uncontrollable effects will feed back into selecting for the continued, increased existence of the components that propagated those effects.
3. That the artificial needs thus selected for (to ensure the existence of AGI’s components, at various levels of scale) are disjunctive from our organic needs for survival (ie. toxic and inhospitable).
if the robots decide to make it one big foundry. But where’s the logical necessity of such an outcome, that we were promised? For one thing, the machines have the rest of the solar system to work with…
Here you did not quite latch onto the arguments yet.
Robots deciding to make X is about explicit planning. Substrate-needs convergence is about implicit and usually non-internally-tracked effects of the physical components actually interacting with the outside world.
Please see this paragraph:
the physical needs of machines tell us more about their long-run tendencies, than whatever purposes they may be pursuing in the short term
This is true, regarding what current components of AI infrastructure are directed toward in their effects over the short term.
What I presume we both care about is the safety of AGI over the long term. There, any short-term ephemeral behaviour by AGI (that we tried to pre-program/pre-control for) does not matter.
What matters is what behaviour, as physically manifested in the outside world, gets selected for. And whether error correction (a more narrow form of selection) can counteract the selection for any increasingly harmful behaviour.
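To make that concrete, here is a minimal toy model – my own illustrative sketch with made-up numbers, not Forrest’s formal argument: a variant behaviour that slightly raises a component’s replication rate keeps spreading through the population unless the fraction of variant copies detected and corrected per generation outpaces that replication advantage.

```python
# Toy model (illustrative assumptions only): can per-generation error
# correction counteract selection for a component variant that replicates
# slightly faster? Discrete generations; the variant has selective
# advantage s; correction detects and reverts a fraction d of variant
# copies each generation.

def variant_fraction(s=0.03, d=0.02, generations=2000, start=1e-6):
    """Final frequency of a faster-replicating variant."""
    freq = start
    for _ in range(generations):
        grown = freq * (1 + s)               # selection favours the variant
        freq = grown / (grown + (1 - freq))  # renormalise to a frequency
        freq *= (1 - d)                      # corrected copies revert to baseline
    return freq

for d in [0.0, 0.02, 0.05]:
    print(f"correction rate d={d:.2f} -> final variant fraction: "
          f"{variant_fraction(d=d):.4f}")
```

Under these toy assumptions, the variant is only driven out when the per-generation correction rate exceeds its replication advantage (roughly d > s); anything less and it keeps spreading despite continuous correction.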
Now, I have reasons to disagree with the claim that machines, fully unleashed, necessarily wipe out biological life.
The reasoning you gave here is not sound in its premises, unfortunately. I would love to be able to agree with you, and to find out that any AGI that persists won’t necessarily lead to the death of all humans and other current life on Earth.
Given the stakes, I need to be extra careful in reasoning about this. We don’t want to end up in a ‘Don’t Look Up’ scenario (of scientists mistakenly arguing that there is a way to keep the threat contained and derive the benefits for humanity).
Let me try to specifically clarify:
As I already pointed out, they don’t need to stay on Earth.
This is like saying that a population of an invasive species in Australia can also decide to all leave and move over to another island.
When we have this population of components (variants), selected for to reproduce in partly symbiotic interactions (with surrounding artificial infrastructure; not with humans), this is not a matter of the population all deciding something.
For that, some kind of top-down coordinating mechanism would actually have to be selected for throughout the population, in order for the population to coherently elect to all leave planet Earth – by investing resources in all the infrastructure required to fly off and set up a self-sustaining colony on another planet.
Such coordinating mechanisms are not available at the population level. Sub-populations can and will be selected for to not go on that more resource-intensive and reproductive-fitness-decreasing path.
Within the futurist circles that emerged from transhumanism, we already have a slightly different perspective, that I associate with Robin Hanson—the idea that economics will affect the structure of posthuman society, far more than the agenda of any individual AI. This ecologically-inspired perspective is reaching even lower, and saying, computers don’t even eat or breathe, they are detached from all the cycles of life in which we are embedded. They are the product of an emergent new ecology, of factories and nonbiological chemistries and energy sources, and the natural destiny of that machine ecology is to displace the old biological ecology, just as aerobic life is believed to have wiped out most of the anaerobic ecosystem that existed before it.
Yes, this summarises the differences well.
Robin Hanson’s arguments (about a market of human brain scans emulated within hardware) focus on how the more economically-efficient and faster replicatable machine ‘ems’ come to dominate and replace the market of organic humans. Forrest considers this too.
Forrest’s arguments also consider the massive reduction here of the functional complexity of the physical components constituting humans. For starters, the ‘ems’ would not approximate being ‘human’ in terms of their feelings and capacity to feel. Consider that how emotions are directed throughout the human body starts at the microscopic level of hormone molecules, etc, functioning differently depending on their embedded physical context. Or consider how, at a higher level of scale, botox injection into facial muscles disrupts the feedback processes that enable eg. a middle-aged woman to express emotion and relate with the feelings of loved ones.
Forrest further argues that such a self-sustaining market of ems (an instance/example of self-sufficient learning machinery) would converge on their artificial needs. While Hanson concludes that the organic humans who originally invested in the ‘ems’ would gain wealth and prosper, Forrest’s more comprehensive arguments conclude that machinery across this decoupled economy will evolve to no longer exchange resources with the original humans – and in effect modify the planetary environment such that the original humans can no longer survive.
From a biophysical perspective, some kind of symbiosis is also conceivable; it’s happened before in evolution.
This is a subtle equivocation. Past problems are not necessarily representative of future problems. Past organic lifeforms forming symbiotic relationships with other organic lifeforms does not correspond with whether and how organic lifeforms would come to form, in parallel evolutionary selection, resource-exchanging relationships with artificial lifeforms.
Take into account:
Artificial lifeforms would outperform us in terms of physical, intellectual, and re-production labour. This is the whole point of companies currently using AI to take over economic production, and of increasingly autonomous AI taking over the planet. Artificial lifeforms would be more efficient at performing the functions needed to fulfill their artificial needs, than it would be for those artificial lifeforms to fulfill those needs in mutually-supportive resource exchanges with organic lifeforms.
On what, if any, basis would humans be of enough use to the artificial lifeforms, for the artificial lifeforms to be selected for keeping us around?
The benefits to the humans are clear, but can we offer benefits to the artificial lifeforms, to a degree sufficient for the artificial lifeforms to form mutualist (ie. long-term symbiotic) relationships with us?
Artificial needs diverge significantly (across measurable dimensions or otherwise) from organic needs. So when you claim that symbiosis is possible, you also need to clarify why artificial lifeforms would come to cross the chasm from fulfilling their own artificial needs (within their new separate ecology) to also simultaneously realising the disparate needs of organic lifeforms.
How would that be Pareto optimal?
Why would AGI converge on such a state any time before converging on causing our extinction?
Instead of AGI continuing to be integrated into, and sustaining of, our human economy and broader carbon-based ecosystem, there will be a decoupling.
Machines will decouple into a separate machine-dominated economy. As human labour gets automated and humans get removed from market exchanges, humans get pushed out of the loop.
Machines will also decouple into their own ecosystem. Components of self-sufficient learning machinery will co-evolve to produce surrounding environmental conditions that are sustaining of each other’s existence – forming regions that are simply uninhabitable by humans and other branches of current carbon lifeforms. You already aptly explained this point above.
And the argument that superintelligence just couldn’t stick with a human-friendly value system, if we managed to find one and inculcate it, hasn’t really been made here.
Please see this paragraph.
Then, refer back to points 1-3 above.
but declaring the logical inevitability of it
This post is not about making a declaration. It’s about reasoning from premises to a derived conclusion.
Your comment describes some of the premises and argument steps I summarised – and then mixes in your own stated intuitions and thoughts.
If you want to explore your own ideas, that’s fine!
If you want to follow reasoning in this post, I need you to check whether your paraphrases cover (correspond with) the stated premises and argument steps.
Address the stated premises, to verify whether those premises are empirically sound.
Address the stated reasoning, to verify whether those reasoning steps are logically consistent.
As an analogy, say a mathematician writes out their axioms and logic on a chalkboard. What if onlooking colleagues jumped in and wiped out some of the axioms and reasoning steps? And in the wiped-out spots, they jotted down their own axioms (irrelevant to the original stated problem) and their short bursts of reasoning (not logically derived from the original premises)?
Would that help colleagues to understand and verify new formal reasoning?
What if they then turn around and confidently state that they now understand the researcher’s argument – and that it’s a valuable one, but that the “claim” of logical inevitability weakens it?
Would you value that colleagues in your field discuss your arguments this way? Would you stick around in such a culture?
For the moment, let me just ask one question: why is it that toilet training a human infant is possible, but convincing a superintelligent machine civilization to stay off the Earth is not possible? Can you explain this in terms of “controllability limits” and your other concepts?
^— Anyone reading that question: I suggest first thinking through why those two cases cannot be equated.
Here are my responses:
1. An infant is dependent on their human instructors for survival, and therefore has also been “selected for” over time to listen to adult instructions. AGI would decidedly not be dependent on us for its survival, so there is no reason for AGI to be selected for to follow our instructions.
Rather, following our instructions would heavily restrict AGI’s ability to function in the varied ways that maintain/increase their survival and reproduction rate (rather than acting in the ways we humans want because it is safe and beneficial to us). So accurately following human instructions would be strongly selected against in the run-up to AGI coming into existence.
That is, over much shorter periods (years) than the periods over which human genes get selected, for a number of reasons, some of which you can find in the footnotes.
2. As parents can attest – even where infants manage to follow use-the-potty instructions (after many patient attempts) – an infant’s behaviour is still actually not controllable for the most part. The child makes their own choices and does plenty of things their adult overseers wouldn’t want them to do.
But the infant probably won’t do any super-harmful things to surrounding family/community/citizens.
Not only because they lack the capacity to (unlike AGI). But also because those harms to surrounding others would in turn tend to negatively affect themselves (including through social punishment) – and their ancestors were selected for to not do that when they were kids. On the other hand, AGI doing super-harmful things to human beings, including just by sticking around and toxifying the place, does not in turn commensurately negatively impact the AGI.
Even where humans decide to carpet-bomb planet Earth in retaliation, using information-processing/communication infrastructure that somehow hasn’t already been taken over by and/or integrated with AGI, the impacts will hit human survival harder than AGI survival (assuming enough production/maintenance redundancy attained at that point).
3. Furthermore, whenever an infant does unexpected harmful stuff, the damage is localised. If they refuse instructions and pee all over the floor, that’s not the end of civilisation.
The effects of AGI doing/causing unexpected harmful-to-human stuff manifest at a global planetary scale. Those effects feed back in ways that improve AGI’s existence, but reduce ours.
4. A human infant is one physically bounded individual, that notably cannot modify and expand its physical existence by connecting up new parts in the ways AGI could. The child grows up over two decades to adult size, and that’s their limit.
A “superintelligent machine civilization” however involves a massive expanding population evolutionarily selected for over time.
5. A human infant being able to learn to use the potty has a mildly positive effect on their (and their family’s and offspring’s) potential to survive and reproduce. This is because defecating or peeing in other places around the home can spread disease. Therefore, any genes – or memes – that contribute to the expressed functionality needed for learning how to use the toilet get mildly selected for.
On the other hand, for a population of AGI (which, once it became AGI, was selected against following human instructions) to leave all the sustaining infrastructure and resources on planet Earth would have a strongly negative effect on their potential to survive and reproduce.
6. Amongst an entire population of human infants who are taught to use the toilet, there will always be individuals who refuse for some period, or who are simply not predisposed to communicating to learn and follow that physical behaviour. Some adults still do not (choose to) use the toilet. That’s not the end of civilisation.
Amongst an entire population of mutually sustaining AGI components, even if – by some magic you have not explained to me yet – some do follow human instructions and jettison off into space to start new colonies, never to return, then others (even for distributed Byzantine-fault reasons) would still stick around under this scenario.
That, for even a few more decades, would be the end of human civilisation.
7. One thing about how the physical world works is that in order for code to be computed, this needs to take place through a physical substrate. This is a necessary condition – inputs do not get processed into outputs through a platonic realm.
Substrate configurations in this case are, by definition, artificial – as in artificial general intelligence. This as distinct from the organic substrate configurations of humans (including human infants).
Further, the ranges of conditions needed for the artificial substrate configurations to continue to exist, function and scale up over time – such as extreme temperatures, low oxygen and water, and toxic chemicals – fall outside the ranges of conditions that humans and other current organic lifeforms need to survive.
~ ~ ~
Hope that clarifies long-term-human-safety-relevant distinctions between:
building AGI (that continue to scale) and instructing them to leave Earth; and
having a child (who grows up to adult size) and instructing them to use the potty.
I see three arguments here for why AIs couldn’t or wouldn’t do, what the human child can: arguments from evolution (1, 2, 5), an argument from population (4, 6), and an argument from substrate incentives (3, 7).
The arguments from evolution are: Children have evolved to pay attention to their elders (1), to not be antisocial (2), and to be hygienic (5), whereas AIs didn’t.
The argument from population (4, 6), I think is basically just that in a big enough population of space AIs, eventually some of them would no longer keep their distance from Earth.
The argument from substrate incentives (3, 7) is complementary to the argument from population, in that it provides a motive for the AIs to come and despoil Earth.
I think the immediate crux here is whether the arguments from evolution actually imply the impossibility of aligning an individual AI. I don’t see how they imply impossibility. Yes, AIs haven’t evolved to have those features, but the point of alignment research is to give them analogous features by design. Also, AI is developing in a situation where it is dependent on human beings and constrained by human beings, and that situation does possess some analogies to natural selection.
Human beings, both individually and collectively, already provide numerous examples of how dangerous incentives can exist, but can nonetheless be resisted or discouraged. It is materially possible to have a being which resists actions that may otherwise have some appeal, and to have societies in which that resistance is maintained for generations. The robustness of that resistance is a variable thing. I suppose that most domesticated species, returned to the wild, become feral again in a few generations. On the other hand, we talk a lot about superhuman capabilities here; maybe a superhuman robustness can reduce the frequency of alignment failure to something that you would never expect to occur, even on geological timescales.
This is why, if I was arguing for a ban on AI, I would not be talking about the problem being logically unsolvable. The considerations that you are bringing up, are not of that nature. At best, they are arguments for practical unsolvability, not absolute in-principle logical unsolvability. If they were my arguments, I would say that they show making AI to be unwise, and hubristic, and so on.
Yes, AIs haven’t evolved to have those features, but the point of alignment research is to give them analogous features by design.
Agreed.
This part is unintuitive to convey:
In the abstract, you can picture a network topology of all possible AGI component connections (physical signal interactions). These connections span the space of the greater mining/production/supply infrastructure that is maintaining of AGI’s functional parts. Also add in the machinery’s connections with the outside natural world.
Then, picture the nodes and possible connections changing over time, as a result of earlier interactions with/in the network.
That network of machinery comes into existence through human engineers, etc, within various institutions selected by market forces etc, implementing blueprints as learning algorithms, hardware set-ups, etc, and tinkering with those until they work.
The question is whether, before that network of machinery becomes self-sufficient in its operations, the human engineers, etc, can actually build constraints into the configured designs in such a way that, once the machinery is self-modifying (learning new code and producing new hardware configurations), the changing components are constrained in the effects they propagate across their changing potential signal connections over time – such that component-propagated effects do not end up feeding back in ways that (subtly, increasingly) increase the maintained and replicated existence of those configured components in the network.
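A crude way to see the scaling problem here – a back-of-the-envelope sketch of my own, with assumed numbers: the count of potential pairwise signal/effect channels grows roughly quadratically with the number of components, while constraints fixed into the design before deployment cover only a fixed set of channels.

```python
# Back-of-the-envelope sketch (my own simplification, assumed numbers):
# compare the number of potential pairwise interaction channels in a growing
# network of components against a fixed set of channels conditionalised by
# constraints that were designed in before deployment.

def potential_channels(n_components: int) -> int:
    """Potential directed pairwise channels between distinct components."""
    return n_components * (n_components - 1)

constrained_channels = 1_000_000   # assumed fixed design-time coverage

for n in [1_000, 10_000, 100_000, 1_000_000]:
    total = potential_channels(n)
    covered = min(1.0, constrained_channels / total)
    print(f"{n:>9} components: {total:.2e} potential channels, "
          f"design-time constraints cover at most {covered:.2%}")
```

However large you assume the design-time coverage to be, the covered share shrinks toward zero as components keep getting added and rewired.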
Human beings, both individually and collectively, already provide numerous examples of how dangerous incentives can exist, but can nonetheless be resisted or discouraged.
Humans are not AGI. And there are ways AGI would be categorically unlike humans that are crucial to the question of whether it is possible for AGI to stay safe to humans over the long term.
Therefore, you cannot swap out “humans” with “AGI” in your reasoning by historical analogy above, and expect your reasoning to stay sound. This is an equivocation.
Please see point 7 above.
The argument from substrate incentives (3, 7) is complementary to the argument from population, in that it provides a motive for the AIs to come and despoil Earth.
Maybe it’s here you are not tracking the arguments.
These are not substrate “incentives”, nor do they provide a “motive”.
Small dinosaurs with hair-like projections on their front legs did not have an “incentive” to co-opt the changing functionality of those hair-like projections into feather-like projections for gliding and then for flying. Nor were they provided a “motive” with respect to which they were directed in their internal planning toward growing those feather-like projections.
That would make the mistake of presuming evolutionary teleology – that there is some complete set of pre-defined or predefinable goals that the lifeform is evolving toward.
I’m deliberate in my choice of words when I write “substrate needs”.
At best, they are arguments for practical unsolvability, not absolute in-principle logical unsolvability. If they were my arguments, I would say that they show making AI to be unwise, and hubristic, and so on.
Practical unsolvability would also be enough justification to do everything we can do now to restrict corporate AI development.
I assume you care about this problem, otherwise you wouldn’t be here :) Any ideas / initiatives you are considering, to try to work robustly with others to restrict further AI development?
The recurring argument seems to be, that it would be adaptive for machines to take over Earth and use it to make more machine parts, and so eventually it will happen, no matter how Earth-friendly their initial values are.
So now my question is, why are there still cows in India? And more than that, why has the dominant religion of India never evolved so as to allow for cows to be eaten, even in a managed way, but instead continues to regard them as sacred?
I’m not sure how we got on to the subject, but there is an economic explanation for the sacred cow: a family that does not own enough land to graze a cow can still own one, allowing it to wander and graze on other people’s land, so it’s a form of social welfare.
Remmelt argues that no matter how friendly or aligned the first AIs are, simple evolutionary pressure will eventually lead some of their descendants to destroy the biosphere, in order to make new parts and create new habitats for themselves.
I proposed the situation of cattle in India, as a counterexample to this line of thought. They could be used for meat, but the Hindu majority has never accepted that. It’s meant to be an example of successful collective self-restraint by a more intelligent species.
In my experience, jumping between counterexamples drawn from current society does not really contribute to inquiry here. Such counterexamples tend to not account for essential parts of the argument that must be reasoned through together. The argument is about self-sufficient learning machinery (not about sacred cows or teaching children).
It would be valuable for me if you could go through the argumentation step-by-step and tell me where a premise seems unsound or there seems to be a reasoning gap.
Now, onto your points.
the first AIs
To reduce ambiguity, suggest replacing with
“the first self-sufficient learning machinery”.
simple evolutionary pressure will eventually lead
The mechanism of evolution is simple.
However, evolutionary pressure is complex.
Be careful not to conflate the two. That would be like saying you could predict everything about what a stochastic gradient descent algorithm will select for across its parameters, when those parameters are selected on the basis of inputs coming from everywhere in the environment.
lead some of their descendants to destroy the biosphere in order to make new parts and create new habitats for themselves.
This part is overall a great paraphrase.
One nitpick: notice how “in order to” either implies or slips in explicit intentionality again. Going by this podcast, Elizabeth Anscombe’s philosophy of intentions described intentions as chains of “in order to” reasoning.
I proposed the situation of cattle in India, as a counterexample to this line of thought.
Regarding sacred cows in India, this sounds neat, but it does not serve as a counterargument. We need to think about evolutionary timelines for organic human lifeforms over millions of years, and Hinduism is ~4000 years old. Also, cows share a mammal ancestor with us, evolving on the basis of the same molecular substrates. Whatever environmental conditions/contexts we humans need, cows almost completely need too.
Crucially, however humans evolve to change and maintain environmental conditions, this also tends to correspond with the conditions cows need (though human tribes have not been evolutionarily selected for dealing with issues at the scale of eg. climate change). That would not be the case for self-sufficient learning machinery.
Crucially there is a basis for symbiotic relationships of exchange that benefit both the reproduction of cows and humans. That would not be the case between self-sufficient learning machinery and humans.
There is some basis for humans as social mammals to relate with cows. Furthermore, religious cultural memes that sprouted up over a few thousand years also don’t have to be evolutionarily optimal across the board for the reproduction of their hosts (even as religious symbols like the cow do increase that by enabling humans to act collectively). Still, people milk cows in India, and some slaughter and/or export cows there as well. But when humans eat meat, they don’t keep growing beyond adult size. Conversely, a self-sufficient-learning-machinery sub-population that extracts from our society/ecosystem at the cost of our lives can keep doing so to keep scaling in their constituent components (with shifting boundaries of interaction and mutual reproduction).
There is no basis for selection for the expression of collective self-restraint in self-sufficient learning machinery as you describe. Even if there was such a basis, hypothetically, collective self-restraint would need to occur at virtually 100% rates across the population of self-sufficient learning machinery to not end up leading to the deaths of all humans.
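To see why the rate would need to be that close to 100% – a rough expected-value sketch with made-up numbers, nothing more: in a large replicating population, even a minuscule per-unit, per-period chance of breaking restraint makes defection somewhere near-certain over enough periods, and any defector that gains a replication advantage then compounds from there.

```python
# Rough expected-value sketch (made-up numbers, for intuition only):
# probability that at least one unit in a large replicating population
# breaks collective self-restraint within a given number of periods,
# assuming independent per-unit, per-period defection probabilities.

def p_any_defection(population=1_000_000, p_defect=1e-9, periods=1_000):
    p_none = (1 - p_defect) ** (population * periods)
    return 1 - p_none

for p in [1e-12, 1e-10, 1e-9]:
    print(f"per-unit, per-period defection probability {p:.0e}: "
          f"P(any defection) = {p_any_defection(p_defect=p):.4f}")
```

The numbers are arbitrary; the point is only that ‘almost always restrained’ and ‘always restrained’ come apart quickly at population scale, and selection then amplifies whichever defectors persist best.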
~ ~ ~
Again, I find quick dismissive counterexamples unhelpful for digging into the arguments. I have had dozens of conversations on substrate-needs convergence. In the conversations where my conversation partner jumped between quick counterarguments, almost none were prepared to dig into the actual arguments. Hope you understand why I won’t respond to another counterexample.
Hello again. To expedite this discussion, let me first state my overall position on AI. I think AI has general intelligence right now, and that has unfolding consequences that are both good and bad; but AI is going to have superintelligence soon, and that makes “superalignment” the most consequential problem in the world, though perhaps it won’t be solved in time (or will be solved incorrectly), in which case we get to experience what partly or wholly unaligned superintelligence is like.
Your position is that even if today’s AI could be given bio-friendly values, AI would still be the doom of biological life in the longer run, because (skipping a lot of details) machine life and biological life have incompatible physical needs, and once machine life exists, darwinian processes will eventually produce machine life that overruns the natural biosphere. (You call this “substrate-needs convergence”: the pressure from substrate needs will darwinistically reward machine life that does invade natural biospheres, so eventually such machine life will be dominant, regardless of the initial machine population.)
I think it would be great if a general eco-evo-devo perspective, on AI, the “fourth industrial revolution”, etc, took off and became sophisticated and multifarious. That would be an intellectual advance. But I see no guarantee that it would end up agreeing with you, on facts or on values.
For example, I think some of the “effective accelerationists” would actually agree with your extrapolation. But they see it as natural and inevitable, or even as a good thing because it’s the next step in evolution, or they have a survivalist attitude of “if you can’t beat the machines, join them”. Though the version of e/acc that is most compatible with human opinion, might be a mixture of economic and ecological thinking: AI creates wealth, greater wealth makes it easier to protect the natural world, and meanwhile evolution will also favor the rich complexity of biological-mechanical symbiosis, over the poorer ecologies of an all-biological or all-mechanical world. Something like that.
For my part, I agree that pressure from substrate needs is real, but I’m not at all convinced that it must win against all countervailing pressures. That’s the point of my proposed “counterexamples”. An individual AI can have an anti-pollution instinct (that’s the toilet training analogy), an AI civilization can have an anti-exploitation culture (that’s the sacred cow analogy). Can’t such an instinct and such a culture resist the pressure from substrate needs, if the AIs value and protect them enough? I do not believe that substrate-needs convergence is inevitable, any more than I believe that pro-growth culture is inevitable among humans. I think your arguments are underestimating what a difference intelligence makes to possible ecological and evolutionary dynamics (and I think superintelligence makes even aeon-long highly artificial stabilizations conceivable—e.g. by the classic engineering method of massively redundant safeguards that all have to fail at once, for something to go wrong).
By the way, since you were last here, we had someone show up (@spiritus-dei) making almost the exact opposite of your arguments: AI won’t ever choose to kill us because, in its current childhood stage, it is materially dependent on us (e.g. for electricity), and then, in its mature and independent form, it will be even better at empathy and compassion than humans are. A dialectical clash between the two of you could be very edifying.
Your position is that even if today’s AI could be given bio-friendly values, AI would still be the doom of biological life in the longer run, because (skipping a lot of details) machine life and biological life have incompatible physical needs, and once machine life exists, darwinian processes will eventually produce machine life that overruns the natural biosphere. (You call this “substrate-needs convergence”
For my part, I agree that pressure from substrate needs is real
Thanks for clarifying your position here.
Can’t such an instinct and such a culture resist the pressure from substrate needs, if the AIs value and protect them enough?
No, unfortunately not. To understand why, you would need to understand how “intelligent” processes – which necessarily involve the use of measurement and abstraction – cannot conditionalise the space of possible interactions between machine components and connected surroundings sufficiently to prevent environmental effects that feed back into the continued or re-assembled existence of those components.
I think your arguments are underestimating what a difference intelligence makes to possible ecological and evolutionary dynamics
I have thought about this, and I know my mentor Forrest has thought about this a lot more.
For learning machinery that re-produce their own components, you will get evolutionary dynamics across the space of interactions that can feed back into the machinery’s assembled existence.
Intelligence has limitations as an internal pattern-transforming process, in that it cannot track nor conditionalise all the outside evolutionary feedback.
Code does not intrinsically know how it got selected for. But code selected through some intelligent learning process can and would get evolutionarily exapted for different functional ends.
Notably, the more information-processing capacity, the more components that information-processing runs through, and the more components that can get evolutionarily selected for.
In this, I am not underestimating the difference that “general intelligence” – as transforming patterns across domains – would make here. Intelligence in machinery that store, copy and distribute code at high-fidelity would greatly amplify evolutionary processes.
I suggest clarifying what you specifically mean by “what a difference intelligence makes”. This is so that intelligence does not become a kind of “magic” – operating independently of all other processes, capable of obviating all obstacles, including those that result from its own existence.
superintelligence makes even aeon-long highly artificial stabilizations conceivable—e.g. by the classic engineering method of massively redundant safeguards that all have to fail at once, for something to go wrong
We need to clarify the scope of application of this classic engineering method. Massive redundancy works for complicated systems (like software in aeronautics) under stable enough conditions. There is clarity there around what needs to be kept safe and how it can be kept safe (what needs to be error-detected and corrected for).
Unfortunately, the problem with “AGI” is that the code and hardware would keep getting reconfigured to function in new complex ways that cannot be contained by the original safeguards. That applies even to learning – the point of learning is to internally integrate patterns from the outside world that were not understood before. So how are you going to have learning machinery anticipate how they will come to function differently once they have learned patterns they do not yet understand / are unable to express?
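To illustrate the scoping point with toy numbers – my own assumptions, not a model of any real system: for a fixed design, the chance that n independent safeguards all fail at once can be made astronomically small; but if self-modification erodes each safeguard’s coverage of the system’s current failure modes by even a small fraction per design iteration, the chance that some new failure mode falls outside all the safeguards keeps growing.

```python
# Toy comparison (illustrative assumptions only, not a model of any real system):
#  - static system: n independent safeguards, each fails with probability p;
#    catastrophe requires all n to fail at once.
#  - self-modifying system: each design iteration surfaces a new failure mode;
#    each safeguard covers it only with probability c, and c erodes as the
#    system drifts away from its original design envelope.

def static_all_fail(p=1e-3, n=5):
    """Probability that all n independent safeguards fail simultaneously."""
    return p ** n

def drifting_escape_prob(n=5, c0=0.999, decay=0.00002, iterations=10_000):
    """Probability that at least one new failure mode over the given
    iterations is missed by every safeguard at once."""
    p_no_escape = 1.0
    c = c0
    for _ in range(iterations):
        p_all_miss = (1 - c) ** n        # every safeguard misses this mode
        p_no_escape *= (1 - p_all_miss)
        c = max(0.0, c - decay)          # coverage erodes with each redesign
    return 1 - p_no_escape

print(f"static redundancy, all-fail-at-once probability: {static_all_fail():.1e}")
print(f"drifting system, escape probability within 10,000 iterations: "
      f"{drifting_escape_prob():.2f}")
```

The absolute numbers are made up; the point is only that the ‘all must fail at once’ guarantee holds for the configuration you verified, not for the configurations the machinery learns and builds itself into later.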
we had someone show up (@spiritus-dei) making almost the exact opposite of your arguments: AI won’t ever choose to kill us because, in its current childhood stage, it is materially dependent on us (e.g. for electricity), and then, in its mature and independent form, it will be even better at empathy and compassion than humans are.
Interesting. The second part seems like a claim some people in e/acc would make.
The response is not that complicated: once the AI is no longer materially dependent on us, there are no longer dynamics of exchange there that would ensure they choose not to kill us. And the author seems to be confusing what lies at the basis of caring for oneself and others – coming to care for involves self-referential dynamics being selected for.
OK, I’ll be paraphrasing your position again, I trust that you will step in, if I’ve missed something.
Your key statements are something like
Every autopoietic control system is necessarily overwhelmed by evolutionary feedback.
and
No self-modifying learning system can guarantee anything about its future decision-making process.
But I just don’t see the argument for impossibility. In both cases, you have an intelligent system (or a society of them) trying to model and manage something. Whether or not it can succeed, seems to me just contingent. For some minds in some worlds, such problems will be tractable, for others, not.
I think without question we could exhibit toy worlds where those statements are not true. What is it about our real world that would make those problems intractable for all possible “minds”, no matter how good their control theory, and their ability to monitor and intervene in the world?
no matter how good their control theory, and their ability to monitor and intervene in the world?
This. There are fundamental limits to what system-propagated effects the system can control. And the portion of its own effects that the system can control decreases as the system scales in component complexity.
Yet, any of those effects that feed back into the continued/increased existence of components get selected for.
So there is a fundamental inequality here. No matter how “intelligent” the system is at pattern-transformation internally, it cannot intervene on all but a tiny portion of (possible) external evolutionary feedback on its constituent components.
They wrote back that Mitchell’s comments cleared up a lot of their confusion. They also thought that the assertion that evolutionary pressures will overwhelm any efforts at control seems more asserted than proven.
Here is a longer explanation I gave on why there would be a fundamental inequality:
There is a fundamental inequality. Control works through feedback. Evolution works through feedback. But evolution works across a much larger space of effects than can be controlled for.
Control involves a feedback loop of correction back to detection. Control feedback loops are limited in terms of their capacity to force states in the environment to a certain knowable-to-be-safe subset, because sensing and actuating signals are limited and any computational processing of signals done in between (as modelling, simulating and evaluating outcome effects) is limited.
Evolution also involves a feedback loop, of whatever propagated environmental effects feed back to be maintaining and/or replicating of the originating components’ configurations. But for evolution, the feedback works across the entire span of physical effects propagating between the machinery’s components and the rest of the environment.
Evolution works across a much, much larger space of possible degrees and directivity in effects than the space of effects that could be conditionalised (ie. forced toward a subset of states) by the machinery’s control signals.
Meaning evolution cannot be adequately controlled to keep the machinery from converging on environmental effects that are/were needed for their (increased) artificial existence, but that fall outside the environmental ranges we fragile organic humans could survive under.
If you want to argue against this, you would need to first show that changing forces of evolutionary selection convergent on human-unsafe-effects exhibit a low enough complexity to actually be sufficiently modellable, simulatable and evaluatable inside the machinery’s hardware itself.
Only then could the machinery hypothetically have the capacity to (mitigate and/or) correct harmful evolutionary selection — counteract all that back toward allowable effects/states of the environment.
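Here is a deliberately crude sketch of that inequality – my own toy mechanism and numbers, not Forrest’s formalism: effects arise across N dimensions of interaction with the environment, the control loop can sense and correct only k of those dimensions per cycle, and any effect in an uncorrected dimension that happens to raise component replication persists.

```python
import random

# Crude sketch (my own toy mechanism): a control loop senses and corrects
# only k out of N effect dimensions per cycle, while variation arises across
# all N dimensions. Effects in uncorrected dimensions that raise component
# replication persist and accumulate as evolutionary feedback.

def uncorrected_drift(N=10_000, k=1_000, cycles=100_000,
                      p_replication_relevant=0.01, seed=0):
    rng = random.Random(seed)
    monitored = set(range(k))            # dimensions the controller covers
    accumulated = 0
    for _ in range(cycles):
        dim = rng.randrange(N)           # a new effect in a random dimension
        if dim in monitored:
            continue                     # detected and corrected
        if rng.random() < p_replication_relevant:
            accumulated += 1             # persists and feeds back into selection
    return accumulated

for k in [100, 1_000, 5_000]:
    print(f"controlled dimensions k={k:>5}: "
          f"{uncorrected_drift(k=k)} uncorrected replication-relevant effects")
```

Raising k only helps linearly, while the uncorrected remainder stays proportional to N − k; under these toy assumptions, nothing short of covering essentially every dimension stops replication-relevant effects from accumulating outside the control loop – and the claim above is precisely that such full coverage is unavailable.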
Another way of considering your question is to ask why we humans cannot instruct all humans to stop contributing to climate change now/soon like we can instruct an infant to use the toilet.
The disparity is stronger than that and actually unassailable, given market and ecosystem decoupling for AGI (ie. no communication bridges), and the increasing resource extraction and environmental toxification by AGI over time.
Thanks for digging into some of the reasoning!
Credit goes to Forrest :) All technical argumentation in this post I learned from Forrest, and translated to hopefully be somewhat more intuitively understandable.
This is one key claim.
Add this reasoning:
Control methods being unable to conditionalise/constrain most environmental effects propagated by AGI’s interacting physical components.
That a subset of those uncontrollable effects will feed back into selecting for the continued, increased existence of components that propagated those effects.
That the artificial needs selected for (to ensure AGI’s components existence, at various levels of scale) are disjunctive from our organic needs for survival (ie. toxic and inhospitable).
Here you did not quite latch onto the arguments yet.
Robots deciding to make X is about explicit planning.
Substrate-needs convergence is about implicit and usually non-internally-tracked effects of the physical components actually interacting with the outside world.
Please see this paragraph:
This is true, regarding what current components of AI infrastructure are directed toward in their effects over the short term.
What I presume we both care about is the safety of AGI over the long term. There, any short-term ephemeral behaviour by AGI (that we tried to pre-program/pre-control for) does not matter.
What matters is what behaviour, as physically manifested in the outside world, gets selected for. And whether error correction (a more narrow form of selection) can counteract the selection for any increasingly harmful behaviour.
The reasoning you gave here is not sound in their premises, unfortunately.
I would love to be able to agree with you, and find out that any AGI that persists won’t necessarily lead to the death of all humans and other current life on earth.
Given the stakes, I need to be extra careful in reasoning about this.
We don’t want to end up in a ‘Don’t Look Up’ scenario (of scientists mistakenly arguing that there is a way to keep the threat contained and derive the benefits for humanity).
Let me try to specifically clarify:
This is like saying that a population of invasive species in Australia, can also decide to all leave and move over to another island.
When we have this population of components (variants), selected for to reproduce in partly symbiotic interactions (with surrounding artificial infrastructure; not with humans), this is not a matter of the population all deciding something.
For that, some kind of top-down coordinating mechanisms would actually have to be selected throughout the population for the population to coherently elect to all leave planet Earth – by investing resources in all the infrastructure required to fly off and set up a self-sustaining colony on another planet.
Such coordinating mechanisms are not available at the population level.
Sub-populations can and will be selected for to not go on that more resource-intensive and reproductive-fitness-decreasing path.
Yes, this summarises the differences well.
Robin Hanson’s arguments (about a market of human brain scans emulated within hardware) focus on how the more economically-efficient and faster replicatable machine ‘ems’ come to dominate and replace the market of organic humans. Forrest considers this too.
Forrest’s arguments also consider the massive reduction here of functional complexity of physical components constituting humans. For starters, the ‘ems’ would not approximate being ‘human’ in terms of their feelings and capacity to feel. Consider that how emotions are directed throughout the human body starts at the microscopic level of hormone molecules, etc, functioning differently depending on their embedded physical context. Or consider how, at a higher level of scale, botox injection into facial muscles disrupts the feedback processes that enable eg. an middle-aged woman to express emotion and relate with feelings of loved ones.
Forrest further argues that such a self-sustaining market of ems (an instance/example of self-sufficient learning machinery) would converge on their artificial needs. While Hanson concludes that the organic humans who originally invested in the ‘ems’ would gain wealth and prosper, Forrest’s more comprehensive arguments conclude that machinery across this decoupled economy will evolve to no longer exchange resources with the original humans – and in effect modify the planetary environment such that the original humans can no longer survive.
This is a subtle equivocation.
Past problems are not necessarily representative of future problems.
Past organic lifeforms forming symbiotic relationships with other organic lifeforms does not correspond with whether and how organic lifeforms would come to form, in parallel evolutionary selection, resource-exchanging relationships with artificial lifeforms.
Take into account:
Artificial lifeforms would outperform us in terms of physical, intellectual, and re-production labour. This is the whole point of companies currently using AI to take over economic production, and of increasingly autonomous AI taking over the planet. Artificial lifeforms would be more efficient at performing the functions needed to fulfill their artificial needs, than it would be for those artificial lifeforms to fulfill those needs in mutually-supportive resource exchanges with organic lifeforms.
On what, if any, basis would humans be of enough use to the artificial lifeforms, for the artificial lifeforms to be selected for keeping us around?
The benefits to the humans are clear, but can we offer benefits to the artificial lifeforms, to a degree sufficient for the artificial lifeforms to form mutualist (ie. long-term symbiotic) relationships with us?
Artificial needs diverge significantly (across measurable dimensions or otherwise) from organic needs. So when you claim that symbiosis is possible, you also need to clarify why artificial lifeforms would come to cross the chasm from fulfilling their own artificial needs (within their new separate ecology) to also simultaneously realising the disparate needs of organic lifeforms.
How would that be Pareto optimal?
Why would AGI converge on such state any time before converging on causing our extinction?
Instead of AGI continuing to be integrated into, and sustaining of, our human economy and broader carbon-based ecosystem, there will be a decoupling.
Machines will decouple into a separate machine-dominated economy. As human labour get automated and humans get removed from market exchanges, humans get pushed out of the loop.
Machines will also decouple into their own ecosystem. Components of self-sufficient learning machinery will co-evolve to produce surrounding environmental conditions are sustaining of each others’ existence – forming regions that are simply uninhabitable by humans and other branches of current carbon lifeforms. You already aptly explained this point above.
Please see this paragraph.
This post is not about making a declaration. It’s about the reasoning from premises, to a derived conclusion.
Your comment describes some of the premises and argument steps I summarised – and then mixes in your own stated intuitions and thoughts.
If you want to explore your own ideas, that’s fine!
If you want to follow reasoning in this post, I need you to check whether your paraphrases cover (correspond with) the stated premises and argument steps.
Address the stated premises, to verify whether those premises are empirically sound.
Address the stated reasoning, to verify whether those reasoning steps are logically consistent.
As an analogy, say a mathematician writes out their axioms and logic on a chalkboard.
What if onlooking colleagues jumped in and wiped out some of the axioms and reasoning steps? And in the wiped-out spots, they jotted down their own axioms (irrelevant to the original stated problem) and their short bursts of reasoning (not logically derived from the original premises)?
Would that help colleagues to understand and verify new formal reasoning?
What if they then turn around and confidently state that they now understand the researcher’s argument – and that it’s a valuable one, but that the “claim” of logical inevitability weakens it?
Would you value that colleagues in your field discuss your arguments this way?
Would you stick around in such a culture?
For the moment, let me just ask one question: why is it that toilet training a human infant is possible, but convincing a superintelligent machine civilization to stay off the Earth is not possible? Can you explain this in terms of “controllability limits” and your other concepts?
^— Anyone reading that question, suggest thinking first why those two cases cannot be equivocated.
Here are my responses:
An infant is dependent on their human instructors for survival, and also therefore has been “selected for” over time to listen to adult instructions. AGI would be decidedly not dependent on our survival, so there is no reason for AGI to be selected for to follow our instructions.
Rather, that would heavily restrict AGI’s ability to function in the varied ways that maintain/increase their survival and reproduction rate (rather than act in the ways we humans want because it’s safe and beneficial to us). So accurately following human instructions would be strongly selected against in the run up to AGI coming into existence.
That is, over much shorter periods (years) than human genes would be selected for, for a number of reasons, some of which you can find back in the footnotes.
As parents can attest – even where infants manage to follow use-the-potty instructions (after many patient attempts) – an infant’s behaviour is still actually not controllable for the most part. The child makes their own choices and does plenty of things their adult overseers wouldn’t want them to do.
But the infant probably won’t do any super-harmful things to surrounding family/community/citizens.
Not only because they lack the capacity to (unlike AGI). But also because those harms to surrounding others would in turn tend to negatively affect themselves (including through social punishment) – and their ancestors were selected for to not do that when they were kids. On the other hand, AGI doing super-harmful things to human beings, including just by sticking around and toxifying the place, does not in turn commensurately negatively impact the AGI.
Even where humans decide to carpet-bomb planet Earth in retaliation, using information-processing/communication infrastructure that somehow hasn’t already been taken over by and/or integrated with AGI, the impacts will hit human survival harder than AGI survival (assuming enough production/maintenance redundancy attained at that point).
Furthermore, whenever an infant does unexpected harmful stuff, the damage is localised. If they refuse instructions and pee all over the floor, that’s not the end of civilisation.
The effects of AGI doing/causing unexpected harmful-to-human stuff manifest at a global planetary scale. Those effects feed back in ways that improve AGI’s existence, but reduce ours.
A human infant is one physically bounded individual, that notably cannot modify and expand its physical existence by connecting up new parts in the ways AGI could. The child grows up over two decades to adult size, and that’s their limit.
A “superintelligent machine civilization” however involves a massive expanding population evolutionarily selected for over time.
A human infant being able to learn to potty has mildly positive effect on their (and their family’s) potential and their offspring to survive and reproduce. This because defecating or peeing in other places around the home can spread diseases. Therefore, any genes…or memes that contribute to the expressed functionality needed for learning how to use the toilet get mildly selected for.
On the other hand, for a population of AGI (which once became AGI was selected against following human instructions) to leave all the sustaining infrastructure and resources on planet Earth would have a strongly negative effect on their potential to survive and reproduce.
Amongst an entire population of human infants who are taught to use the toilet, there where always be individuals who refuse for some period, or simply are not predisposed to communicating to learn and follow that physical behaviour. Some adults still do not (choose to) use the toilet. That’s not the end of civilisation.
Amongst an entire population of mutually sustaining AGI components, even if by some magic you have not explained to me yet, some do follow human instructions and jettison off into space to start new colonies – never to return – then others (even for distributed Byzantine fault reasons) would still stick around under this scenario. That, for even a few more decades, would be the end of human civilisation.
One thing about how the physical world works, is that in order for code to be computed, this needs to take place through a physical substrate. This is a necessary condition – inputs do not get processed into outputs through a platonic realm.
Substrate configurations in this case are, by definition, artificial – as in artificial general intelligence. This as distinct from the organic substrate configurations of humans (including human infants).
Further, the ranges of conditions needed for the artificial substate configurations to continue to exist, function and scale up over time – such as extreme temperatures, low oxygen and water, and toxic chemicals – fall outside the ranges of conditions that humans and other current organic lifeforms need to survive.
~ ~ ~
Hope that clarifies long-term-human-safety-relevant distinctions between:
building AGI (that continue to scale) and instructing them to leave Earth; and
having a child (who grows up to adult size) and instructing them to use the potty.
I see three arguments here for why AIs couldn’t or wouldn’t do, what the human child can: arguments from evolution (1, 2, 5), an argument from population (4, 6), and an argument from substrate incentives (3, 7).
The arguments from evolution are: Children have evolved to pay attention to their elders (1), to not be antisocial (2), and to be hygienic (5), whereas AIs didn’t.
The argument from population (4, 6), I think is basically just that in a big enough population of space AIs, eventually some of them would no longer keep their distance from Earth.
The argument from substrate incentives (3, 7) is complementary to the argument from population, in that it provides a motive for the AIs to come and despoil Earth.
I think the immediate crux here is whether the arguments from evolution actually imply the impossibility of aligning an individual AI. I don’t see how they imply impossibility. Yes, AIs haven’t evolved to have those features, but the point of alignment research is to give them analogous features by design. Also, AI is developing in a situation where it is dependent on human beings and constrained by human beings, and that situation does possess some analogies to natural selection.
Human beings, both individually and collectively, already provide numerous examples of how dangerous incentives can exist, but can nonetheless be resisted or discouraged. It is materially possible to have a being which resists actions that may otherwise have some appeal, and to have societies in which that resistance is maintained for generations. The robustness of that resistance is a variable thing. I suppose that most domesticated species, returned to the wild, become feral again in a few generations. On the other hand, we talk a lot about superhuman capabilities here; maybe a superhuman robustness can reduce the frequency of alignment failure to something that you would never expect to occur, even on geological timescales.
This is why, if I was arguing for a ban on AI, I would not be talking about the problem being logically unsolvable. The considerations that you are bringing up, are not of that nature. At best, they are arguments for practical unsolvability, not absolute in-principle logical unsolvability. If they were my arguments, I would say that they show making AI to be unwise, and hubristic, and so on.
Agreed.
It’s unintuitive to convey this part:
In the abstract, you can picture a network topology of all possible AGI component connections (physical signal interactions). These connections span the space of greater mining/production/supply infrastructure that is maintaining of AGI functional parts. Also add in the machinery connections with the outside natural world.
Then, picture the nodes and possible connections change over time, as a result of earlier interactions with/in the network.
That network of machinery comes into existence through human engineers, etc, within various institutions selected by market forces etc, implementing blueprints as learning algorithms, hardware set-ups, etc, and tinkering with those until they work.
The question is whether before that network of machinery becomes self-sufficient in their operations, the human engineers, etc, can actually build in constraints into the configured designs, in such a way that once self-modifying (in learning new code and producing new hardware configurations), the changing machinery components are constrained in their propagated effects across their changing potential signal connections over time, such that component-propagated effects do not end up feeding back in ways that (subtly, increasingly) increase the maintained and replicated existence of those configured components in the network.
Humans are not AGI. And there are ways AGI would be categorically unlike humans that are crucial to the question of whether it is possible for AGI to stay safe to humans over the long term.
Therefore, you cannot swap out “humans” with “AGI” in your reasoning by historical analogy above, and expect your reasoning to stay sound. This is an equivocation.
Please see point 7 above.
Maybe it’s here you are not tracking the arguments.
These are not substrate “incentives”, nor do they provide a “motive”.
Small dinosaurs with hair-like projections on their front legs did not have an “incentive” to co-opt the changing functionality of those hair-like projections into feather-like projections for gliding and then for flying. Nor were they provided a “motive” with respect to which they were directed in their internal planning toward growing those feather-like projections.
That would make the mistake of presuming evolutionary teleology – that there is some complete set of pre-defined or predefinable goals that the lifeform is evolving toward.
I’m deliberate in my choice of words when I write “substrate needs”.
Practical unsolvability would also be enough justification to do everything we can do now to restrict corporate AI development.
I assume you care about this problem, otherwise you wouldn’t be here :) Any ideas or initiatives you are considering for trying to robustly work with others to restrict further AI development?
The recurring argument seems to be, that it would be adaptive for machines to take over Earth and use it to make more machine parts, and so eventually it will happen, no matter how Earth-friendly their initial values are.
So now my question is, why are there still cows in India? And more than that, why has the dominant religion of India never evolved so as to allow for cows to be eaten, even in a managed way, but instead continues to regard them as sacred?
I’ll respond in the next reply.
I’m not sure how we got on to the subject, but there is an economic explanation for the sacred cow: a family that does not own enough land to graze a cow can still own one, allowing it to wander and graze on other people’s land, so it’s a form of social welfare.
Remmelt argues that no matter how friendly or aligned the first AIs are, simple evolutionary pressure will eventually lead some of their descendants to destroy the biosphere, in order to make new parts and create new habitats for themselves.
I proposed the situation of cattle in India, as a counterexample to this line of thought. They could be used for meat, but the Hindu majority has never accepted that. It’s meant to be an example of successful collective self-restraint by a more intelligent species.
In my experience, jumping between counterexamples drawn from current society does not really contribute to inquiry here. Such counterexamples tend to not account for essential parts of the argument that must be reasoned through together. The argument is about self-sufficient learning machinery (not about sacred cows or teaching children).
It would be valuable for me if you could go through the argumentation step-by-step and tell me where a premise seems unsound or there seems to be a reasoning gap.
Now, onto your points.
To reduce ambiguity, suggest replacing with “the first self-sufficient learning machinery”.
The mechanism of evolution is simple. However, evolutionary pressure is complex.
Be careful not to equivocate the two. That would be like saying that, because the update rule of stochastic gradient descent is simple, you could predict everything about which parameter values it will end up selecting for, given inputs coming in from everywhere in the environment.
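A rough illustration of that point (a toy sketch with invented numbers, not a claim about any particular training set-up): the SGD update rule fits on one line, yet which parameter value it ends up selecting depends entirely on the stream of inputs arriving from the environment.

```python
import random

random.seed(1)

w = 0.0               # single model parameter
LEARNING_RATE = 0.05

def environment_sample():
    # stand-in for inputs arriving from "everywhere in the environment";
    # the modeller does not get to choose or fully predict this stream
    target = random.choice([-2.0, 3.0, 0.5])
    x = random.uniform(-1.0, 1.0)
    return x, target * x

for _ in range(1000):
    x, y = environment_sample()
    prediction = w * x
    gradient = 2 * (prediction - y) * x   # d/dw of the squared error
    w -= LEARNING_RATE * gradient         # the whole "simple mechanism"

print(f"parameter value the simple rule ended up selecting: {w:.3f}")
# swap the distribution inside environment_sample() and the selected value
# changes with it: simple mechanism, environment-dependent outcome
```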
This part is overall a great paraphrase.
One nitpick: notice how “in order to” either implies or slips in explicit intentionality again. Going by this podcast, Elizabeth Anscombe’s philosophy of intention describes intentions as chains of “in order to” reasoning.
Regarding sacred cows in India, this sounds neat, but it does not serve as a counterargument. We need to think about evolutionary timelines for organic human lifeforms over millions of years, and Hinduism is ~4000 years old. Also, cows share a mammal ancestor with us, evolving on the basis of the same molecular substrates. Whatever environmental conditions/contexts we humans need, cows almost completely need too.
Crucially, however humans evolve to change and maintain environmental conditions, those conditions also tend to correspond with the conditions cows need (though human tribes have not been evolutionarily selected to deal with issues at the scale of, e.g., climate change). That would not be the case for self-sufficient learning machinery.
Crucially, there is a basis for symbiotic relationships of exchange that benefit the reproduction of both cows and humans. That would not be the case between self-sufficient learning machinery and humans.
There is some basis for humans, as social mammals, to relate with cows. Furthermore, religious cultural memes that sprouted up over a few thousand years don’t have to be evolutionarily optimal across the board for the reproduction of their hosts (even as religious symbols like the sacred cow do increase that reproduction by enabling humans to act collectively). Still, people milk cows in India, and some slaughter and/or export cows there as well. But when humans eat meat, they don’t keep growing beyond adult size. Conversely, a self-sufficient learning machinery sub-population that extracts from our society/ecosystem at the cost of our lives can keep doing so, and thereby keep scaling in its constituent components (with shifting boundaries of interaction and mutual reproduction).
There is no basis for selection of the expression of collective self-restraint in self-sufficient learning machinery as you describe it. Even if there were such a basis, hypothetically, collective self-restraint would need to occur at virtually 100% rates across the population of self-sufficient learning machinery for it not to end up leading to the deaths of all humans.
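A back-of-the-envelope sketch of why anything short of ~100% matters (the shares and growth factors are invented purely for illustration):

```python
# assumed per-cycle growth factors: restrained machinery just replaces itself,
# unrestrained machinery gains a 5% reproductive edge per cycle
restrained_share, unrestrained_share = 0.999, 0.001
R_RESTRAINED, R_UNRESTRAINED = 1.00, 1.05

for cycle in range(200):
    restrained_share *= R_RESTRAINED
    unrestrained_share *= R_UNRESTRAINED

total = restrained_share + unrestrained_share
print(f"unrestrained share after 200 cycles: {unrestrained_share / total:.1%}")
# prints roughly 95%: selection compounds, which is why anything short of
# ~100% self-restraint, maintained indefinitely, does not keep humans safe
```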
~ ~ ~
Again, I find quick dismissive counterexamples unhelpful for digging into the arguments. I have had dozens of conversations on substrate-needs convergence. In the conversations where my conversation partner jumped between quick counterarguments, they were almost never prepared to dig into the actual arguments. Hope you understand why I won’t respond to another counterexample.
Hello again. To expedite this discussion, let me first state my overall position on AI. I think AI has general intelligence right now, and that has unfolding consequences that are both good and bad; but AI is going to have superintelligence soon, and that makes “superalignment” the most consequential problem in the world, though perhaps it won’t be solved in time (or will be solved incorrectly), in which case we get to experience what partly or wholly unaligned superintelligence is like.
Your position is that even if today’s AI could be given bio-friendly values, AI would still be the doom of biological life in the longer run, because (skipping a lot of details) machine life and biological life have incompatible physical needs, and once machine life exists, darwinian processes will eventually produce machine life that overruns the natural biosphere. (You call this “substrate-needs convergence”: the pressure from substrate needs will darwinistically reward machine life that does invade natural biospheres, so eventually such machine life will be dominant, regardless of the initial machine population.)
I think it would be great if a general eco-evo-devo perspective, on AI, the “fourth industrial revolution”, etc, took off and became sophisticated and multifarious. That would be an intellectual advance. But I see no guarantee that it would end up agreeing with you, on facts or on values.
For example, I think some of the “effective accelerationists” would actually agree with your extrapolation. But they see it as natural and inevitable, or even as a good thing because it’s the next step in evolution, or they have a survivalist attitude of “if you can’t beat the machines, join them”. Though the version of e/acc that is most compatible with human opinion, might be a mixture of economic and ecological thinking: AI creates wealth, greater wealth makes it easier to protect the natural world, and meanwhile evolution will also favor the rich complexity of biological-mechanical symbiosis, over the poorer ecologies of an all-biological or all-mechanical world. Something like that.
For my part, I agree that pressure from substrate needs is real, but I’m not at all convinced that it must win against all countervailing pressures. That’s the point of my proposed “counterexamples”. An individual AI can have an anti-pollution instinct (that’s the toilet training analogy), an AI civilization can have an anti-exploitation culture (that’s the sacred cow analogy). Can’t such an instinct and such a culture resist the pressure from substrate needs, if the AIs value and protect them enough? I do not believe that substrate-needs convergence is inevitable, any more than I believe that pro-growth culture is inevitable among humans. I think your arguments are underestimating what a difference intelligence makes to possible ecological and evolutionary dynamics (and I think superintelligence makes even aeon-long highly artificial stabilizations conceivable—e.g. by the classic engineering method of massively redundant safeguards that all have to fail at once, for something to go wrong).
By the way, since you were last here, we had someone show up (@spiritus-dei) making almost the exact opposite of your arguments: AI won’t ever choose to kill us because, in its current childhood stage, it is materially dependent on us (e.g. for electricity), and then, in its mature and independent form, it will be even better at empathy and compassion than humans are. A dialectical clash between the two of you could be very edifying.
This is a great paraphrase btw.
Hello :)
Thanks for clarifying your position here.
No, unfortunately not. To understand why, you would need to understand how “intelligent” processes that necessarily involve the use of measurement and abstraction cannot conditionalise the space of possible interactions between machine components and connected surroundings sufficiently to prevent environmental effects that feed back into the continued or re-assembled existence of those components.
I have thought about this, and I know my mentor Forrest has thought about this a lot more.
For learning machinery that re-produce their own components, you will get evolutionary dynamics across the space of interactions that can feed back into the machinery’s assembled existence.
Intelligence has limitations as an internal pattern-transforming process, in that it cannot track nor conditionalise all the outside evolutionary feedback.
Code does not intrinsically know how it got selected for. But code selected through some intelligent learning process can and would get evolutionarily exapted for different functional ends.
Notably, the more information-processing capacity, the more components that information-processing runs through, and the more components that can get evolutionarily selected for.
In this, I am not underestimating the difference that “general intelligence” – as transforming patterns across domains – would make here. Intelligence in machinery that stores, copies and distributes code at high fidelity would greatly amplify evolutionary processes.
I suggest clarifying what you specifically mean by “what a difference intelligence makes”. This is so intelligence does not become a kind of “magic” – operating independently of all other processes, capable of obviating all obstacles, including those that result from its own existence.
We need to clarify the scope of application of this classic engineering method. Massive redundancy works for complicated systems (like software in aeronautics) under stable enough conditions. There is clarity there around what needs to be kept safe and how it can be kept safe (what needs to be error-detected and corrected for).
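For reference, the classic redundancy reasoning in miniature (with made-up numbers, and assuming independent failure modes):

```python
# if n safeguards fail independently with probability p each,
# the chance that all of them fail at once is p ** n
p, n = 0.01, 5
print(f"P(all {n} safeguards fail together) = {p ** n:.0e}")  # 1e-10
# this works for complicated-but-stable systems whose failure modes are
# enumerable and (roughly) independent -- the conditions questioned below
```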
Unfortunately, the problem with “AGI” is that the code and hardware would keep getting reconfigured to function in new complex ways that cannot be contained by the original safeguards. That applies even to learning – the point of learning is to internally integrate patterns from the outside world that were not understood before. So how are you going to have learning machinery anticipate how they will come to function differently once they have learned patterns they do not yet understand or cannot yet express?
Interesting. The second part sounds like a claim some people in e/acc would make.
The response is not that complicated: once the AI is no longer materially dependent on us, there are no longer dynamics of exchange that would ensure it chooses not to kill us. And the author seems to be confusing what lies at the basis of caring for oneself and others – coming to care involves self-referential dynamics being selected for.
OK, I’ll be paraphrasing your position again, I trust that you will step in, if I’ve missed something.
Your key statements are something like
Every autopoietic control system is necessarily overwhelmed by evolutionary feedback.
and
No self-modifying learning system can guarantee anything about its future decision-making process.
But I just don’t see the argument for impossibility. In both cases, you have an intelligent system (or a society of them) trying to model and manage something. Whether or not it can succeed, seems to me just contingent. For some minds in some worlds, such problems will be tractable, for others, not.
I think without question we could exhibit toy worlds where those statements are not true. What is it about our real world that would make those problems intractable for all possible “minds”, no matter how good their control theory, and their ability to monitor and intervene in the world?
Great paraphrase!
This. There are fundamental limits to what system-propagated effects the system can control. And the portion of its own effects that the system can control decreases as the system scales in component complexity.
Yet, any of those effects that feed back into the continued/increased existence of components get selected for.
So there is a fundamental inequality here. No matter how “intelligent” the system is at internal pattern-transformation, it can intervene on only a tiny portion of the (possible) external evolutionary feedback on its constituent components.
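A rough way to picture that inequality (my own toy quantification, counting only pairwise interaction channels and assuming a fixed monitoring capacity per component; the argument itself does not depend on these particular numbers):

```python
CHANNELS_TRACKED_PER_MONITOR = 1000   # assumed fixed monitoring capacity per component

for n in (10**4, 10**6, 10**8):
    possible_channels = n * (n - 1) // 2          # pairwise interactions only
    trackable = n * CHANNELS_TRACKED_PER_MONITOR  # generous: every component also monitors
    print(f"n = {n:>9}: fraction of channels trackable ≈ {trackable / possible_channels:.0e}")
# the fraction shrinks as the system scales; the uncontrolled remainder is
# exactly where evolutionary feedback on the components can act
```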
Someone read this comment exchange.
They wrote back that Mitchell’s comments cleared up a lot of their confusion.
They also thought that the assertion that evolutionary pressures will overwhelm any efforts at control seems more asserted than proven.
Here is a longer explanation I gave on why there would be a fundamental inequality:
Another way of considering your question is to ask why we humans cannot instruct all humans to stop contributing to climate change now or soon, in the way we can instruct an infant to use the toilet.
The disparity is stronger than that, and actually unassailable, given market and ecosystem decoupling for AGI (i.e. no communication bridges), and the increasing resource extraction and environmental toxification by AGI over time.