One thing I’d like this post to address is the speed at which this process happens.
You could also say that human extinction is inevitable because of the second law of thermodynamics, but it would be remiss not to mention the timescale involved.
I do find this post to be the most clear and persuasive articulation of your position so far. But I still strongly have the intuition that this concern is mostly not worth worrying about. You make a good case that a very large system given a very very long time would eventually converge on AIs that are optimized solely for their own propagation.
But I expect that in practice the external selection pressures would be sufficiently weak and the superintelligent AIs would be sufficiently adept at minimizing errors that this effect might not even show up in a measurable way in our solar system before the sun explodes.
On the other hand, in a world where humans never created more powerful technology than we have today, my intuition is that within a few thousand generations human society would end up dominated by bizarre cultures that explicitly optimize for maximum reproduction above all other values. And humans today explicitly not wanting that would not be sufficient to prevent that outcome. So the superintelligent AI being very good at modelling outcomes is doing some heavy lifting in my model.
I agree this is useful to know.
It took 3.4 billion years for humans to evolve, and for their society to develop, to the point that they could destroy humans living everywhere on Earth. That puts an initial upper bound on the time that evolution takes (slightly less than the time remaining before the sun explodes in our solar system).
In the case of fully autonomous AI that continues to persist in some form, the time taken for evolutionary selection to result in the extinction of all humans would be much shorter.
Some of the differences in the rates of evolution that I started explaining in the post:
FAAI would already have access to the functionality that humans took billions of years to evolve. This functionality can be repurposed by evolution.
FAAI can spread virtualised code much faster than humans can spread memes (over milliseconds rather than hours). The physical configurations of hardware parts can be reproduced faster too (within weeks, rather than decades). Rough illustrative ratios are sketched after this list.
The linked-up hardware of FAAI would learn, actuate, and reproduce at higher speeds (vs. the wetware of human bodies). Therefore, the impacts of the evolving FAAI on our world scale faster too.
Humans modified their environment to contribute to their survival and reproduction. But the environment that fits our needs is relatively close to what we and other organic lifeforms already evolved to create over billions of years, so we end up changing the environment in relatively tiny steps. FAAI, by contrast, runs on an entirely different substrate: the current world is very far from what’s optimal for its survival and reproduction (save for secluded places such as ore-extracting mines, silicon-melting refineries, chip-etching cleanrooms, and supercooled server racks). It would therefore evolve to modify the world in much larger steps.
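To give a rough sense of the ratios in the reproduction-rate point above, here are round numbers that are purely my own assumptions (about an hour per meme transmission, a millisecond per code copy, roughly 25 years per human generation, a couple of weeks per hardware reproduction cycle):

$$\frac{t_{\text{meme}}}{t_{\text{code copy}}} \approx \frac{3600\ \text{s}}{10^{-3}\ \text{s}} \approx 4 \times 10^{6} \qquad\qquad \frac{t_{\text{human generation}}}{t_{\text{hardware cycle}}} \approx \frac{25\ \text{yr}}{2\ \text{wk}} \approx 650$$

The exact figures matter much less than the observation that both ratios stay large under a wide range of assumptions.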
These factors compound with each other over time.
You can model it abstractly as a chain of events: initial capacities support the maintenance and increase of the code components, which results in a further increase of capacities, which in turn supports further maintenance and increase, and so on. The factors of ‘capacity’, ‘maintenance’, and ‘increase’ end up combining in various ways, leading to outsized but mostly unpredictable consequences.
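As a toy illustration of that feedback structure (the update rule and every number below are assumptions of mine, chosen only to show the compounding shape, not to predict anything):

```python
# Toy sketch: 'capacity' supports both maintenance (offsetting decay of existing
# code components) and increase (adding new components); the surviving components
# feed back into capacity. All numbers are arbitrary illustrative assumptions.
def simulate(steps=10, components=1.0, decay=0.05, maintain_share=0.06, growth_share=0.10):
    for step in range(steps):
        capacity = components                               # working components provide capacity
        maintained = min(components * decay, capacity * maintain_share)
        lost = components * decay - maintained               # whatever isn't maintained degrades
        added = capacity * growth_share                       # capacity spent on increase
        components = components - lost + added
        print(f"step {step + 1}: components ~ {components:.2f}")

simulate()
```

In this toy, whether the curve compounds upward or decays depends only on how the maintenance and growth shares compare against the decay rate; the real factors combine in far messier ways, which is why I expect the consequences to be outsized but hard to predict.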
Actual rate calculations are above my pay grade, and deserve a separate, longer post.
Maybe nudge Anders Sandberg about it if you bump into him :) Anders had the same question, and Forrest Landry wanted to go through his reasoning with him at the Limits to Control Workshop. But they got distracted by other things.
Ah, I see you expanded your comment to add your opinions.
I do find this post to be the most clear and persuasive articulation of your position so far.
I’m glad. Thanks for mentioning!
But I still strongly have the intuition that this concern is mostly not worth worrying about.
When it comes to the risk of all humans dying, is it good enough to rely on your intuitions?
You suggested that I expand this post to address the timescales.
Do you feel now that you know enough to not have to question the assumptions you hold?
But I expect that in practice the external selection pressures would be sufficiently weak and the superintelligent AIs would be sufficiently adept at minimizing errors that this effect might not even show up in a measurable way in our solar system before the sun explodes.
What are you considering in terms of ‘error’?
A bitflip induced by a cosmic ray is an error, and it is easy to correct by comparing the flipped bit against a reference copy of the code.
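A minimal sketch of that kind of check, to make concrete what ‘easy’ means here (my own illustrative example, not a description of a real error-correcting scheme such as ECC memory):

```python
# Detect and repair flipped bits by comparing against a trusted reference copy.
def repair_against_reference(corrupted: bytes, reference: bytes) -> bytes:
    assert len(corrupted) == len(reference)
    repaired = bytearray(corrupted)
    for i, (observed, expected) in enumerate(zip(corrupted, reference)):
        if observed != expected:
            repaired[i] = expected   # restore the byte from the reference copy
    return bytes(repaired)

reference = b"\x0f\xaa\x55"
corrupted = b"\x0f\xab\x55"   # one bit flipped in the middle byte
assert repair_against_reference(corrupted, reference) == reference
```

The check works because there is a static reference to compare against and a crisp definition of what counts as a deviation.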
When it comes to architectures running over many levels of abstraction (many nested sets of code, not just single bits) in interaction with the physical world, how do you define ‘error’?
What is an error in a neural network’s weights that can be subject to adversarial external selection? Even within this static ‘code’ (fixed after training rounds), can you actually detect, model, simulate, and evaluate comprehensively how possible selected code variants may dysfunction under trigger events in ways that harm humans?
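To make ‘dysfunction under trigger events’ concrete, here is a deliberately tiny constructed example (mine, not drawn from any real model): a single perturbed weight that leaves behaviour unchanged on ordinary inputs but flips the decision on one rare trigger input, which is the kind of variant that testing on ordinary inputs would not surface.

```python
import numpy as np

# Toy 'weights' of a linear scorer; feature 3 is almost never active in ordinary data.
w = np.array([0.5, -0.3, 0.8, -1.0])
w_variant = w.copy()
w_variant[3] += 5.0                       # one selected/perturbed weight

rng = np.random.default_rng(0)
ordinary_inputs = rng.normal(size=(1000, 4))
ordinary_inputs[:, 3] = 0.0               # the trigger feature stays off in ordinary inputs
trigger_input = np.array([0.0, 0.0, 0.0, 1.0])

decide = lambda x, weights: int(x @ weights > 0)

# On ordinary inputs the two weight sets behave identically...
assert all(decide(x, w) == decide(x, w_variant) for x in ordinary_inputs)
# ...but on the rare trigger input the variant flips the decision.
assert decide(trigger_input, w) != decide(trigger_input, w_variant)
```

Scaling this up from four weights to billions, and from one known trigger to an open-ended physical world, is where I doubt the comprehensive detection and evaluation.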
What about for fully autonomous AI? This hardware assembly would be continuously learning new variants of code. That code can be received or computed explicitly, be selected for implicitly in whatever hardware ‘works’, or result from both processes running together (e.g. code computed to run experiments on different variations of hardware, to detect whatever works in the larger chaotic physical world).
The FAAI’s code is no longer static. It is learned recursively: internally computed transformations of code get externally selected for, and vice versa, while implicitly selected configurations of hardware influence the functioning of explicitly computed code. With adversarial attacks by humans, you can at least keep the humans out. But in this case, the adversariality emerges from all the changing code on the inside interactively converging on unsafe effects.
How can a system configured out of that changing code be relied upon to track and correct all its downstream recursive effects in the world?
On the other hand, in a world where humans never created more powerful technology than we have today, my intuition is that within a few thousand generations human society would end up dominated by bizarre cultures that explicitly optimize for maximum reproduction above all other values.
Yes, this matches my intuition (specifically where we do not manage to coherently connect across cultures on our deeper biological needs – to survive as life on Earth – which is a tall ask).
Glad we at least agree here, else it would be a tough conversation.
So the superintelligent AI being very good at modelling outcomes is doing some heavy lifting in my model.
Yes, it is true that FAAI would be very good at modelling outcomes in the world.
Also, because it is an FAAI, it continues learning to operate in ways that are adapted to its changing environment, and to reproduce old parts that wear down, in order to stay fully autonomous.
Therefore FAAI cannot just be modelling the world better; it will also be causing effects in the world at much greater scale and complexity.
So the question is not about FAAI’s absolute modelling capacity. There we both agree that FAAI could causally model much more of the world than we humans can.
The relevant question is about the FAAI’s overall capacity to model (and perform other steps needed for control) relative to its constituent components’ potential to recursively propagate effects over the world we humans live in.
Can any algorithm computed through components actually model (and simulate, compare, correct, and again detect) outcomes sufficiently? That is, sufficiently to contain the distributed feedback effects of all the components?
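To show the shape of this concern in the simplest terms I can, here is a toy sketch (entirely my own simplification, which flattens all the recursive feedback into a fixed inflow of new effects per step): if the capacity of the detect-model-simulate-compare-correct loop stays below the rate at which components propagate effects, the uncorrected residue keeps growing no matter how accurate each individual correction is.

```python
# Toy sketch (my own simplification, not a result): N components each propagate some
# new effects per step; the control loop can only detect, model, and correct a fixed
# budget of effects per step.
def run(steps=100, n_components=100, effect_per_component=1.0, correction_budget=80.0):
    uncorrected = 0.0
    for _ in range(steps):
        uncorrected += n_components * effect_per_component   # effects propagating outward
        uncorrected -= min(uncorrected, correction_budget)   # what the loop can catch and correct
    return uncorrected

print(run(correction_budget=80.0))    # budget below inflow: residue grows every step
print(run(correction_budget=120.0))   # budget above inflow: effects stay contained
```

I left out the recursion between steps on purpose; with the components’ effects feeding back on each other, the inflow itself grows, which only makes the containment requirement stricter.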