Ah, I see you expanded your comment to add your opinions.
I do find this post to be the most clear and persuasive articulation of your position so far.
I’m glad. Thanks for mentioning it!
But I still strongly have the intuition that this concern is mostly not worth worrying about.
When it comes to the risk of all humans dying, is it good enough to rely on your intuitions?
You suggested that I expand this post to address the timescales.
Do you now feel that you know enough not to have to question the assumptions you hold?
But I expect that in practice the external selection pressures would be sufficiently weak and the superintelligent AIs would be sufficiently adept at minimizing errors that this effect might not even show up in a measurable way in our solar system before the sun explodes.
What are you counting as an ‘error’ here?
A bitflip induced by a cosmic ray is an error, and it is easy to correct by comparing the flipped bit against a reference copy of the code.
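To make concrete what I mean by this kind of low-level correction, here is a minimal sketch (my own toy illustration, not anyone’s actual implementation): majority voting across redundant copies, as in triple modular redundancy.

```python
# Toy sketch: a cosmic-ray bitflip in one copy is detected and corrected by
# taking the majority value of each bit across three redundant copies
# (triple modular redundancy). At this level, 'error' is unambiguous:
# any bit that disagrees with the majority/reference.

def correct_bitflips(replicas: list[bytearray]) -> bytearray:
    """Return a corrected copy by majority vote over each bit of three replicas."""
    length = len(replicas[0])
    corrected = bytearray(length)
    for i in range(length):
        for bit in range(8):
            votes = sum((replica[i] >> bit) & 1 for replica in replicas)
            if votes >= 2:          # at least two of the three copies say this bit is 1
                corrected[i] |= 1 << bit
    return corrected

reference = bytearray(b"\x2a\x0f\x33")
flipped = bytearray(reference)
flipped[1] ^= 0b00000100            # simulate a single cosmic-ray bitflip
assert correct_bitflips([bytearray(reference), flipped, bytearray(reference)]) == reference
```

At this level the notion of ‘error’ is crisp, and both detection and correction are cheap.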
But when it comes to architectures running across many levels of abstraction (many nested layers of code, not just single bits) in interaction with the physical world, how do you define ‘error’?
What is an error in a neural network’s weights that can be subject to adversarial external selection? Even within this static ‘code’ (fixed after the training rounds), can you actually detect, model, simulate, and comprehensively evaluate how possible selected code variants may malfunction under trigger events, in ways that harm humans?
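To sharpen the contrast (again a toy sketch of my own; the function names and stand-in model are made up): an integrity check on the weights is exhaustive with respect to bit-level corruption, while a behavioural check can only ever cover the finite set of inputs someone thought to test.

```python
# Toy contrast between two notions of 'error' in a frozen set of weights.

import hashlib

def weights_intact(weight_bytes: bytes, reference_digest: str) -> bool:
    # Bit-level 'error': any flipped bit changes the digest, so corruption
    # relative to the stored reference is always detectable.
    return hashlib.sha256(weight_bytes).hexdigest() == reference_digest

def passes_evaluation(model, evaluation_inputs, is_harmful) -> bool:
    # Behavioural 'error': we can only certify the inputs we thought to test.
    # Trigger events outside this finite set remain unevaluated.
    return not any(is_harmful(model(x)) for x in evaluation_inputs)

model = lambda x: x * 2                         # stand-in for a frozen network
ok = passes_evaluation(model, range(1000), lambda y: y < 0)
# ok is True here, yet it says nothing about behaviour on inputs outside range(1000).
```

The first check tells you nothing about the second, and that gap is what I am asking about.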
What about fully autonomous AI (FAAI)? This hardware assembly would be continuously learning new variants of code. That code can be received or computed explicitly, be selected for implicitly in whatever hardware configuration ‘works’, or result from both processes running together (e.g. code computed to run experiments on different variations of hardware, to detect whatever works in the larger chaotic physical world).
The FAAI’s code is no longer static: it is learned recursively, through internally computed transformations of code being externally selected for (and vice versa), as well as through implicitly selected hardware configurations that influence how the explicitly computed code functions. At least with adversarial attacks by humans, you can keep the humans out. Here, though, the adversariality emerges from the inside: all the changing code interactively converging on unsafe effects.
How can a system configured out of that changing code be relied upon to track and correct all its downstream recursive effects in the world?
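As an intuition pump for this dynamic (a toy model with arbitrary numbers, not a prediction, and which regime applies is exactly what is in question): whether deviations stay rare or accumulate depends on whether the per-step correction rate keeps outpacing the selective advantage of whatever variant happens to ‘work’ better.

```python
# Toy model: variants of 'code' keep arising, most get corrected back to the
# reference, but any variant that replicates even slightly better gains ground
# whenever per-step correction falls below its per-step selective advantage.

import random

def simulate(correction_rate: float, advantage: float = 1.05,
             mutation_rate: float = 0.001, steps: int = 2000,
             size: int = 1000, seed: int = 0) -> float:
    """Return the final fraction of deviant variants in the population."""
    rng = random.Random(seed)
    population = [0] * size                    # 0 = reference code, 1 = deviant
    for _ in range(steps):
        # replication weighted by whatever 'works' better in the environment
        weights = [advantage if v else 1.0 for v in population]
        population = rng.choices(population, weights=weights, k=size)
        # variation: rare uncorrected deviations slip in
        population = [1 if rng.random() < mutation_rate else v for v in population]
        # correction: a fraction of deviants is detected and reset each step
        population = [0 if v and rng.random() < correction_rate else v
                      for v in population]
    return sum(population) / size

print(simulate(correction_rate=0.10))  # correction outpaces selection: deviants stay rare
print(simulate(correction_rate=0.02))  # correction falls behind: deviants accumulate
```

Which of these regimes an FAAI ends up in is the substance of our disagreement.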
On the other hand, in a world where humans never created more powerful technology than we have today, my intuition is that within a few thousand generations human society would end up dominated by bizarre cultures that explicitly optimize for maximum reproduction above all other values.
Yes, this matches my intuition (specifically where we do not manage to coherently connect across cultures on our deeper biological needs – to survive as life on Earth – which is a tall ask).
Glad we at least agree here, else it would be a tough conversation.
So the superintelligent AI being very good at modelling outcomes is doing some heavy lifting in my model.
Yes, it is true that FAAI would be very good at modelling outcomes in the world.
Also, because it is an FAAI, it keeps learning to operate in ways that are adapted to its changing environment, and to reproduce old parts that wear down – in order to stay fully autonomous.
Therefore an FAAI cannot just be modelling the world better. It will also be causing effects in the world at much greater scale and complexity.
So the question is not about the FAAI’s absolute modelling capacity. On that we both agree: an FAAI could causally model much more of the world than we humans can.
The relevant question is about the FAAI’s overall capacity to model (and perform the other steps needed for control) relative to its constituent components’ potential to recursively propagate effects through the world we humans live in.
Can any algorithm computed through components actually model (and simulate, compare, correct, and again detect) outcomes sufficiently? That is, sufficiently to contain the distributed feedback effects of all the components?
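One way to make the relative-capacity question concrete (my framing, back-of-envelope only): the joint configurations that interacting components can end up in multiply combinatorially, while the modelling meant to track them has to run on some bounded subset of those same components.

```python
# Back-of-envelope: the number of joint configurations of n interacting
# components grows exponentially in n, so the tracking burden scales up far
# faster than the number of components available to do the tracking.

states_per_component = 4
for n_components in (10, 50, 100):
    joint_states = states_per_component ** n_components
    print(f"{n_components} components -> {joint_states:.2e} joint configurations")
```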