I have come back to this post, re-read it with the added explainer boxes, and tried my best to grapple with it.
A brief summary of the core of my thoughts up front:
I’m not sure what substrate FAAI will actually run on, how it will be configured, and where it will run. This uncertainty brings up many questions about the generalizability of these arguments. (I am also not sure if FAAI must be self-modifying at its core. I touch on that at the very end, but that could be its own discussion.)
Questions:
Must we assume or conclude that FAAI will take a specific form or configuration, or use traditional hardware? If so, why?
Must we assume or conclude that FAAI will directly share our environment? (Rather than e.g. keeping the core of its intelligence in orbit around Earth.)
If a highly intelligent and long-lived entity of any substrate wants to resist value drift for one billion years (for terminal or instrumental reasons), is there any physically possible way for it to do so?
What about for one million years, or one thousand? Is there any particular knowable OOM time scale over which we should expect drift to inevitably occur regardless of substrate? (A toy sketch of the kind of calculation I have in mind follows this list of questions.)
If value drift is strictly inevitable, what might an aligned FAAI conclude is the best course of action? (I can imagine an aligned FAAI robustly setting itself up for self-destruction on a given timeline, with a system in place for humanity to create a fresh version, if this is deemed lower risk than running the same evolving configuration forever.)
Less core to this conversation, but philosophically interesting: If value drift is strictly inevitable, what might misaligned FAAI philosophically conclude about itself? (Do individual humans even want to prevent their own value drift? If it is incoherent not to do so, is incoherence strictly inevitable?)
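On the time-scale question above: the following is a toy sketch, not an answer. It assumes, purely for illustration, that value drift reduces to uncorrected storage corruption, that the value specification is held in n independent replicas, that each replica corrupts with probability p per scrubbing interval, and that drift only "sticks" if a majority of replicas corrupt within the same interval. All of those assumptions are mine, not the post's.

```python
from math import comb

def expected_intervals_to_drift(n: int, p: float) -> float:
    """Expected number of scrubbing intervals before a majority of the n
    replicas corrupt within the same interval (toy model, independent errors)."""
    k = n // 2 + 1  # majority threshold
    p_fail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return float("inf") if p_fail == 0 else 1.0 / p_fail

# Illustrative numbers only: p = 1e-6 corruption chance per replica per daily scrub.
for n in (1, 3, 5, 7):
    years = expected_intervals_to_drift(n, 1e-6) / 365
    print(f"n={n}: ~{years:.1e} years until expected majority corruption")
```

The particular numbers mean nothing; the point is that under a model like this the relevant order of magnitude is driven almost entirely by assumptions about the substrate and the error-correction scheme, which is part of why I am unsure a substrate-independent answer exists.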
I think I still don’t fully understand why the traditional conception of hardware (whose relevance I dispute above) leads to evolution as outlined. It may simply be too far outside of my current knowledge and intellectual capacity to really get that without a much more intuitive explanation. I additionally notice that I may be confused about precisely what it means for values to drift, rather than for the same principles to be adapted to new knowledge and circumstances.
Now onto my responses to specific claims:
But pre-imposing reliable constraints also limits the system’s capacity to adapt to complex nonlinear contexts of the world (e.g. the organic ecosystem that’s supposed to not be accidentally killed off). Arguably, an AI’s capacity to learn heuristics that fit potential encountered contexts would be so constrained that it would not reach full autonomy in the first place.
This is sensible as an introductory intuition. I would be curious to see an argument for where the frontier of this trade-off really is. (The mere existence of those limits doesn’t tell me whether they occur at thresholds that end up being important.)
Within a hardware part, the functioning of components is even more rigidly dependent on other surrounding components. It only takes one transistor in a given serial circuit to break, for all the transistors in that circuit to stop functioning. And it only takes more than the redundant circuits to break for the entire chip to stop functioning.
This is a contingent fact of design, no? I would not expect it to be true of hardware that the FAAI designs for itself. Can’t the FAAI design a configuration on the Pareto frontier of “consistent, high-bandwidth processing” and “flexible, complete functioning” that achieves near-maximally-useful capability while containing enough redundancy to strongly resist drift?
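To make the “contingent fact of design” point a bit more concrete, here is a minimal sketch using standard reliability-engineering arithmetic (the numbers are made up): a purely serial chain is only as strong as its weakest component, while per-stage majority voting masks isolated failures.

```python
def series_reliability(r: float, n: int) -> float:
    """Probability that a chain of n components all work, each with reliability r."""
    return r ** n

def tmr_stage_reliability(r: float) -> float:
    """Probability a triple-modular-redundant stage works: at least 2 of 3
    replicas must work (the majority voter is assumed perfect, for simplicity)."""
    return 3 * r**2 * (1 - r) + r**3

r = 0.999  # assumed per-component reliability (made up)
print(series_reliability(r, 100))        # ~0.905: one weak link degrades the chain
print(tmr_stage_reliability(r) ** 100)   # ~0.9997: per-stage voting masks single failures
```

This is the kind of trade I would expect a self-designing system to make: spend some hardware on redundancy to buy back reliability and drift resistance, rather than accept the single-transistor failure mode described in the quoted passage.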
As such, FAAI does not contain just one genotype, but all genotypes stored across all its hardware parts. Each hardware part stores a tiny portion of all the code, as a smaller codeset. Parts are nested inside clusters, each storing an intermediate-size codeset. The complete codeset is therefore a code pool – the set of all code in the entire population. … The genotype of an organic lifeform gets copied into each diploid cell, and can thus be sequenced from DNA strands found in a single cell. This is unlike the set of all code stored in FAAI, which cannot be found at a single microscopic place.
Why can’t the FAAI’s code be stored this way? Redundancy seems useful for resisting one’s own value drift.
Where FAAI’s hardware parts keep being replaced and connected up to new parts, it is not a stably physically bounded unit (like a human body is).
Why can’t it be? Why couldn’t there be a single central controlling agent, with no other fully autonomous systems able to rival it?
Can a system configured out of all that changing code be relied upon to track and correct its effects recursively feeding back over the world?
The implied answer “no” feels to me like it proves too much. Can an FAAI have any preference that is stable over time and isn’t about its own survival? That is, is survival (and its correlates) the only possible coherent value set in our universe? I don’t have a good answer here, but I find the potential broader conclusion philosophically alarming, in an abstract sense.
As a result, the controller would have to either become an FAAI or merge with an existing FAAI. But the new FAAI would also have to be controlled so as to not cause human extinction. This requires another controller, a solution that leads to infinite regress. … A related problem is that the controller is meant to correct for ‘errors’ in the AGI and/or world. But what corrects for errors in the controller? And what corrects the meta-corrector? This too is a problem of infinite regress.
Redundancy can act as a simple error correction mechanism. By this mechanism, the controller could be rendered immutable. The controller could be the only FAAI—an intelligence core that is inseparable from its alignment/goals. The things it then controls could be fully-controllable non-FAAI systems that are mutable and adaptive.
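As a purely hypothetical illustration of what I mean by redundancy acting as error correction, here is a minimal sketch of periodic majority-vote scrubbing over replicated copies of a frozen controller specification (all names and the three-copy setup are invented for the example):

```python
from collections import Counter

def scrub(replicas: list[bytes]) -> list[bytes]:
    """Majority-vote across replicated copies of a frozen specification, then
    overwrite every replica with the winning value. Run periodically, this
    corrects isolated corruption before it can accumulate."""
    majority, _ = Counter(replicas).most_common(1)[0]
    return [majority] * len(replicas)

# Hypothetical: three copies of the controller spec, one silently corrupted.
replicas = [b"controller-spec-v1", b"controller-spec-v1", b"c0rrupted-spec"]
replicas = scrub(replicas)
assert all(r == b"controller-spec-v1" for r in replicas)
```

Whether a scheme like this can pin down the semantics of the controller, and not merely the bits encoding it, is of course part of what is in dispute.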
I notice that this last objection of mine is equivalent to the claim that FAAI can be reflectively stable and not autopoietic / not ever change its core intellect and goals, even while learning new information. Does a strong refutation of this idea exist somewhere?
To the degree that I have understood them (which is questionable), I think I agree with most of the other claims that are not synonymous with or tightly related to the claims I addressed here.
These are great questions for clarifying the applicability/scope of the argument!
Forrest, my mentor, just wrote punchy and insightful replies. See this page.
Note that:
he rephrased some of your questions in terms of how he understands them.
he split some sentences into multiple lines – this formatting is unusual, but is meant to make individual parts of an argument easier to parse.
he clarified that not only hard substrates can be ‘artificial’ (I had restricted my explanation to hard compartmentalised parts, because it makes for cleaner reasoning, and it covers the bulk of the scenarios).