The first one is the claim that AIXI wouldn’t have a proper understanding of its body because its thoughts are defined mathematically. This is just wrong, IMO; my refutation, for a machine that’s similar enough to AIXI for this issue to work the same, is here.
I don’t see how your refutation applies to AIXI. Let me just try to explain in detail why I think AIXI will not properly protect its body. Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P. AIXI has a probability distribution not over universes, but instead over environments where an environment is a TM whose output tape is AIXI’s input tape and whose input tape is AIXI’s output tape. What’s the simplest environment that fits AIXI’s past inputs/outputs? Presumably it’s E = P plus some additional pieces of code that injects E’s inputs into where AIXI’s physical output ports are located in the universe (that is, overrides the universe’s natural evolution using E’s inputs), and extracts E’s outputs from where AIXI’s physical input ports are located.
What happens when AIXI considers an action that destroys its physical body in the universe computed by P? As long as the input/output ports are not also destroyed, AIXI would expect that the environment E (with its “supernatural” injection/extraction code) will continue to receive its outputs and provide it with inputs.
Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P.
An implementation of AIXI would be fairly complex. If P is too simple, then AIXI could not really have a body in the universe, so it would be correct in guessing that some irregularity in the laws of physics was causing its behaviors to be spliced into the behavior of the world.
However, if AIXI has observed enough of the inner workings of other similar machines, or enough of the laws of physics in general, or enough of its own inner workings, the simplest model will be that AIXI’s outputs really do emerge from the laws of physics in the real universe, since we are assuming that that is indeed the case and that Kolmogorov induction eventually works. At that point, imagining that AIXI’s behaviors are a consequence of a bunch of exceptions to the laws of physics is just extra complexity and won’t be part of the simplest hypothesis. It will be part of some less likely hypotheses, and the AI would have to take that risk into account when deciding whether to self-improve.
Tim, I think you’re probably not getting my point about the distinction between our concept of a computable universe, and AIXI’s formal concept of a computable environment. AIXI requires that the environment be a TM whose inputs match AIXI’s past outputs and whose outputs match AIXI’s past inputs. A candidate environment must have the additional code to inject/extract those inputs/outputs and place them on the input/output tapes, or AIXI will exclude it from its expected utility calculations.
The candidate environment must have the additional code to inject/extract those inputs/outputs and place them on the input/output tapes, or AIXI will exclude it from its expected utility calculations.
I agree that the candidate environment will need to have code to handle the inputs. However, if the candidate environment can compute the outputs on its own, without needing to be given the AI’s outputs, the candidate environment does not need code to inject the AI’s outputs into it.
Even if the AI can only partially predict its own behavior based on the behavior of the hardware it observes in the world, it can use that information to more efficiently encode its outputs in the candidate environment, so it can have some understanding of its position in the world even without being able to perfectly predict its own behavior from first principles.
If the AI manages to destroy itself, it will expect its outputs to be disconnected from the world and have no consequences, since anything else would violate its expectations about the laws of physics.
This back-and-forth appears to be useless. I should probably do some Python experiments and we then can change this from a debate to a programming problem, which would be much more pleasant.
However, if the candidate environment can compute the outputs on its own, without needing to be given the AI’s outputs, the candidate environment does not need code to inject the AI’s outputs into it.
If a candidate environment has no special code to inject AIXI’s outputs, then when AIXI computes expected utilities, it will find that all actions have equal utility in that environment, so that environment will play no role in its decisions.
I should probably do some Python experiments and we then can change this from a debate to a programming problem, which would be much more pleasant.
Ok, but try not to destroy the world while you’re at it. :) Also, please take a closer look at UDT first. Again, I think there’s a strong possibility that you’ll end up thinking “why did I waste my time defending CDT/AIXI?”
I don’t see how your refutation applies to AIXI. Let me just try to explain in detail why I think AIXI will not properly protect its body. Consider an AIXI that arises in a simple universe, i.e., one computed by a short program P. AIXI has a probability distribution not over universes, but instead over environments where an environment is a TM whose output tape is AIXI’s input tape and whose input tape is AIXI’s output tape. What’s the simplest environment that fits AIXI’s past inputs/outputs? Presumably it’s E = P plus some additional pieces of code that injects E’s inputs into where AIXI’s physical output ports are located in the universe (that is, overrides the universe’s natural evolution using E’s inputs), and extracts E’s outputs from where AIXI’s physical input ports are located.
What happens when AIXI considers an action that destroys its physical body in the universe computed by P? As long as the input/output ports are not also destroyed, AIXI would expect that the environment E (with its “supernatural” injection/extraction code) will continue to receive its outputs and provide it with inputs.
Does that make sense?
(Responding out of order)
Yes, but it makes some unreasonable assumptions.
An implementation of AIXI would be fairly complex. If P is too simple, then AIXI could not really have a body in the universe, so it would be correct in guessing that some irregularity in the laws of physics was causing its behaviors to be spliced into the behavior of the world.
However, if AIXI has observed enough of the inner workings of other similar machines, or enough of the laws of physics in general, or enough of its own inner workings, the simplest model will be that AIXI’s outputs really do emerge from the laws of physics in the real universe, since we are assuming that that is indeed the case and that Kolmogorov induction eventually works. At that point, imagining that AIXI’s behaviors are a consequence of a bunch of exceptions to the laws of physics is just extra complexity and won’t be part of the simplest hypothesis. It will be part of some less likely hypotheses, and the AI would have to take that risk into account when deciding whether to self-improve.
Tim, I think you’re probably not getting my point about the distinction between our concept of a computable universe, and AIXI’s formal concept of a computable environment. AIXI requires that the environment be a TM whose inputs match AIXI’s past outputs and whose outputs match AIXI’s past inputs. A candidate environment must have the additional code to inject/extract those inputs/outputs and place them on the input/output tapes, or AIXI will exclude it from its expected utility calculations.
I agree that the candidate environment will need to have code to handle the inputs. However, if the candidate environment can compute the outputs on its own, without needing to be given the AI’s outputs, the candidate environment does not need code to inject the AI’s outputs into it.
Even if the AI can only partially predict its own behavior based on the behavior of the hardware it observes in the world, it can use that information to more efficiently encode its outputs in the candidate environment, so it can have some understanding of its position in the world even without being able to perfectly predict its own behavior from first principles.
If the AI manages to destroy itself, it will expect its outputs to be disconnected from the world and have no consequences, since anything else would violate its expectations about the laws of physics.
This back-and-forth appears to be useless. I should probably do some Python experiments and we then can change this from a debate to a programming problem, which would be much more pleasant.
If a candidate environment has no special code to inject AIXI’s outputs, then when AIXI computes expected utilities, it will find that all actions have equal utility in that environment, so that environment will play no role in its decisions.
Ok, but try not to destroy the world while you’re at it. :) Also, please take a closer look at UDT first. Again, I think there’s a strong possibility that you’ll end up thinking “why did I waste my time defending CDT/AIXI?”