I feel like the word “attack” here is slightly confusing given that AIXI is fully deterministic. If you’re an agent with free will, then by definition you are not in a universe that is being used for Solomonoff Induction.
if you learn that there’s an input channel to your universe
There’s absolutely no requirement that someone in a simulation be able to see the input/output channels. The whole point of a simulation is that it should be indistinguishable from reality to those inside.
Consider the following pseudocode:
def predictSequence(seq):
    universe = initializeUniverse()
    obs = []
    while True:
        # Record the history of one fixed point, then advance time.
        obs.append(universe.stateAtPoint(0, 0))
        universe.step()
        if obs == seq:
            # The universe has already stepped past the observed data,
            # so the state at (0, 0) is the prediction for the next bit.
            return universe.stateAtPoint(0, 0)
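To make this concrete, here is a runnable instantiation with a stand-in universe: a one-dimensional Rule 110 cellular automaton on a ring. The choice is arbitrary; any deterministic stepping function with readable points would do. It reuses predictSequence from above.

class ToyUniverse:
    """A stand-in 'universe': Rule 110 on a ring of cells."""
    def __init__(self, size=64):
        # Deterministic initial condition: a single live cell.
        self.cells = [0] * size
        self.cells[size // 2] = 1

    def step(self):
        n = len(self.cells)
        self.cells = [
            (110 >> (self.cells[(i - 1) % n] * 4
                     + self.cells[i] * 2
                     + self.cells[(i + 1) % n])) & 1
            for i in range(n)
        ]

    def stateAtPoint(self, x, y):
        return self.cells[x]  # y is ignored in this 1-D toy

def initializeUniverse():
    return ToyUniverse()

# Observe three steps at point (0, 0), then predict the fourth.
u = initializeUniverse()
seq = [u.stateAtPoint(0, 0)]
for _ in range(2):
    u.step()
    seq.append(u.stateAtPoint(0, 0))
u.step()
print(predictSequence(seq) == u.stateAtPoint(0, 0))  # True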
Now suppose that every point in space is Topologically Indistinguishable (as in our universe). There is literally no way for an agent inside the universe to distinguish the “output channel” from any other point in the universe.
But wait, there can only be so many low-complexity universes, and if they’re launching successful attacks, said attacks would be distributed amongst a far far far larger population of more-complex universes.
This is precisely the point of Solomonoff Induction. Because there are so few low-complexity Turing machines, a machine with the property “accurately predicts my data” is much more likely than a machine with the property “accurately predicts my data and then does something malicious”.
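As a toy illustration of that penalty (the program lengths here are made up; only the 2^-length weighting is real):

from fractions import Fraction

# A program of length L bits gets prior weight 2**-L, so every extra
# bit of "and then does something malicious" halves the weight.
honest_len = 1000      # made up: "simulate the universe, read off my data"
malicious_extra = 50   # made up: extra bits for the malicious payload

honest = Fraction(1, 2 ** honest_len)
malicious = Fraction(1, 2 ** (honest_len + malicious_extra))
print(honest / malicious)  # 2**50 = 1125899906842624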
Well, by virtue of running an AIXI-like agent that will have large influence on the future, that’s an especially interesting property of a universe which would tend to draw a whole lot more attention from agents interested in influencing other computations than just being some generic high-complexity computation.
The fact that you are running AIXI means you have access to a halting oracle. This means it is literally impossible for an agent inside a Turing machine to out-think you. This is also a kind of “disappointing” property of AIXI. It means that you can’t use it to predict things about your own universe (where halting oracles exist), only about simpler universes (which can be simulated on Turing machines). This is kind of like how in our universe there exists a system of logic (first-order logic) which admits a sound and complete proof system, but most of the questions we care about in math arise from second-order logic, which is inherently incomplete.
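The asymmetry is just the classic diagonal argument. The oracle function below is hypothetical by definition; the whole point is that nothing running inside a Turing machine can fill it in, while AIXI sits outside and can.

def oracle(program_source: str) -> bool:
    """Hypothetical: True iff the program halts when run on its own
    source. No Turing machine can implement this."""
    raise NotImplementedError

def diagonal(my_source: str):
    # Do the opposite of whatever the oracle predicts about us.
    if oracle(my_source):
        while True:  # the oracle said we halt, so loop forever
            pass
    # The oracle said we loop forever, so halt immediately.

# Any in-universe agent that could fully predict an oracle-equipped
# predictor would itself be implementing oracle, and diagonal shows
# that is contradictory.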
For complex enough bridge rules relative to the complexity of your universe, hypotheses that produce powerful optimizers that target your universe (and an output channel) can come in substantially shorter than “here’s the description of the universe, here’s the bridge rule”
I don’t get why we are assuming the bridge rules will be complicated. Imagine we are simulating the universe using the Game of Life; why not just have a rule like “output the sequence of values at position (0,0)”? I mean, I guess you could intentionally choose a bad bridge rule, but you could also intentionally use AIXI to output the most malicious thing possible. So I figure that before we learn how to build halting oracles, we’ll also learn not to do that.
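Concretely, something like this, where the grid size and the initial glider are arbitrary choices:

def life_step(grid):
    # One generation of Conway's Game of Life on a torus (B3/S23).
    n = len(grid)
    def live_neighbors(r, c):
        return sum(grid[(r + dr) % n][(c + dc) % n]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))
    return [[1 if live_neighbors(r, c) == 3
                  or (grid[r][c] and live_neighbors(r, c) == 2)
             else 0
             for c in range(n)] for r in range(n)]

def bridge_rule(grid):
    # The entire bridge rule: read the cell at position (0, 0).
    return grid[0][0]

grid = [[0] * 8 for _ in range(8)]
for r, c in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
    grid[r][c] = 1  # a glider

outputs = []
for _ in range(20):
    outputs.append(bridge_rule(grid))
    grid = life_step(grid)
print(outputs)  # the universe's "output channel"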
Now suppose that every point in space is Topologically Indistinguishable (as in our universe).
Then initializeUniverse() or universe.step() must somehow break the symmetry of the initial state, perhaps through nondeterminism. Simple universes that put a lot of weight on one timeline will be asymmetric, right?
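A quick sanity check of the symmetric case, with Rule 110 on a ring as an arbitrary stand-in for a deterministic, translation-invariant rule:

def step(cells, rule=110):
    # One step of a translation-invariant, deterministic 1-D rule.
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2
                      + cells[(i + 1) % n])) & 1 for i in range(n)]

cells = [1] * 16  # perfectly symmetric initial state
for _ in range(10):
    assert len(set(cells)) == 1  # every point identical at every step
    cells = step(cells)
# No point's history ever differs, so nothing marks out an "output
# channel" until something breaks the symmetry.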
much more likely than a machine with the property “accurately predicts my data and then does something malicious”
The idea is that “accurately predicts my data” is implied by “does something malicious”, which you will find contains one fewer word :P.
This means it is literally impossible for an agent inside a Turing machine to out-think you.
In Robust Cooperation in the Prisoner’s Dilemma, agents each prove that the other will cooperate. The halting problem may be undecidable in the general case, but haltingness can sure be proven/disproven in many particular cases.
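For instance, here is a deliberately tiny program class (my own toy choice) where haltingness is decidable even though the general problem is not:

def halts(x, step):
    # Decides halting for programs of the form "while x != 0: x += step".
    if x == 0:
        return True
    if step == 0 or (x > 0) == (step > 0):
        return False  # x never moves toward 0
    # x moves toward 0; it halts iff it lands exactly on 0.
    return x % (-step) == 0 if x > 0 else (-x) % step == 0

assert halts(10, -2)      # 10, 8, 6, 4, 2, 0: halts
assert not halts(10, -3)  # 10, 7, 4, 1, -2, ...: skips past 0
assert not halts(10, 2)   # grows forever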
I don’t get why we are assuming the bridge rules will be complicated.
I don’t expect our own bridge rules to be simple: Maxwell’s equations look simple enough, but locating our Earth in the quantum multiverse requires more bits of randomness than there are atoms.
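To show the shape of the comparison, here is the complexity budget with every number invented purely for illustration:

# The malign hypothesis wins whenever its total description is shorter:
#   K(optimizer universe) < K(physics) + K(bridge rule)
K_physics = 1_000             # invented: bits to specify the laws
K_simple_bridge = 50          # invented: "read cell (0,0)"
K_locating_earth = 10 ** 6    # invented: bits to pick one branch/observer
K_optimizer_universe = 5_000  # invented: a universe that evolves a
                              # consequentialist who feeds you your data

print(K_optimizer_universe < K_physics + K_simple_bridge)   # False
print(K_optimizer_universe < K_physics + K_locating_earth)  # True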
If you’re an agent with free will, then by definition you are not in a universe that is being used for Solomonoff Induction.
You choosing your actions is compatible with a deterministic universe. https://www.lesswrong.com/posts/NEeW7eSXThPz7o4Ne/thou-art-physics