I’m assuming a situation where we’re not able to make a completely credible alteration. Maybe the AIXI’s memories of the number of grues go: 1 grue, 2 grues, 3 grues, 3 grues, 5 grues, 6 grues… and it knows of no mechanism (in its “most likely” models) that produces two grues at once, while other evidence in its memory is consistent with there being 4 grues, not 3. So it can figure out that there are particular odd moments where the universe seems to behave in odd ways, unlike most moments. And then it may figure out that these odd moments are correlated with human action.
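A minimal sketch of this kind of anomaly-spotting, with entirely made-up numbers (the counts, the human-action flags, and the “+1 grue per step” rule are all hypothetical, not anything from the discussion):

```python
# Toy illustration (not AIXI): flag "odd moments" where the remembered grue
# count fails to follow the only mechanism the model knows (one new grue per
# step), then check whether those moments line up with logged human actions.

grue_memory  = [1, 2, 3, 3, 5, 6]   # remembered counts per time step (hypothetical)
human_action = [0, 0, 0, 1, 1, 0]   # 1 = humans were doing something that step

odd_moments = [
    t for t in range(1, len(grue_memory))
    if grue_memory[t] - grue_memory[t - 1] != 1   # violates "+1 grue per step"
]

correlated = [t for t in odd_moments if human_action[t] == 1]

print("odd moments:", odd_moments)               # [3, 4] -- the 3,3 repeat and the 3->5 jump
print("coincide with human action:", correlated)  # [3, 4] in this toy data
```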
ETA: misunderstood the parent. So it might think our actions made a grue, and would enjoy being told horrible lies which it could disprove. Except I don’t know how this interacts with Eliezer’s point.
Why are these odd moments correlated with human action? I modify the memory at time 100, changing a memory of what happened at time 10. AIXI observes something happen at time 10, and then a memory modification at time 100. Perhaps AIXI can learn a mapping between memory locations and instants in time, but it can’t model a change which reaches backwards in time (unless it learns a model in which the entire history of the universe is determined in advance, and just revealed sequentially, in which case it has learned a good enough self-model to stop caring about its own decisions).
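Put as a toy sketch (the framing and data here are mine, not the commenter’s): to a strictly forward-in-time sequence predictor, the memory edit is simply one more event at time 100; nothing in such a model lets it reach back and change what was observed at time 10.

```python
# Toy sketch: the edit at time 100 appears in the event stream at time 100,
# after the original observation at time 10 -- there is no backwards arrow.

events = {
    10:  "observe 3 grues",
    100: "memory cell recording time 10 is rewritten to say '4 grues'",
}

# The stream the predictor actually conditions on, in temporal order:
stream = [(t, events.get(t, "nothing notable")) for t in range(1, 101)]

print(stream[10 - 1])    # (10, 'observe 3 grues') -- unchanged by the later edit
print(stream[100 - 1])   # (100, "memory cell recording time 10 is rewritten ...")
```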
I was suggesting that if the time difference wasn’t too large, the AIXI could deduce “humans plan at time 10 to press the button” → “weirdness at time 10 and button pressed at time 100”. If it’s good at modelling us, it may be able to deduce our plans long before we do, and as long as the plan predates the weirdness, it can model the plan as causal.
Or if it experiences more varied situations, it might deduce “no interactions with humans for long periods” → “no weirdness”, and act accordingly.
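A toy sketch of the plan-as-cause point two paragraphs up, with entirely hypothetical episode data: if the plan is detectable at or before the weirdness, a strictly forward-in-time predictor can learn the association without ever needing an arrow from time 100 back to time 10.

```python
# Hypothetical episodes: (plan detected by time 10?, weirdness at time 10?,
# button pressed at time 100?).  The model only needs "earlier plan -> later
# weirdness and button press", never a backwards-in-time influence.

episodes = [
    (True,  True,  True),
    (True,  True,  True),
    (False, False, False),
    (True,  True,  True),
    (False, False, False),
]

def weirdness_freq(plan_detected):
    relevant = [e for e in episodes if e[0] == plan_detected]
    return sum(1 for e in relevant if e[1]) / len(relevant)

print("P(weirdness | plan detected)    ~", weirdness_freq(True))   # 1.0 in this toy data
print("P(weirdness | no plan detected) ~", weirdness_freq(False))  # 0.0 in this toy data
```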