Weak arguments against the universal prior being malign

Paul Christiano makes the case that if we use the universal prior to make important predictions, then we will end up assigning a large amount of probability mass to hypotheses which involve intelligent agents living in alternate universes who have thus far deliberately made the correct predictions so that they might eventually manipulate us into doing what they want us to do. Paul calls these intelligent agents ‘consequentialists’.
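For concreteness, the standard Solomonoff-style form of the universal prior (the exact weighting depends on the choice of universal prefix machine U) assigns a sequence of observations x the probability

$$ M(x) \;=\; \sum_{p \,:\, U(p)\text{ begins with } x} 2^{-|p|}, $$

so any short program p that has so far printed exactly our observations, including one that simulates a simple universe full of consequentialists together with an output channel, contributes weight $2^{-|p|}$ to our predictions.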

I find ideas like this very difficult to think about clearly, but I have a strong gut feeling that the argument is not correct. I’ve been unable to form a crisp formal argument against Paul’s proposal, but below I list a few weak reasons why the consequentialists’ probability mass in the universal prior might not be as high as Paul suggests.

  • Unnatural output channel: It is probably the case that in the vast majority of simple universes which ultimately spawn intelligent life, the most natural output channel is not accessible to its inhabitants. Paul gives an example of such an output channel in his post: in a cellular automaton we could read data by sampling the state of the first non-zero cell. The most natural thing here would probably be to start sampling immediately, from the very first time-step (the first sketch after this list illustrates such a channel). However, if the automaton has simple rules and a simple starting state then it will take a very large number of time-steps before consequentialist life has had time to evolve to the point at which it can start to intentionally manipulate the output cell. As another example, take our own universe: if the ‘most natural’ output channel in our universe corresponds to a particular location then this probably isn’t inside our light-cone right now.

  • Unnatural input channel: Similar to natural output channels not necessarily being accessible, often it will also be impossible for a consequentialist to discern exactly what was fed into her universe’s input channel. In the example of a cellular automaton, the most natural input channel is probably the initial state. This is a problem for the automaton’s inhabitants because, while knowing the state of the universe at a particular time lets you predict the next state, in general it won’t let you deduce exactly how old the universe is or what its initial conditions were (the second sketch after this list illustrates this). A further difficulty in recovering the data fed into your universe’s input channel is that, if your universe implements something analogous to distance/velocity, then in many cases some of the information necessary to recover that data might be moving away from you too fast for you to ever recover it (e.g. a space ship flying away from you at max speed in Conway’s game of life).

  • Implicit computational constraints: A complaint many people have with the universal prior is that it places no constraints on the amount of compute associated with a particular hypothesis (meaning it allows absurd hypotheses like daemons in alternate universes). It is worth noticing that while there is no explicit computational penalty, daemons inside the prior are subject to implicit computational constraints. If the process which the alternate-universe consequentialists must use to predict the next observation we’re about to see requires a lot of compute, then from the consequentialists’ perspective this fact is not irrelevant. This is because (assuming they care about lots of things, not just controlling the universal prior) they will perceive the cost of the computation as a relevant expense which must be traded off against their other preferences, even though we don’t personally care how much compute power they use. These implicit computational costs can also further compromise the consequentialists’ access to their universe’s output channel. For example, consider again a simple cellular automaton such as Conway’s game of life. Conway’s game of life is Turing complete: it’s possible to compute an arbitrary sequence of bits (or simulate any computable universe) from within the game of life. However, I suspect it isn’t possible to compute an arbitrary sequence of bits such that this string can be read off by sampling a particular cell once every time-tick. In a similar vein, while you can indeed build Minecraft inside Minecraft, you can’t do it in such a way that the ‘real’ Minecraft world and the ‘simulated’ Minecraft world run at the same speed. So constraints relating to speed-of-computation further restrict the kinds of output channels the consequentialists are able to manipulate (and if targeting a particular output channel is very costly then they will have to trade off between simplicity of the output channel and expense of reaching it).
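To make the ‘output channel’ picture above concrete, here is a minimal sketch of my own (the rule number, grid size and sampled cell are arbitrary illustrative choices, not anything from Paul’s post): a hypothesis in the universal prior might say ‘run this cellular automaton and report the state of cell 0 at every tick’. Anything inside the automaton that wants to control the reported bits has to get the right value into that one cell at every tick, starting from the very first one.

```python
# Minimal sketch: a 1D elementary cellular automaton (rule 110, chosen arbitrarily)
# whose "output channel" is the state of cell 0, sampled once per time-step.

def step(state, rule=110):
    """Advance a 1D binary cellular automaton by one tick (wrap-around edges)."""
    n = len(state)
    return [
        (rule >> (state[(i - 1) % n] * 4 + state[i] * 2 + state[(i + 1) % n])) & 1
        for i in range(n)
    ]

def sample_output_channel(initial_state, ticks, cell=0):
    """Read the 'output channel': the value of one fixed cell at every tick."""
    state, out = list(initial_state), []
    for _ in range(ticks):
        out.append(state[cell])
        state = step(state)
    return out

# A simple starting state: mostly empty, with a single live cell far from cell 0.
# The sampled bits start immediately, long before any structure can reach cell 0,
# so the early part of the output stream is fixed by the bare physics of the rule.
print(sample_output_channel([0] * 30 + [1] + [0] * 30, ticks=20))
```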
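And to illustrate the ‘unnatural input channel’ point, another small sketch of my own (assuming nothing beyond the standard Game of Life rules): two different initial conditions can evolve to exactly the same later state, so an observer embedded in the automaton at that later time cannot in general recover what was fed into the initial state.

```python
# Minimal sketch: Conway's Game of Life is not reversible, so inhabitants of a
# later state cannot in general recover the initial state (the "input channel").

def life_step(grid):
    """One Game of Life step on a small fixed-size grid (cells outside are dead)."""
    rows, cols = len(grid), len(grid[0])
    def neighbours(r, c):
        return sum(
            grid[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
        )
    return [
        [1 if (grid[r][c] and neighbours(r, c) in (2, 3)) or
              (not grid[r][c] and neighbours(r, c) == 3) else 0
         for c in range(cols)]
        for r in range(rows)
    ]

empty = [[0] * 5 for _ in range(5)]
lone_cell = [[0] * 5 for _ in range(5)]
lone_cell[2][2] = 1  # a single live cell dies of underpopulation

# Two different "initial conditions" reach the same state after one tick,
# so the later state carries no trace of which one the universe started from.
print(life_step(empty) == life_step(lone_cell))  # True
```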

I’m tempted to make further arguments about the unlikeliness that any particular consequentialist would especially care about manipulating our Solomonoff inductor more than any other Solomonoff inductor in the Tegmark IV multiverse (even after conditioning on the importance of our decision and the language we use to measure complexity), but I don’t think I fully understand Paul’s idea of an anthropic update, so there’s a good chance this objection has already been addressed.

All these reasons don’t completely eliminate daemons from the universal prior, but I think they might reduce their probability mass to epistemically appropriate levels. I’ve relied extensively on the cellular automata case for examples and for driving my own intuitions, which might have led me to overestimate the severity of some of the complexities listed above. These ideas are super weird and I find it very hard to think clearly and precisely about them, so I could easily be mistaken; please point out any errors I’m making.
