The Solomonoff prior is malign. It’s not a big deal.

Epistemic status: Endorsed at ~85% probability. Most of the remaining doubt is that there might be clever but hard-to-think-of encodings of observer-centered laws of physics that tilt the balance in favor of physics. Also, this isn’t that different from Mark Xu’s post.

Previously, previously, previously

I started writing this post with the intuition that the Solomonoff prior isn’t particularly malign, because of a sort of pigeonhole problem: for any choice of universal Turing machine there are too many complicated worlds to manipulate, and too few simple ones to do the manipulating.

Other people have different intuitions.

So there was only one thing to do.

Math.

All we have to do is compare [wild estimates of] the complexities of two different sorts of Turing machines: those that reproduce our observations by straightforwardly simulating the physical world we seem to live in, and those that reproduce our observations by simulating a totally different physical world that’s full of consequentialists who want to manipulate us.

Long story short, I was surprised. The Solomonoff prior is malign. But it’s not a big deal.

Team Physics:

If you live for 80 years and get 10^7 bits/s of sensory signals, you accumulate about 10^16 bits of memory to explain via Solomonoff induction.

In comparison, there are about 10^51 electrons on Earth, and just writing their state into a simulation is going to take somewhere in the neighborhood of 10^51 bits[1]. So the Earth, or any physical system within 35 orders of magnitude of the Earth’s complexity, can’t be a Team Physics hypothesis for compressing your observations.
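
As a sanity check, here is the back-of-the-envelope arithmetic behind those two numbers (every constant is a rough assumption carried over from the text, not a measured quantity):

```python
import math

# Rough order-of-magnitude check; all constants are loose estimates from the text.
seconds_per_year = 3.15e7
lifetime_bits = 80 * seconds_per_year * 1e7   # ~2.5e16 bits of lifetime observations
earth_electron_bits = 1e51                    # ~1 bit per electron, very roughly

print(f"observation bits:    ~10^{math.log10(lifetime_bits):.0f}")
print(f"electron-state bits: ~10^{math.log10(earth_electron_bits):.0f}")
print(f"gap: ~{math.log10(earth_electron_bits / lifetime_bits):.0f} orders of magnitude")
```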

What’s simpler than the Earth? Turns out, simulating the whole universe. The universe can be mathematically elegant and highly symmetrical in ways that Earth isn’t.

For simplicity, let’s suppose that “I” am a computer with a simple architecture, plus some complicated memories. The trick that allows compression is that you don’t need to specify the memories; you just need to give enough bits to pick out “me” among all computers with the same architecture in the universe[2]. This depends only on how many similar computers there are in Hilbert space with different memories[3], not directly on the complexity of my memories themselves.

So the complexity of a simple exemplar of Team Physics more or less looks like:

Complexity of our universe

+ Complexity of my architecture, and rules for reading observations out from that architecture

+ Length of the unique code for picking me out among things with the right architecture in the universe.

That last one is probably pretty big relative to the first two. The universe might only be a few hundred bits, and you can specify a simple computer in under a few thousand (or a human with ~10^8 bits of functional DNA). The number of humans in the quantum multiverse, past and future, is hard to estimate, but it is enormous: 10^8 bits of branching randomness is only a few person-years of listening to Geiger counter clicks, or of being influenced by the single-photon sensitivity of the human eye, so the code for picking me out is plausibly far longer than the first two terms combined.
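
To make the tally concrete, here is a toy bit-accounting using the illustrative figures above; the indexing term is a placeholder assumption, since the argument only needs it to dwarf the other two:

```python
# Toy bit-accounting for a Team Physics hypothesis. All numbers are
# illustrative guesses consistent with the text, not derived quantities.
universe_bits = 300        # "a few hundred bits" for the laws and initial conditions
architecture_bits = 1e8    # ~10^8 bits of functional DNA to specify a human
index_bits = 1e9           # placeholder: bits to pick me out of the quantum multiverse

team_physics_bits = universe_bits + architecture_bits + index_bits
print(f"Team Physics total: ~{team_physics_bits:.2e} bits (dominated by the index term)")
```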

Team Manipulation:

A short program on Team Manipulation just simulates a big universe with infinite accessible computing power and a natural coordinate system that’s obvious to its inhabitants. Life arises, and then some of that life realizes that they could influence other parts of the mathematical multiverse by controlling the simple locations in their universe. I’ll just shorthand the shortest program on Team Manipulation, and its inhabitants, as “Team Manipulation” full stop, since they dominate the results.

So they use the limitless computing power of their universe to simulate a bunch of universes, and when they find a universe where people are using the Solomonoff prior to make decisions, they note what string of bits the people are conditioning on, and whether they gain something (e.g. make this other universe more pleasing according to their standards) by influencing the decision. Then, if they want to influence the decision, they manipulate the state of their universe at the simple locations, so that it contains a unique “start reading” code, then the string the people are conditioning on, plus the desired continuation.
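
To see how that scheme cashes out in the math, here is a toy stand-in for the conditional Solomonoff prior, where a “program” is just a pair of (description length in bits, output string) and all the bit counts are invented for illustration:

```python
# Toy stand-in for Solomonoff induction: hypotheses are (bits, output) pairs,
# weighted by 2^-bits and conditioned on matching the observed prefix.
def posterior_next_bit(hypotheses, observed):
    """Return the posterior probability that the bit after `observed` is '1'."""
    weights = {}
    for bits, output in hypotheses:
        if output.startswith(observed) and len(output) > len(observed):
            nxt = output[len(observed)]
            weights[nxt] = weights.get(nxt, 0.0) + 2.0 ** -bits
    total = sum(weights.values())
    return weights.get("1", 0.0) / total if total else None

observed = "0110"
hypotheses = [
    # Team Physics: honestly continues the observations, but pays many bits
    # (60 here, a made-up number) to pick the observer out of its universe.
    (60, observed + "0"),
    # Team Manipulation: writes the observed string plus its preferred
    # continuation after a "start reading" code, for fewer bits (40, also made up).
    (40, observed + "1"),
]

print(posterior_next_bit(hypotheses, observed))  # ≈ 1: the manipulators' bit wins
```

Whichever side has the shorter total description controls the prediction; the rest of the post is about estimating which side that is.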

As the story goes, at some point they’ll discover our universe, and so if we ever use the Solomonoff prior to make decisions they’ll be in there trying to sway us. [cue spooky music]

The complexity of this program looks like:

Complexity of a universe with infinite computing power and Schelling points

+ Complexity of finding those Schelling points, and rules for reading data out

+ Length of the unique “start reading” code that encodes the observations you’re trying to continue with Solomonoff induction.

There’s a problem with applying this to our universe, or ones like it: computation costs! Team Manipulation can’t simulate anyone really using the Solomonoff prior, because they’re in it, and they can’t simulate themselves all the way to the end. Or, looking at it from the other side, the people using the Solomonoff prior would need enough computing power to simulate their own universe many times over inside the Solomonoff prior computation.

Getting around this is possible, though.

The basic idea is to simulate universes that are computable except that they have access to a Solomonoff prior oracle (or something with similar properties[4]). Then the manipulators just have to simulate such a universe up until the first time someone is about to use the Solomonoff oracle, figure out what bit-string they’re going to condition on, and write down that bit string in their own universe, plus their desired continuation.

In fact, Team Manipulation, despite living in a computable universe, can keep predicting you through multiple queries to the Solomonoff oracle, with no complexity penalty, if they actually are the simplest Turing machine to output your observations and thus control the oracle’s output.

And not even un-hijacked hypercomputation is enough to stop Team Manipulation. Sure, they can’t predict the hypercomputer, but they can still simulate the aftermath of each possible hypercomputer output, adding one bit to the complexity of the “start reading” code for each bit of hypercomputer output.

Yes, those bits can add up real fast. But it’s not like any other Turing machines are doing better! To Turing machines, the hypercomputer output looks like noise, and can’t generally be compressed. So the rising tide of hypercomputation lifts all boats equally, not giving an advantage to Team Physics.

Comparison:

A key first point is that if we never find a hypercomputer lying around, Team Manipulation doesn’t really care about us. So for the rest of the post just take the perspective of someone who’s tripped over a Solomonoff oracle and is wondering what will happen.

Just eyeballing it...

The complexity of our universe and that of the Team Manipulation universe are probably fairly similar. The complexity of the rules for reading out data might be a bit different, in favor of Team Manipulation, but not by that much.

Most of the difference is in the unique codes required to pick me out from other agents like me.

So which is it? Is it easier to code for me among people in our universe, or is it easier to code for me among people doing Solomonoff induction in all universes?

The complexity of the latter (picking me out from all universes, knowing that I’m doing Solomonoff induction) can be factored into the complexity of picking out our universe, plus the complexity of picking me out from everyone doing Solomonoff induction in here[5].

So if we only use Solomonoff induction once in this entire universe, the total is close to just the complexity of the universe, which I’m quite confident is small relative to the complexity of picking me out in Team Physics, and therefore that one use of the Solomonoff oracle is going to be manipulated.

But if every single person like me in the entire universe has used the Solomonoff oracle (or there have been a similar number of non-prefix queries), then the complexity of picking me out is the same either way, so Team Manipulation is worse off, and your use of the oracle is probably safe. Unless...

If you are a particularly unique person, such as the first person in your universe to ever stumble over the Solomonoff oracle and use it to make a crucial decision, with large impact on the rest of the universe, then hey presto, the complexity of specifying you in Team Manipulation’s “start reading” code is low again, and the Solomonoff oracle turns out to be untrustworthy after all. This goes double if you’ve never thought of these arguments and are predictably going to trust the oracle.
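
Putting placeholder numbers on that three-way comparison (every figure below is invented; only the relative orderings matter):

```python
# Toy comparison of indexing costs in the three scenarios above. All numbers
# are placeholders; the argument only depends on their relative sizes.
physics_index_bits = 1e9    # bits to pick me out among all similar observers
                            # in the quantum multiverse (huge, per the text)
universe_bits = 300         # bits for Team Manipulation to re-specify our universe
                            # inside its "start reading" code

def manipulation_index_bits(oracle_user_bits):
    # Pick out our universe, then pick me out among its Solomonoff-oracle users.
    return universe_bits + oracle_user_bits

scenarios = {
    "one oracle use ever in our universe": 0,         # nothing more to specify
    "everyone like me also queries the oracle": 1e9,  # same indexing cost as physics
    "I'm the unique, high-impact first user": 20,     # a handful of bits suffices
}

for label, user_bits in scenarios.items():
    winner = ("Team Manipulation" if manipulation_index_bits(user_bits) < physics_index_bits
              else "Team Physics")
    print(f"{label}: {winner} has the shorter code")
```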

Does this matter for AI safety?:

No.

We don’t have a Solomonoff oracle. So we aren’t simulating Team Manipulation, and therefore aren’t making decisions based on their communiques.

And this isn’t really the sort of problem that has “echoes” in approximations. First, because the approximations we use are things like AIXI-tl or gradient descent on randomly initialized neural nets of fixed size, which don’t care about things they can’t simulate. Second, because even if we really did try to approximate Team Manipulation’s effects using abstract reasoning, it’s (even theoretically) hard to figure out what effects those would be without running their Turing machine (and all its similarly-short but similarly-expensive relatives) to find out.

Yes, inner optimizers in big neural nets might be a thing, and this is sort of like an inner optimizer problem in the Solomonoff prior. But even though both issues involve agents being inside of things, the underlying reasons are different, and learning about one isn’t super informative about the other.

Agenty optimizers show up in both cases because being an agenty optimizer is a great way to get things done. But for Team Manipulation in the Solomonoff prior, their agentiness is merely what we expect to happen if you simulate an ecosystem long enough to get intelligent life—it’s not really the active ingredient of the argument; that honor probably goes to having the extreme computing power to simulate entire other universes. Meanwhile, for inner optimizers in neural nets, their agentiness, and the usefulness of agentiness as a problem-solving tool, are the active ingredients.

So.

TL;DR:

I was wrong about the malignity of the Solomonoff prior. If you have a Solomonoff oracle and are making an important decision with it, it’s untrustworthy. However, since we don’t have Solomonoff oracles, I regard this as mostly a curiosity.

  1. ^

    This isn’t really the K-complexity of Earth, since, as we argue shortly, that complexity is sublinear in the size of the Earth so long as the Earth can be located inside a simulation of the whole universe. It’s just how expensive it would be to describe the Earth if it weren’t part of a nice simple universe, and instead had to be treated as a raw ingredient.

    It might seem obvious to you that such a raw description of Earth takes much more than 1 bit per electron since you have to specify their positions. I claim that it’s actually plausibly less than 1 bit per electron since so many electrons on Earth are part of correlated crystals. But either way, the total number is still way way bigger than our 10^16 bits of memories.

  2. ^

    This process is sufficient to locate me; you don’t need extra bridging rules. Extra bridging rules are necessary if you’re pretending to be Cartesian, though.

  3. ^

    “How many” is a simplification—really there’s a continuous measure with an entropy that we can convert into bits.

  4. ^

    Many forms of hypercomputation work. Or even just extreme patience: a universe with lots of computational power that’s willing to put everything on pause to run an approximation of Solomonoff induction big enough to include Team Manipulation’s Turing machine also counts.

  5. ^

    Plus or minus a constant depending on what’s going on in the other universes.