Clarifying The Malignity of the Universal Prior: The Lexical Update

[UPDATE: looks like the lexical update is real after all; see Paul’s comment and my reply]

In Paul’s classic post ‘What does the universal prior actually look like?’ he lays out an argument that the universal prior, if it were to be used for important decisions, would likely be malign, giving predictions that would effectively be under the control of alien consequentialists. He argues for this based on an ‘anthropic update’ the aliens could make that would be difficult to represent in a short program. We can split this update into two parts: an ‘importance update’ restricting attention to bits fed into priors used to make important decisions, and what I’m calling a ‘lexical update’ which depends on the particular variant of the universal prior being used. I still believe that the ‘importance update’ would be very powerful, but I’m not sure anymore about the ‘lexical update’. So in this post I’m going to summarize both in my own words, then explain my skepticism towards the ‘lexical update’.

As background, note that ‘straightforwardly’ specifying data such as our experiences in the universal prior will take far more bits than just describing the laws of physics, as you’ll also need to describe our location in spacetime, an input method, and a set of Everett branches(!), all of which together will probably take more than 10000 bits (compared to the laws alone, which likely only take a few hundred). Thus, any really short program (a few hundred bits, say) that could somehow predict our experiences well would likely have a greater probability according to the universal prior than the ‘straightforward’ explanation.
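To make the size of that gap concrete, here is a minimal sketch. It assumes the usual convention that a program of length n bits gets prior weight 2^-n, and it uses 300 bits as a stand-in for ‘a few hundred’. Both the function names and the specific numbers are illustrative assumptions, not figures from Paul’s post.

```python
from fractions import Fraction


def prior_weight(length_bits: int) -> Fraction:
    """Weight of a single program of the given length, assuming weight 2^-length."""
    return Fraction(1, 2 ** length_bits)


def log2_advantage(short_bits: int, long_bits: int) -> int:
    """How many bits (factors of two) the shorter program wins by."""
    return long_bits - short_bits


# Hypothetical lengths: 'a few hundred' bits vs. 'more than 10000' bits.
SHORT_PROGRAM_BITS = 300
STRAIGHTFORWARD_BITS = 10_000

advantage = log2_advantage(SHORT_PROGRAM_BITS, STRAIGHTFORWARD_BITS)
ratio = prior_weight(SHORT_PROGRAM_BITS) / prior_weight(STRAIGHTFORWARD_BITS)
print(f"short program is favored by {advantage} bits")
```

Under these assumed numbers the short program is favored by a factor of 2 raised to the bit difference, which is why even a fairly wasteful short program can dominate the ‘straightforward’ one.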

Paul’s post argues that there likely do exist such programs. I’m going to fix a reference prefix machine U which generates a universal prior. The argument goes:

A) there are many long-running programs with short descriptions according to U, such as our universe.

B) If other programs are like our universe’s program, aliens could evolve there and end up taking over their programs.

C) Since their program has high measure in U, the aliens will plausibly have been selected to be motivated to control short programs in U.

D) To control U, the aliens could try to manipulate beings using the universal prior who have control over short programs in U (like us, hypothetically).

E) If the aliens are reasonably motivated to manipulate U, we can sample them doing that with few bits.

F) The aliens will now try to output samples from Q, the distribution over people using the universal prior to make important decisions (decisions impacting short programs in U). They can do this much more efficiently than any ‘straightforward’ method. For instance, when specifying which planet we are on, the aliens can restrict attention to planets which eventually develop life, saving a great many bits.

G) The aliens can then choose a low-bit broadcast channel in their own universe, so the entire manipulative behavior has a very short description in U.

H) For a short program to compete with the aliens, it would essentially need access to Q. But this seems really hard to specify briefly.
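The bit savings in F) can be sketched with a toy calculation. Picking one item uniformly out of N candidates costs about log2(N) bits, so shrinking the candidate set saves the difference of the logs. The planet counts below are made-up placeholders chosen only to show the arithmetic, and the function names are hypothetical.

```python
import math


def bits_to_pick_one(num_candidates: int) -> float:
    """Approximate bits needed to single out one item from a uniform set."""
    return math.log2(num_candidates)


def bits_saved_by_restricting(total: int, restricted: int) -> float:
    """Bits saved by indexing into a smaller candidate set instead of the full one."""
    return bits_to_pick_one(total) - bits_to_pick_one(restricted)


# Placeholder counts (assumptions, not estimates): all planets vs. planets
# that eventually develop life.
ALL_PLANETS = 2 ** 80
LIFE_BEARING_PLANETS = 2 ** 50

saved = bits_saved_by_restricting(ALL_PLANETS, LIFE_BEARING_PLANETS)
print(f"restricting to life-bearing planets saves about {saved:.0f} bits")
```

The same logic applies to every further restriction the aliens can make (to civilizations, to people using the universal prior, to important decisions), with the savings adding up in bits.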

So far I agree. But the post also argues that even a short program that could sample from Q would still lose badly to the aliens, based on what I’m calling a ‘lexical update’, as follows:

I) In practice most people in U using ‘the universal prior’ won’t use U itself but one of many variants U’ (different universal programming languages).

J) Each of those variants U’ will have its own Q’, the distribution over people making important decisions with U’. Q is then defined as the average over all of those variants (with different U’ weighted by simplicity in U).

K) Since short programs in different U’ look different from each other, the aliens in those programs will be able to tell which variant U’ they are likely to be in.

L) The distributions Q’ of people in U using different variants U’ all look different. Describing each Q’ given Q should take about as many bits as it takes to specify U’ using U.

M) But the aliens will already know they are in U’, and so can skip that, gaining a large advantage even over Q.

But there’s a problem here. In C) it was argued that aliens in short programs in U will be motivated to take over other short programs in U. When we condition on the aliens actually living somewhere short according to U’, they are instead motivated to control short programs in U’. This would reduce their motivation to control short programs in U proportionally to the difficulty of describing U in U’, and with less motivation, it takes more bits to sample their manipulative behaviors in E). The advantage they gained in L) over Q was proportional to the difficulty of describing U’ in U. On average these effects should cancel out, and the aliens’ probability mass will be comparable to Q.
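Here is a rough sketch of that bookkeeping in bits. It assumes that ‘reduced motivation’ translates into an additive penalty equal to the cost of describing U in U’, and that a program with access to Q pays the cost of describing U’ in U to reach Q’. Both of these are my modeling assumptions for illustration, and the numbers are invented.

```python
def alien_cost_bits(base_bits: int, bits_u_in_uprime: int) -> int:
    """Cost of sampling the aliens' manipulation of U.

    Assumption: aliens living in a short U' program care less about U, and
    this shows up as extra bits equal to the cost of describing U in U'.
    """
    return base_bits + bits_u_in_uprime


def q_cost_bits(q_bits: int, bits_uprime_in_u: int) -> int:
    """Cost for a program that samples from Q and must then pick out Q' (step L)."""
    return q_bits + bits_uprime_in_u


def net_alien_advantage_bits(
    base_bits: int, q_bits: int, bits_u_in_uprime: int, bits_uprime_in_u: int
) -> int:
    """Positive means the aliens beat the Q-based program by that many bits."""
    return q_cost_bits(q_bits, bits_uprime_in_u) - alien_cost_bits(
        base_bits, bits_u_in_uprime
    )


# Invented numbers: if the two description costs match, the effects cancel.
symmetric = net_alien_advantage_bits(100, 100, 50, 50)
# If U' is harder to describe in U than U is in U', some advantage remains.
asymmetric = net_alien_advantage_bits(100, 100, 30, 50)
print(symmetric, asymmetric)
```

In this toy model the lexical advantage survives only to the extent that the two description costs differ, which is the sense in which I expect them to cancel on average.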

The universal prior is still likely malign, as it’s probably hard to briefly specify Q, but it no longer seems to me like the aliens would decisively beat Q. I still feel pretty confused about all this, so comments pointing out any mistakes or misinterpretations would be appreciated.