# Clarifying Consequentialists in the Solomonoff Prior

I have spent a long time be­ing con­fused about Paul’s post on con­se­quen­tial­ists in the Solomonoff prior. I now think I un­der­stand the prob­lem clearly enough to en­gage with it prop­erly.

I think the reason I was confused is to a large degree a problem of framing. In the course of the discussions I had while deconfusing myself, it seemed to me that other people share similar confusions. In this post, I will attempt to explain the framing that helped clarify the problem for me.

## i. A brief sketch of the Solomonoff prior

The Solomonoff, or Universal, prior is a probability distribution over strings of a certain alphabet (usually over all strings of 1s and 0s). It is defined by taking the set of all Turing machines (TMs) which output strings, assigning to each a weight proportional to $2^{-L}$ (where $L$ is its description length), and then assigning to each string a probability equal to the sum of the weights of the TMs that compute it. The description length is closely related to the amount of information required to specify the machine; I will use description length and amount of information for specification interchangeably.

(The ac­tual for­mal­ism is in fact a bit more tech­ni­cally in­volved. I think this pic­ture is de­tailed enough, in the sense that my ex­pla­na­tion will map onto the real for­mal­ism about as well.)
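
To make the weighting concrete, here is a toy sketch in Python. The real prior ranges over all Turing machines and is uncomputable, so this is only an illustration under strong assumptions: `run_toy_machine` is a hypothetical stand-in in which a "program" is a bit string that is simply repeated cyclically, and each program of length $L$ gets weight $2^{-L}$. The function names and the toy machine are invented for this example and are not part of the formalism.

```python
from collections import defaultdict
from itertools import product


def run_toy_machine(program: str, n_bits: int) -> str:
    """Hypothetical stand-in for a TM: emit the program's bits cyclically."""
    return "".join(program[i % len(program)] for i in range(n_bits))


def toy_prior(n_bits: int, max_len: int) -> dict:
    """Sum the weight 2^-L over every toy program that outputs each string."""
    weights = defaultdict(float)
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            program = "".join(bits)
            weights[run_toy_machine(program, n_bits)] += 2.0 ** -length
    # Normalise so the weights form a probability distribution.
    total = sum(weights.values())
    return {string: weight / total for string, weight in weights.items()}


prior = toy_prior(n_bits=4, max_len=4)
for string in ("0000", "0101", "0011"):
    print(string, prior[string])
```

In this toy setting, strings with short repeating descriptions such as `0000` collect more mass than a string like `0011`, which only a single length-4 program produces.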

The above defines the Solomonoff prior. To perform Solomonoff in­duc­tion, one can also define con­di­tional dis­tri­bu­tions by con­sid­er­ing only those TMs that gen­er­ate strings be­gin­ning with a cer­tain pre­fix. In this post, we’re not in­ter­ested in that pro­cess, but only in the prior.
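
For reference, the conditioning step can be written as follows. The notation is introduced here for illustration and is not taken from the post: $M(x)$ is the total weight of TMs whose output begins with $x$, $L(T)$ is the description length of the TM $T$, and $xy$ is the string $x$ followed by $y$.

$$M(x) = \sum_{T \,:\, T \text{ outputs a string beginning with } x} 2^{-L(T)}, \qquad M(y \mid x) = \frac{M(xy)}{M(x)}$$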

## ii. The Mal­ign Prior Argument

In the post, Paul claims that the prior is dom­i­nated by con­se­quen­tial­ists. I don’t think it is quite dom­i­nated by them, but I think the effect in ques­tion is plau­si­bly real.

I’ll call the key claim in­volved the Mal­ign Prior Ar­gu­ment. On my preferred fram­ing, it goes some­thing like this:

Pre­miss: For some strings, it is eas­ier to spec­ify a Tur­ing Ma­chine that simu­lates a rea­soner which de­cides to pre­dict that string, than it is to spec­ify the in­tended gen­er­a­tor for that string.

Con­clu­sion: There­fore, those strings’ Solomonoff prior prob­a­bil­ity will be dom­i­nated by the weight as­signed to the TM con­tain­ing the rea­soner.

It’s best to ex­plain the idea of an ‘in­tended gen­er­a­tor’ with ex­am­ples. In the case of a cam­era sig­nal as the string, the in­tended gen­er­a­tor is some­thing like a TM that simu­lates the uni­verse, plus a speci­fi­ca­tion of the point in the simu­la­tion where the cam­era in­put should be sam­pled. Ap­prox­i­ma­tions to this, like a low-fidelity simu­la­tion, can also be con­sid­ered in­tended gen­er­a­tors.

There isn’t any­thing spe­cial about the in­tended gen­er­a­tor’s re­la­tion­ship to the string—it’s just one way in which that string can be gen­er­ated. It seems most nat­u­ral to us as hu­mans, and the Oc­camian na­ture of SI feels like it should be bi­ased to­wards such strings, but noth­ing in prin­ci­ple stops some­thing less ‘nat­u­ral’ from be­ing in fact a shorter de­scrip­tion.

This idea of ‘nat­u­ral­ness’ is im­por­tant in un­der­stand­ing what the Mal­ign Prior Ar­gu­ment is about; I will use it roughly to re­fer to some­thing like ‘the set of Tur­ing Machines that don’t in­volve rea­son­ers that at­tempt to in­fluence the prior’, or ‘the set of in­tended gen­er­a­tors’. It’s vague, but I think it gets across the point.

I read most of Paul’s post as an ex­is­tence ar­gu­ment for the pre­miss, us­ing con­se­quen­tial­ists in other wor­lds as the rea­son­ers. I don’t think all such rea­son­ers are like Paul de­scribes; I also doubt that all or even most strings are sub­ject to this effect, but find it very plau­si­ble that some are.

I think the argument is not, at its core, about these reasoners making the strings they output more likely than the ‘true string’. It is concerning enough that these reasoners have any effect on the prior at all, and that is what this argument establishes.

As a side note, it’s also worth not­ing that this is not about these rea­son­ers break­ing out of the box and tak­ing over our world, al­though that is also a re­lated con­cern one might have.

## iii. The support for the premiss

Con­sider a string S’ with very high nat­u­ral K-com­plex­ity (de­scrip­tion length of the in­tended gen­er­a­tor) that shares a pre­fix with a string S that is of high in­ter­est to hu­man-like civil­i­sa­tions.

I claim that the prior prob­a­bil­ity of this string is higher than it ‘nat­u­rally’ ‘should’ be, in the sense that a large part of the weight that com­poses this prob­a­bil­ity is com­ing from a TM that simu­lates a rea­soner that is at­tempt­ing to in­fluence the prior.

The rea­sons this hap­pens are:

1. A rea­soner in a TM can have an ar­bi­trar­ily long amount of com­pute time to de­cide what strings to out­put.

2. Spec­i­fy­ing rea­son­ers is cheap rel­a­tive to spec­i­fy­ing the string S’.

3. There ex­ists a rea­soner whose goals are best served by in­fluenc­ing the prior to make S’ more likely.

1 is a crucial property of the Solomonoff prior that allows this to happen. A TM in the Solomonoff prior can think for a very, very long time: enough to e.g. simulate Ackermann(Ackermann(10)) initial world states, each for Ackermann(Ackermann(10)) timesteps. It can perform something close to an exhaustive search of all possible civilizations and decide to attempt to influence the one that is most susceptible to being influenced, if that’s what it wants to do. This is a ridiculous computation, but we’re talking about a mathematical object, not an actual process that we run. It’s plausible that if the prior were also weighted by speed of computation, these effects would be far less pronounced (and maybe would not arise at all).
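
One way to make "weighted by speed of computation" concrete is to divide each weight by the machine's running time, in the spirit of Levin's $Kt$ complexity. This is an illustrative assumption rather than something specified here; $t(T)$ is a symbol introduced for the number of steps $T$ takes to produce its output.

$$w(T) \propto 2^{-L(T)} \qquad \text{versus} \qquad w_{\text{speed}}(T) \propto \frac{2^{-L(T)}}{t(T)} = 2^{-\left(L(T) + \log_2 t(T)\right)}$$

Under such a weighting, a machine that runs for an Ackermann-sized number of steps pays an extra $\log_2 t(T)$ bits, which would swamp any realistic saving in description length.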

To see that 2 and 3 are plau­si­ble, we need to think about S’, which by as­sump­tion is a string with high nat­u­ral K-com­plex­ity. This high com­plex­ity ‘buys’ us the space to spec­ify a rea­soner, and the space to spec­ify val­ues, with­out mak­ing the TM more com­plex than a nat­u­ral gen­er­a­tor of S’. Now, be­cause S is by as­sump­tion of in­ter­est to civil­i­sa­tions, there likely ex­ists a TM con­tain­ing a rea­soner that performs its ex­haus­tive search, finds S, and con­cludes that its val­ues are best served by mak­ing S’ more likely (e.g. to in­fluence the de­ci­sion-mak­ing of civil­i­sa­tions that are think­ing about what S is, given a pre­fix of it known to them).

In a way, this agent uses its sim­plic­ity to give more sim­plic­ity to some other string. That is how the prior gets hi­jacked.

Note that this rea­soner will need to have goals that are sim­pler than the nat­u­ral gen­er­a­tor of S’ in or­der to ac­tu­ally con­tribute to S’ be­ing more likely—oth­er­wise, spec­i­fy­ing its TM would be more ex­pen­sive than spec­i­fy­ing the nat­u­ral gen­er­a­tor of S’.
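
In symbols (notation introduced here for illustration): write $L_{\text{nat}}(S')$ for the description length of the natural generator of S’, and $L_R$ for the description length of the TM containing the reasoner together with its goals. The condition in the note above, and the resulting ratio of prior weights, are

$$L_R < L_{\text{nat}}(S'), \qquad \frac{2^{-L_R}}{2^{-L_{\text{nat}}(S')}} = 2^{\,L_{\text{nat}}(S') - L_R}$$

With purely hypothetical numbers, if $L_{\text{nat}}(S') = 1000$ bits and $L_R = 900$ bits, the reasoner's TM carries $2^{100}$ times the weight of the natural generator.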

The above is non-con­struc­tive (in the math­e­mat­i­cal sense), but nev­er­the­less the ex­is­tence of strings S’ that are af­fected thus seems plau­si­ble. The spaces of pos­si­ble TMs and of the strings we (or other users of the Solomonoff prior) could be in­ter­ested in are sim­ply too vast for there not to be such TMs. Whether there are very many of these, or whether they are so much more com­pli­cated than the string S so as to make this effect ir­rele­vant to our in­ter­ests, are differ­ent ques­tions.

## iv. Alien consequentialists

In my view, Paul’s approach in his post is a more constructive strategy for establishing 2 and 3 in the argument above. If correct, it suggests a stronger result: not only does it cause the probability of S’ to be dominated by the TM containing the reasoner, it also makes the probability of S’ roughly comparable to that of S, for a wide class of choices of S.

In particular, the choice of S that is susceptible to this is something like the camera example I used, where the natural generator of S is a specification of our world together with a location where we take samples from. The alien civilisation is a way to construct a Turing Machine that outputs S’ and has complexity comparable to that of the natural generator of S.

To do that, we spec­ify a uni­verse, then run it for how­ever long we want, un­til we get some­where within it smart agents that de­cide to in­fluence the prior. Since 1 is true, these agents have an ar­bi­trary amount of time to de­cide what they out­put. If S is im­por­tant, there prob­a­bly will be a civil­i­sa­tion some­where in some simu­lated world which will de­cide to at­tempt to in­fluence de­ci­sions based on S, and out­put an ap­pro­pri­ate S’. We then spec­ify the out­put chan­nel to be what­ever they de­cide to use as the out­put chan­nel.

This re­quires a rel­a­tively mod­est amount of in­for­ma­tion—enough to spec­ify the uni­verse, and the lo­ca­tion of the out­put. This is on the same or­der as the nat­u­ral gen­er­a­tor for S it­self, if it is like a cam­era sig­nal.

Trying to specify our reasoner within this space (reasoners that naturally develop in simulations) does place restrictions on what kind of reasoner we end up with. For instance, there are now some implicit runtime bounds on many of our reasoners, because they likely care about things other than the prior. Nevertheless, the space of our reasoners remains vast, including unaligned superintelligences and other odd minds.

## v. Con­clu­sion. Do these ar­gu­ments ac­tu­ally work?

I am mostly con­vinced that there is at least some weird­ness in the Solomonoff prior.

A part of me wants to add ‘es­pe­cially around strings whose pre­fixes are used to make pivotal de­ci­sions’; I’m not sure that is right, be­cause I think scarcely any­one would ac­tu­ally use this prior in its true form—ex­cept, per­haps, an AI rea­son­ing about it ab­stractly and naïvely enough not to be con­cerned about this effect de­spite hav­ing to ex­plic­itly con­sider it.

In fact, a lot of my doubt about the ma­lign Solomonoff prior is con­cen­trated around this con­cern: if the rea­son­ers don’t be­lieve that any­one will act based on the true prior, it seems un­clear why they should spend a lot of re­sources on mess­ing with it. I sup­pose the space is large enough for at least some to get con­fused into do­ing some­thing like this by mis­take.

I think that even if my doubts are cor­rect, there will still be weird­ness as­so­ci­ated with the agents that are speci­fied di­rectly, along the lines of sec­tion iii, if not those that ap­pear in simu­lated uni­verses, as de­scribed in iv.

• The part about the rea­son­ers hav­ing an ar­bi­trary amount of time to think wasn’t ob­vi­ous to me. The TM can run for ar­bi­trar­ily long but if it is simu­lat­ing a uni­verse and us­ing the uni­verse to de­ter­mine its out­put then the TM needs to spec­ify a sys­tem for read­ing from the uni­verse.

If that sys­tem in­volves a start-to-read time that is long enough for the in-uni­verse life to rea­son about the uni­ver­sal prior then that time speci­fi­ca­tion alone would take a huge num­ber of bits.

On the other hand, I could imag­ine a scheme that looks for a spe­cific short trig­ger se­quence at a par­tic­u­lar spa­tial lo­ca­tion then starts read­ing out. If this trig­ger se­quence is un­likely to oc­cur nat­u­rally then the civ­i­liza­tion would have as long as they want to rea­son about the prior. So over­all it does seem plau­si­ble to me now to al­low for ar­bi­trar­ily long in-uni­verse time.
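
The trigger idea can be sketched in a few lines of Python. This is a hypothetical illustration: `read_after_trigger` and the representation of "what appears at the watched location over time" as a sequence of bits are assumptions made for the example, not anything specified in the post. The point it shows is that the reader's description does not grow with how long the civilisation waits before emitting the trigger.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def read_after_trigger(cell_history: Iterable[int],
                       trigger: Sequence[int]) -> Iterator[int]:
    """Yield the bits seen at the watched location after the first trigger."""
    window = deque(maxlen=len(trigger))  # most recent len(trigger) bits
    triggered = False
    for bit in cell_history:
        if triggered:
            yield bit  # everything after the trigger is treated as output
            continue
        window.append(bit)
        if list(window) == list(trigger):
            triggered = True


trigger = [1, 1, 1, 0]
short_wait = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
long_wait = [0] * 1000 + [1, 1, 1, 0, 0, 1]

# The same short reader handles both histories, however long the wait.
print(list(read_after_trigger(short_wait, trigger)))
print(list(read_after_trigger(long_wait, trigger)))
```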

• The trig­ger se­quence is a cool idea.

I want to add that the in­tended gen­er­a­tor TM also needs to spec­ify a start-to-read time, so there is sym­me­try there. What­ever method a TM needs to use to se­lect the cam­era start time in the in­tended gen­er­a­tor for the real world sam­ples, it can also use in the simu­lated world with alien life, since for the scheme to work only the differ­ence in com­plex­ity be­tween the two mat­ters.

There is ad­di­tional flex in that un­like the in­tended gen­er­a­tor, the rea­soner TM can sam­ple its uni­verse simu­la­tion at any cheaply com­putable in­ter­val, giv­ing the civil­i­sa­tion the op­tion of choos­ing any amount of think­ing they can perform be­tween out­puts, if they so choose.

• I’m not con­vinced that the prob­a­bil­ity of S’ could be pushed up to any­thing near the prob­a­bil­ity of S. Spec­i­fy­ing an agent that wants to trick you into pre­dict­ing S’ rather than S with high prob­a­bil­ity when you see their com­mon pre­fix re­quires spec­i­fy­ing the agency re­quired to plan this type of de­cep­tion (which should be quite com­pli­cated), and spec­i­fy­ing the com­mon pre­fix of S and S’ as the par­tic­u­lar tar­get for the de­cep­tion (which, in­so­far as it makes sense to say that S is the “cor­rect” con­tinu­a­tion of the pre­fix, should have about the same “nat­u­ral” com­plex­ity as S). That is, spec­i­fy­ing such an agent re­quires all the in­for­ma­tion re­quired to spec­ify S, plus a bunch of over­head to spec­ify agency, which adds up to much more com­plex­ity than S it­self.

• > specifying the agency required to plan this type of deception (which should be quite complicated)

Sup­pose that I just spec­ify a generic fea­ture of a simu­la­tion that can sup­port life + ex­pan­sion (the com­plex­ity of spec­i­fy­ing “a simu­la­tion that can sup­port life” is also paid by the in­tended hy­poth­e­sis, so we can fac­tor it out). Over a long enough time such a simu­la­tion will pro­duce life, that life will spread through­out the simu­la­tion, and even­tu­ally have some con­trol over many fea­tures of that simu­la­tion.

> and specifying the common prefix of S and S’ as the particular target for the deception (which, insofar as it makes sense to say that S is the “correct” continuation of the prefix, should have about the same “natural” complexity as S)

Once you’ve speci­fied the agent, it just sam­ples ran­domly from the dis­tri­bu­tion of “strings I want to in­fluence.” That has a way lower prob­a­bil­ity than the “nat­u­ral” com­plex­ity of a string I want to in­fluence. For ex­am­ple, if 1/​quadrillion strings are im­por­tant to in­fluence, then the at­tack­ers are able to save log(quadrillion) bits.
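
For scale, the arithmetic in that example can be checked directly. This sketch assumes that "quadrillion" means the short-scale $10^{15}$ and that the logarithm is base 2, so that the answer is in bits; both are assumptions not stated in the comment.

```python
import math

# Assumption: one string in 10**15 is important to influence.
fraction_important = 1 / 10**15

# Bits needed to single out one item from 10**15 equally likely ones.
bits_saved = -math.log2(fraction_important)
print(f"{bits_saved:.1f} bits")  # prints "49.8 bits"
```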

• > Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying “a simulation that can support life” is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.

Oh yes, I see. That does cut the com­plex­ity over­head down a lot.

> Once you’ve specified the agent, it just samples randomly from the distribution of “strings I want to influence.” That has a way lower probability than the “natural” complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.

I don’t un­der­stand what you’re say­ing here.

• I agree that this probably happens when you set out to mess with an arbitrary particular S, i.e. try to make some S’ that shares a prefix with S as likely as S.

However, some S are special, in the sense that their prefixes are being used to make very important decisions. If you, as a malicious TM in the prior, perform an exhaustive search of universes, you can narrow down your options to only a few prefixes used to make pivotal decisions; selecting one of those to mess with is then very cheap to specify. I use S to refer to those strings that are the ‘natural’ continuation of those cheap-to-specify prefixes.

There are, it seems to me, a bunch of other equally-complex TMs that want to make other strings that share that prefix more likely, including some that promote S itself. What the resulting balance looks like is unclear to me, but what’s clear is that the prior is malign with respect to that prefix: conditioning on that prefix gives you a distribution almost entirely controlled by these malign TMs. The ‘natural’ complexity of S, or of other strings that share the prefix, plays almost no role in their priors.

The above is of course con­di­tional on this ex­haus­tive search be­ing pos­si­ble, which also re­lies on there be­ing any­one in any uni­verse that ac­tu­ally uses the prior to make de­ci­sions. Other­wise, we can’t se­lect the pre­fixes that can be messed with.

• This rea­son­ing seems to rely on there be­ing such strings S that are use­ful to pre­dict far out of pro­por­tion to what you would ex­pect from their com­plex­ity. But a de­scrip­tion of the cir­cum­stance in which pre­dict­ing S is so use­ful should it­self give you a way of spec­i­fy­ing S, so I doubt that this is pos­si­ble.

• I agree. That’s what I meant when I wrote there will be TMs that ar­tifi­cially pro­mote S it­self. How­ever, this would still mean that most of S’s mass in the prior would be due to these TMs, and not due to the nat­u­ral gen­er­a­tor of the string.

Furthermore, it’s unclear how many TMs would promote S vs S’ or other alternatives. Because of this, I don’t know whether the prior would be higher for S or S’ from this reasoning alone. Whichever is the case, the prior no longer reflects meaningful information about the universe that generates S and whose inhabitants are using the prefix to choose what to do; it’s dominated by these TMs that search for prefixes they can attempt to influence.

• I didn’t mean that an agenty Tur­ing ma­chine would find S and then de­cide that it wants you to cor­rectly pre­dict S. I meant that to the ex­tent that pre­dict­ing S is com­monly use­ful, there should be a sim­ple un­der­ly­ing rea­son why it is com­monly use­ful, and this rea­son should give you a nat­u­ral way of com­put­ing S that does not have the over­head of any agency that de­cides whether or not it wants you to cor­rectly pre­dict S.

• How many bits do you think it takes to spec­ify the prop­erty “peo­ple’s pre­dic­tions about S, us­ing uni­ver­sal prior P, are very im­por­tant”?

(I think you’ll need to specify the universal prior P by reference to the universal prior that is actually used in the world containing the string S; if you spell out the prior P explicitly you are already sunk just from the ambiguity in the choice of language.)

It seems rel­a­tively un­likely to me that this will be cheaper than spec­i­fy­ing some ar­bi­trary de­gree of free­dom in a com­pu­ta­tion­ally rich uni­verse that life can con­trol (+ the ex­tra log(frac­tion of de­grees of free­dom the con­se­quen­tial­ists ac­tu­ally choose to con­trol)). Of course it might.

I agree that the en­tire game is in the con­stants—what is the cheap­est way to pick out im­por­tant strings.

• I don’t think that spec­i­fy­ing the prop­erty of im­por­tance is sim­ple and helps nar­row down S. I think that in or­der for pre­dict­ing S to be im­por­tant, S must be gen­er­ated by a sim­ple pro­cess. Pro­cesses that take large num­bers of bits to spec­ify are cor­re­spond­ingly rarely oc­cur­ring, and thus less use­ful to pre­dict.

• I don’t buy it. A cam­era that some robot is us­ing to make de­ci­sions is no sim­pler than any other place on Earth, just more im­por­tant.