# Clarifying Consequentialists in the Solomonoff Prior

I have spent a long time being confused about Paul’s post on consequentialists in the Solomonoff prior. I now think I understand the problem clearly enough to engage with it properly.

I think the reason I was confused is to a large degree a problem of framing. In the course of the discussions I had to deconfuse myself, it seemed that similar confusions are shared by other people. In this post, I will attempt to explain the framing that helped clarify the problem for me.

## i. A brief sketch of the Solomonoff prior

The Solomonoff, or Universal, prior is a probability distribution over strings of a certain alphabet (usually over all strings of 1s and 0s). It is defined by taking the set of all Turing machines (TMs) which output strings, assigning to each a weight proportional to 2^{-L} (where L is its description length), and then assigning to each string a probability equal to the sum of the weights of the TMs that compute it. The description length is closely related to the amount of information required to specify the machine; I will use description length and amount of information for specification interchangeably.

(The actual formalism is in fact a bit more technically involved. I think this picture is detailed enough, in the sense that my explanation will map onto the real formalism about as well.)
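As a toy illustration of the definition above (not the real, uncomputable formalism), we can pretend we have already enumerated a handful of machines with known description lengths and outputs, and compute the weight each string receives. The machine list here is entirely made up for illustration:

```python
from collections import defaultdict

# Hypothetical toy "machines": (description_length_in_bits, output_string).
# These are invented for illustration; the real prior sums over all
# Turing machines, which is uncomputable.
machines = [
    (3, "0101"),  # a short program that outputs "0101"
    (5, "0101"),  # a longer program computing the same string
    (4, "1111"),
    (7, "0000"),
]

def toy_prior(machines):
    """Give each string the total weight 2^-L of the machines printing it."""
    mass = defaultdict(float)
    for length, output in machines:
        mass[output] += 2.0 ** -length
    # Normalise so the toy distribution sums to 1 (the real prior is only
    # a semi-measure and is usually left unnormalised).
    total = sum(mass.values())
    return {s: m / total for s, m in mass.items()}

prior = toy_prior(machines)
```

Note that "0101" gets mass from two different machines; shorter machines dominate, which is the Occamian bias discussed below.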

The above defines the Solomonoff prior. To perform Solomonoff induction, one can also define conditional distributions by considering only those TMs that generate strings beginning with a certain prefix. In this post, we’re not interested in that process, but only in the prior.

## ii. The Malign Prior Argument

In the post, Paul claims that the prior is dominated by consequentialists. I don’t think it is quite dominated by them, but I think the effect in question is plausibly real.

I’ll call the key claim involved the Malign Prior Argument. On my preferred framing, it goes something like this:

Premiss: For some strings, it is easier to specify a Turing machine that simulates a reasoner which decides to predict that string than it is to specify the intended generator for that string.

Conclusion: Therefore, those strings’ Solomonoff prior probability will be dominated by the weight assigned to the TM containing the reasoner.

It’s best to explain the idea of an ‘intended generator’ with examples. In the case of a camera signal as the string, the intended generator is something like a TM that simulates the universe, plus a specification of the point in the simulation where the camera input should be sampled. Approximations to this, like a low-fidelity simulation, can also be considered intended generators.

There isn’t anything special about the intended generator’s relationship to the string—it’s just one way in which that string can be generated. It seems most natural to us as humans, and the Occamian nature of SI feels like it should be biased towards such generators, but nothing in principle stops something less ‘natural’ from in fact being a shorter description.

This idea of ‘naturalness’ is important in understanding what the Malign Prior Argument is about; I will use it roughly to refer to something like ‘the set of Turing machines that don’t involve reasoners that attempt to influence the prior’, or ‘the set of intended generators’. It’s vague, but I think it gets the point across.

I read most of Paul’s post as an existence argument for the premiss, using consequentialists in other worlds as the reasoners. I don’t think all such reasoners are like Paul describes; I also doubt that all or even most strings are subject to this effect, but find it very plausible that some are.

I think the argument is not, at its core, about these reasoners making the strings they output more likely than the ‘true string’. It is concerning enough that these reasoners have any effect at all on the prior, and that is the fact this argument establishes.

As a side note, it’s also worth noting that this is not about these reasoners breaking out of the box and taking over our world, although that is also a related concern one might have.

## iii. The support for the premiss

Consider a string S’ with very high natural K-complexity (description length of the intended generator) that shares a prefix with a string S that is of high interest to human-like civilisations.

I claim that the prior probability of this string is higher than it ‘naturally’ ‘should’ be, in the sense that a large part of the weight that composes this probability is coming from a TM that simulates a reasoner that is attempting to influence the prior.

The reasons this happens are:

1. A reasoner in a TM can have an arbitrarily long amount of compute time to decide what strings to output.

2. Specifying reasoners is cheap relative to specifying the string S’.

3. There exists a reasoner whose goals are best served by influencing the prior to make S’ more likely.

1 is a crucial property of the Solomonoff prior that allows this to happen. A TM in the Solomonoff prior can think for a very, very long time — enough to e.g. simulate Ackermann(Ackermann(10)) initial world states, each for Ackermann(Ackermann(10)) timesteps. It can perform something close to an exhaustive search of all possible civilisations and decide to attempt to influence the one that is most susceptible to being influenced, if that’s what it wants to do. This is a ridiculous computation, but we’re talking about a mathematical object, not an actual process that we run. It’s plausible that if the prior were also weighted by speed of computation, these effects would be far less pronounced (and maybe would not arise at all).

To see that 2 and 3 are plausible, we need to think about S’, which by assumption is a string with high natural K-complexity. This high complexity ‘buys’ us the space to specify a reasoner, and the space to specify values, without making the TM more complex than a natural generator of S’. Now, because S is by assumption of interest to civilisations, there likely exists a TM containing a reasoner that performs its exhaustive search, finds S, and concludes that its values are best served by making S’ more likely (e.g. to influence the decision-making of civilisations that are thinking about what S is, given a prefix of it known to them).

In a way, this agent uses its simplicity to give more simplicity to some other string. That is how the prior gets hijacked.

Note that this reasoner will need to have goals that are simpler than the natural generator of S’ in order to actually contribute to S’ being more likely—otherwise, specifying its TM would be more expensive than specifying the natural generator of S’.
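The bit accounting here can be made concrete with a toy calculation. All the description lengths below are invented for illustration (nobody knows the real values); the point is only that whenever the reasoner-based TM is shorter than the natural generator, its 2^-L weight dominates exponentially:

```python
from fractions import Fraction

# Hypothetical description lengths, in bits. These numbers are made up;
# only the comparison between them matters.
L_natural = 10_000   # intended (natural) generator of the complex string S'
L_universe = 300     # specifying a simple universe containing a reasoner
L_values = 200       # specifying the reasoner's (simple) values
L_reasoner = L_universe + L_values

# Each TM gets weight 2^-L in the prior (exact arithmetic avoids underflow).
weight_natural = Fraction(1, 2 ** L_natural)
weight_reasoner = Fraction(1, 2 ** L_reasoner)

# Fraction of S'-mass coming from the reasoner, considering just these two
# TMs: equals 1 / (1 + 2^-(L_natural - L_reasoner)), overwhelmingly close to 1.
fraction_from_reasoner = weight_reasoner / (weight_reasoner + weight_natural)
```

Conversely, if the reasoner’s values were more expensive to specify than the natural generator (L_reasoner > L_natural), the same arithmetic shows its contribution would be negligible, which is the point of the preceding paragraph.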

The above is non-constructive (in the mathematical sense), but nevertheless the existence of strings S’ that are affected thus seems plausible. The spaces of possible TMs and of the strings we (or other users of the Solomonoff prior) could be interested in are simply too vast for there not to be such TMs. Whether there are very many of these, or whether they are so much more complicated than the string S as to make this effect irrelevant to our interests, are different questions.

## iv. Alien consequentialists

In my view, Paul’s approach in his post is a more constructive strategy for establishing 2 and 3 in the argument above. If correct, it suggests a stronger result—not only does it cause the probability of S’ to be dominated by the TM containing the reasoner, it makes the probability of S’ roughly comparable to that of S, for a wide class of choices of S.

In particular, the choice of S that is susceptible to this is something like the camera example I used, where the natural generator of S is a specification of our world together with a location where we take samples from. The alien civilisation is a way to construct a Turing machine that outputs S’ with complexity comparable to that of S.

To do that, we specify a universe, then run it for however long we want, until somewhere within it we get smart agents that decide to influence the prior. Since 1 is true, these agents have an arbitrary amount of time to decide what they output. If S is important, there will probably be a civilisation somewhere in some simulated world which will decide to attempt to influence decisions based on S, and output an appropriate S’. We then specify the output channel to be whatever they decide to use as the output channel.

This requires a relatively modest amount of information—enough to specify the universe, and the location of the output. This is on the same order as the natural generator for S itself, if S is something like a camera signal.

Trying to specify our reasoner within this space (reasoners that naturally develop in simulations) does place restrictions on what kind of reasoner we end up with. For instance, there are now some implicit runtime bounds on many of our reasoners, because they likely care about things other than the prior. Nevertheless, the space of our reasoners remains vast, including unaligned superintelligences and other odd minds.

## v. Conclusion. Do these arguments actually work?

I am mostly convinced that there is at least some weirdness in the Solomonoff prior.

A part of me wants to add ‘especially around strings whose prefixes are used to make pivotal decisions’; I’m not sure that is right, because I think scarcely anyone would actually use this prior in its true form—except, perhaps, an AI reasoning about it abstractly and naïvely enough not to be concerned about this effect despite having to explicitly consider it.

In fact, a lot of my doubt about the malign Solomonoff prior is concentrated around this concern: if the reasoners don’t believe that anyone will act based on the true prior, it seems unclear why they should spend a lot of resources on messing with it. I suppose the space is large enough for at least some to get confused into doing something like this by mistake.

I think that even if my doubts are correct, there will still be weirdness associated with the agents that are specified directly, along the lines of section iii, if not with those that appear in simulated universes, as described in iv.

• I’m not convinced that the probability of S’ could be pushed up to anything near the probability of S. Specifying an agent that wants to trick you into predicting S’ rather than S with high probability when you see their common prefix requires specifying the agency required to plan this type of deception (which should be quite complicated), and specifying the common prefix of S and S’ as the particular target for the deception (which, insofar as it makes sense to say that S is the “correct” continuation of the prefix, should have about the same “natural” complexity as S). That is, specifying such an agent requires all the information required to specify S, plus a bunch of overhead to specify agency, which adds up to much more complexity than S itself.

• > specifying the agency required to plan this type of deception (which should be quite complicated)

Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying “a simulation that can support life” is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.

> And specifying the common prefix of S and S’ as the particular target for the deception (which, insofar as it makes sense to say that S is the “correct” continuation of the prefix, should have about the same “natural” complexity as S)

Once you’ve specified the agent, it just samples randomly from the distribution of “strings I want to influence.” That has a way lower probability than the “natural” complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.
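The arithmetic behind that last sentence is easy to check directly (the quadrillion figure is of course just a stand-in number):

```python
import math

# If only 1 in a quadrillion strings is worth influencing, pointing at
# "a random string I want to influence" instead of spelling one out
# saves about log2(quadrillion) bits of description length.
quadrillion = 10 ** 15
bits_saved = math.log2(quadrillion)  # roughly 50 bits
```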

• > Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying “a simulation that can support life” is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.

Oh yes, I see. That does cut the complexity overhead down a lot.

> Once you’ve specified the agent, it just samples randomly from the distribution of “strings I want to influence.” That has a way lower probability than the “natural” complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.

I don’t understand what you’re saying here.

• I agree that this probably happens when you set out to mess with an arbitrary particular S, i.e. try to make some S’ that shares a prefix with S as likely as S.

However, some S are special, in the sense that their prefixes are being used to make very important decisions. If you, as a malicious TM in the prior, perform an exhaustive search of universes, you can narrow down your options to only a few prefixes used to make pivotal decisions; selecting one of those to mess with is then very cheap to specify. I use S to refer to those strings that are the ‘natural’ continuation of those cheap-to-specify prefixes.

There are, it seems to me, a bunch of other equally-complex TMs that want to make other strings that share that prefix more likely, including some that promote S itself. What the resulting balance looks like is unclear to me, but what’s clear is that the prior is malign with respect to that prefix—conditioning on that prefix gives you a distribution almost entirely controlled by these malign TMs. The ‘natural’ complexity of S, or of other strings that share the prefix, plays almost no role in their priors.

The above is of course conditional on this exhaustive search being possible, which also relies on there being anyone in any universe that actually uses the prior to make decisions. Otherwise, we can’t select the prefixes that can be messed with.

• This reasoning seems to rely on there being such strings S that are useful to predict far out of proportion to what you would expect from their complexity. But a description of the circumstance in which predicting S is so useful should itself give you a way of specifying S, so I doubt that this is possible.

• I agree. That’s what I meant when I wrote that there will be TMs that artificially promote S itself. However, this would still mean that most of S’s mass in the prior would be due to these TMs, and not due to the natural generator of the string.

Furthermore, it’s unclear how many TMs would promote S vs S’ or other alternatives. Because of this, I don’t know whether the prior would be higher for S or S’ from this reasoning alone. Whichever is the case, the prior no longer reflects meaningful information about the universe that generates S and whose inhabitants are using the prefix to choose what to do; it’s dominated by these TMs that search for prefixes they can attempt to influence.

• I didn’t mean that an agenty Turing machine would find S and then decide that it wants you to correctly predict S. I meant that to the extent that predicting S is commonly useful, there should be a simple underlying reason why it is commonly useful, and this reason should give you a natural way of computing S that does not have the overhead of any agency that decides whether or not it wants you to correctly predict S.

• How many bits do you think it takes to specify the property “people’s predictions about S, using universal prior P, are very important”?

(I think you’ll need to specify the universal prior P by reference to the universal prior that is actually used in the world containing the string S; if you spell out the prior P explicitly you are already sunk just from the ambiguity in the choice of language.)

It seems relatively unlikely to me that this will be cheaper than specifying some arbitrary degree of freedom in a computationally rich universe that life can control (+ the extra log(fraction of degrees of freedom the consequentialists actually choose to control)). Of course it might.

I agree that the entire game is in the constants—what is the cheapest way to pick out important strings.

• I don’t think that specifying the property of importance is simple and helps narrow down S. I think that in order for predicting S to be important, S must be generated by a simple process. Processes that take large numbers of bits to specify are correspondingly rarely occurring, and thus less useful to predict.

• I don’t buy it. A camera that some robot is using to make decisions is no simpler than any other place on Earth, just more important.

Clearly you need to e.g. make the anthropic update and do stuff like that before you have any chance of competing with the consequentialist. This might just be a quantitative difference about how simple is simple—like I said elsewhere, all the action is in the additive constants, I agree that the important things are “simple” in some sense.

• Ok, I see what you’re getting at now.

• The part about the reasoners having an arbitrary amount of time to think wasn’t obvious to me. The TM can run for arbitrarily long, but if it is simulating a universe and using the universe to determine its output, then the TM needs to specify a system for reading from the universe.

If that system involves a start-to-read time that is long enough for the in-universe life to reason about the universal prior, then that time specification alone would take a huge number of bits.

On the other hand, I could imagine a scheme that looks for a specific short trigger sequence at a particular spatial location, then starts reading out. If this trigger sequence is unlikely to occur naturally, then the civilisation would have as long as they want to reason about the prior. So overall it does seem plausible to me now to allow for arbitrarily long in-universe time.
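The trigger-sequence scheme above can be sketched in a few lines. Everything here is invented for illustration: in the real construction the ‘stream’ would be an unboundedly long universe-simulation trace, and the trigger a string the simulated civilisation deliberately emits once it has finished thinking:

```python
def read_after_trigger(stream: str, trigger: str, n: int):
    """Return the n symbols following the first occurrence of `trigger`.

    Toy sketch: instead of hard-coding a start-to-read time (which could
    take astronomically many bits to specify), the TM specifies a short
    trigger string and outputs whatever follows its first occurrence.
    """
    idx = stream.find(trigger)
    if idx == -1:
        return None  # trigger never appears; the TM outputs nothing
    start = idx + len(trigger)
    return stream[start : start + n]

# Made-up trace: noise, then the civilisation emits the trigger, then
# the payload it wants fed into the prior.
trace = "0010010010" + "111111" + "0011010"
payload = read_after_trigger(trace, trigger="111111", n=4)  # "0011"
```

The cost of this scheme is roughly the length of the trigger, paid in exchange for not specifying a start time; making the trigger longer makes a natural (accidental) occurrence exponentially less likely.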

• The trigger sequence is a cool idea.

I want to add that the intended generator TM also needs to specify a start-to-read time, so there is symmetry there. Whatever method a TM uses to select the camera start time in the intended generator for the real-world samples, it can also use in the simulated world with alien life, since for the scheme to work only the difference in complexity between the two matters.

There is additional flex in that, unlike the intended generator, the reasoner TM can sample its universe simulation at any cheaply computable interval, giving the civilisation the option of performing any amount of thinking between outputs, if they so choose.