# The Solomonoff Prior is Malign

This ar­gu­ment came to my at­ten­tion from this post by Paul Chris­ti­ano. I also found this clar­ifi­ca­tion helpful. I found these counter-ar­gu­ments stim­u­lat­ing and have in­cluded some dis­cus­sion of them.

Very lit­tle of this con­tent is origi­nal. My con­tri­bu­tions con­sist of flesh­ing out ar­gu­ments and con­struct­ing ex­am­ples.

Thank you to Beth Barnes and Thomas Kwa for helpful dis­cus­sion and com­ments.

# What is the Solomonoff prior?

The Solomonoff prior is intended to answer the question “what is the probability of X?” for any X, where X is a finite string over some finite alphabet. The Solomonoff prior is defined by taking the set of all Turing machines (TMs) which output strings when run with no input and weighting them proportional to $2^{-l}$, where $l$ is the description length of the TM (informally, its size in bits).

The Solomonoff prior says the prob­a­bil­ity of a string is the sum over all the weights of all TMs that print that string.

One rea­son to care about the Solomonoff prior is that we can use it to do a form of ideal­ized in­duc­tion. If you have seen 0101 and want to pre­dict the next bit, you can use the Solomonoff prior to get the prob­a­bil­ity of 01010 and 01011. Nor­mal­iz­ing gives you the chances of see­ing 1 ver­sus 0, con­di­tioned on see­ing 0101. In gen­eral, any pro­cess that as­signs prob­a­bil­ities to all strings in a con­sis­tent way can be used to do in­duc­tion in this way.
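As a rough illustration of the induction procedure just described, here is a minimal sketch over a tiny, made-up table of machines. The table `TOY_MACHINES` and the function names are hypothetical; the real Solomonoff prior sums over all Turing machines and is not computable.

```python
from fractions import Fraction

# Hypothetical toy "machines": (description length in bits, string printed).
# These entries are invented for illustration only.
TOY_MACHINES = [
    (3, "01010"),
    (5, "01011"),
    (5, "01010"),
    (4, "00000"),
]


def prior(s, machines=TOY_MACHINES):
    """Sum the weights 2^-l of every toy machine that prints exactly s."""
    return sum(
        (Fraction(1, 2 ** length) for length, out in machines if out == s),
        Fraction(0),
    )


def predict_next(prefix, machines=TOY_MACHINES):
    """Normalize prior(prefix + '0') against prior(prefix + '1')."""
    p0 = prior(prefix + "0", machines)
    p1 = prior(prefix + "1", machines)
    total = p0 + p1
    if total == 0:
        raise ValueError("no toy machine prints a continuation of this prefix")
    return {"0": p0 / total, "1": p1 / total}


print(predict_next("0101"))
```

With this particular table, the continuation "0" gets most of the probability because the shortest machine prints 01010.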

This post pro­vides more in­for­ma­tion about Solomonoff In­duc­tion.

# Why is it ma­lign?

Imag­ine that you wrote a pro­gram­ming lan­guage called python^10 that works as fol­lows: First, it takes all alpha-nu­meric chars that are not in liter­als and checks if they’re re­peated 10 times se­quen­tially. If they’re not, they get deleted. If they are, they get re­placed by a sin­gle copy. Se­cond, it runs this new pro­gram through a python in­ter­preter.

Hello world in python^10:

pppppppppprrrrrrrrrriiiiiiiiiinnnnnnnnnntttttttttt('Hello, world!')

Luck­ily, python has an exec func­tion that ex­e­cutes liter­als as code. This lets us write a shorter hello world:

eeeeeeeeeexxxxxxxxxxeeeeeeeeeecccccccccc("print('Hello, world!')")

It’s prob­a­bly easy to see that for nearly ev­ery pro­gram, the short­est way to write it in python^10 is to write it in python and run it with exec. If we didn’t have exec, for suffi­ciently com­pli­cated pro­grams, the short­est way to write them would be to spec­ify an in­ter­preter for a differ­ent lan­guage in python^10 and write it in that lan­guage in­stead.
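To make python^10 concrete, here is a minimal sketch of a translator for it. It rests on some assumptions of mine rather than on a specification: only single-line `'...'` and `"..."` literals are recognized, a run of 20 identical characters decodes to two copies, and the names `decode_python10` and `run_python10` are hypothetical.

```python
import contextlib
import io

REPEAT = 10  # how many times each alphanumeric character must be repeated


def decode_python10(src: str) -> str:
    """Translate python^10 source into ordinary Python source."""
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in "'\"":
            # Copy a string literal verbatim (simplified: no triple quotes).
            j = i + 1
            while j < n and src[j] != ch:
                if src[j] == "\\":
                    j += 1  # skip the escaped character
                j += 1
            out.append(src[i:j + 1])
            i = j + 1
        elif ch.isalnum():
            # Measure the run of identical characters; keep one copy per
            # full group of REPEAT and delete any leftover characters.
            j = i
            while j < n and src[j] == ch:
                j += 1
            out.append(ch * ((j - i) // REPEAT))
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def run_python10(src: str) -> str:
    """Decode src, execute it as Python, and return what it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(decode_python10(src), {})
    return buf.getvalue()


print_src = "p" * 10 + "r" * 10 + "i" * 10 + "n" * 10 + "t" * 10 + "('Hello, world!')"
exec_src = "e" * 10 + "x" * 10 + "e" * 10 + "c" * 10 + "(\"print('Hello, world!')\")"

print(decode_python10(print_src))
print(run_python10(exec_src), end="")
print(len(print_src), len(exec_src))
```

For a program this small the `exec` version is only marginally shorter, but the saving grows with program size because everything inside the literal is written once instead of ten times.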

As this ex­am­ple shows, the an­swer to “what’s the short­est pro­gram that does X?” might in­volve us­ing some round­about method (in this case we used exec). If python^10 has some se­cu­rity prop­er­ties that python didn’t have, then the short­est pro­gram in python^10 that ac­com­plished any given task would not have these se­cu­rity prop­er­ties be­cause they would all pass through exec. In gen­eral, if you can ac­cess al­ter­na­tive ‘modes’ (in this case python), the short­est pro­grams that out­put any given string might go through one of those modes, pos­si­bly in­tro­duc­ing ma­lign be­hav­ior.

Let’s say that I’m try­ing to pre­dict what a hu­man types next us­ing the Solomonoff prior. Many pro­grams pre­dict the hu­man:

1. Si­mu­late the hu­man and their lo­cal sur­round­ings. Run the simu­la­tion for­ward and check what gets typed.

2. Si­mu­late the en­tire Earth. Run the simu­la­tion for­ward and check what that par­tic­u­lar hu­man types.

3. Si­mu­late the en­tire uni­verse from the be­gin­ning of time. Run the simu­la­tion for­ward and check what that par­tic­u­lar hu­man types.

4. Si­mu­late an en­tirely differ­ent uni­verse that has rea­son to simu­late this uni­verse. Out­put what the hu­man types in the simu­la­tion of our uni­verse.

Which one is the simplest? One property of the Solomonoff prior is that it doesn’t care about how long the TMs take to run, only how large they are. This results in an unintuitive notion of “simplicity”; a program that does something $2^{1000}$ times might be simpler than a program that does the same thing $2^{1000} - 1$ times because the number $2^{1000}$ is easier to specify than $2^{1000} - 1$.
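One way to get a feel for this notion of simplicity is to compare how long different ways of writing down a number are. The sketch below uses the character count of a Python expression as a very crude proxy for description length; that proxy, the helper name `expression_length`, and the choice of number are my own assumptions for illustration.

```python
def expression_length(expr: str) -> int:
    """Crude stand-in for description length: characters in the expression."""
    return len(expr)


short_expr = "2**1000"        # an enormous number with a tiny description
longer_expr = "2**1000-1"     # its neighbour needs a slightly longer one
literal_expr = str(2 ** 1000) # writing out every decimal digit is far longer

# All three strings are valid Python expressions for the intended values.
assert eval(short_expr) == 2 ** 1000
assert eval(longer_expr) == 2 ** 1000 - 1
assert eval(literal_expr) == 2 ** 1000

for expr in (short_expr, longer_expr, literal_expr):
    print(expression_length(expr))
```

The value of the number, and therefore the run time of a loop that counts up to it, has little to do with how many characters it takes to name it.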

In our ex­am­ple, it seems likely that “simu­late the en­tire uni­verse” is sim­pler than “simu­late Earth” or “simu­late part of Earth” be­cause the ini­tial con­di­tions of the uni­verse are sim­pler than the ini­tial con­di­tions of Earth. There is some ad­di­tional com­plex­ity in pick­ing out the spe­cific hu­man you care about. Since the lo­cal simu­la­tion is built around that hu­man this will be eas­ier in the lo­cal simu­la­tion than the uni­verse simu­la­tion. How­ever, in ag­gre­gate, it seems pos­si­ble that “simu­late the uni­verse, pick out the typ­ing” is the short­est pro­gram that pre­dicts what your hu­man will do next. Even so, “pick out the typ­ing” is likely to be a very com­pli­cated pro­ce­dure, mak­ing your to­tal com­plex­ity quite high.

Whether simu­lat­ing a differ­ent uni­verse that simu­lates our uni­verse is sim­pler de­pends a lot on the prop­er­ties of that other uni­verse. If that other uni­verse is sim­pler than our uni­verse, then we might run into an exec situ­a­tion, where it’s sim­pler to run that other uni­verse and spec­ify the hu­man in their simu­la­tion of our uni­verse.

This is trou­bling be­cause that other uni­verse might con­tain be­ings with differ­ent val­ues than our own. If it’s true that simu­lat­ing that uni­verse is the sim­plest way to pre­dict our hu­man, then some non-triv­ial frac­tion of our pre­dic­tion might be con­trol­led by a simu­la­tion in an­other uni­verse. If these be­ings want us to act in cer­tain ways, they have an in­cen­tive to al­ter their simu­la­tion to change our pre­dic­tions.

At its core, this is the main ar­gu­ment why the Solomonoff prior is ma­lign: a lot of the pro­grams will con­tain agents with prefer­ences, these agents will seek to in­fluence the Solomonoff prior, and they will be able to do so effec­tively.

## How many other uni­verses?

The Solomonoff prior is run­ning all pos­si­ble Tur­ing ma­chines. How many of them are go­ing to simu­late uni­verses? The an­swer is prob­a­bly “quite a lot”.

It seems like spec­i­fy­ing a lawful uni­verse can be done with very few bits. Con­way’s Game of Life is very sim­ple and can lead to very rich out­comes. Ad­di­tion­ally, it seems quite likely that agents with prefer­ences (con­se­quen­tial­ists) will ap­pear some­where in­side this uni­verse. One rea­son to think this is that evolu­tion is a rel­a­tively sim­ple math­e­mat­i­cal reg­u­lar­ity that seems likely to ap­pear in many uni­verses.

If the uni­verse has a hos­pitable struc­ture, due to in­stru­men­tal con­ver­gence these agents with prefer­ences will ex­pand their in­fluence. As the uni­verse runs for longer and longer, the agents will grad­u­ally con­trol more and more.

In ad­di­tion to spec­i­fy­ing how to simu­late the uni­verse, the TM must spec­ify an out­put chan­nel. In the case of Game of Life, this might be a par­tic­u­lar cell sam­pled at a par­tic­u­lar fre­quency. Other ex­am­ples in­clude whether or not a par­tic­u­lar pat­tern is pre­sent in a par­tic­u­lar re­gion, or the par­ity of the to­tal num­ber of cells.
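A minimal sketch of what “a universe plus an output channel” could look like for Game of Life follows. The helper names (`step`, `cell_channel`, `parity_channel`, `read_channel`) are hypothetical, and the starting pattern is a simple blinker chosen only because its behavior is easy to check by hand.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]
Channel = Callable[[Set[Cell]], int]


def step(live: Set[Cell]) -> Set[Cell]:
    """Advance an unbounded Game of Life board by one tick."""
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next tick with 3 neighbours, or with 2 if already alive.
    return {c for c, k in counts.items() if k == 3 or (k == 2 and c in live)}


def cell_channel(cell: Cell) -> Channel:
    """Output channel: 1 if one particular cell is alive, else 0."""
    return lambda live: int(cell in live)


def parity_channel(live: Set[Cell]) -> int:
    """Output channel: parity of the total number of live cells."""
    return len(live) % 2


def read_channel(live: Set[Cell], channel: Channel, period: int, n_bits: int) -> List[int]:
    """Sample the channel once every `period` ticks, starting at t=0."""
    bits = []
    for _ in range(n_bits):
        bits.append(channel(live))
        for _ in range(period):
            live = step(live)
    return bits


blinker = {(0, -1), (0, 0), (0, 1)}
print(read_channel(blinker, cell_channel((1, 0)), period=1, n_bits=4))
print(read_channel(blinker, parity_channel, period=1, n_bits=3))
```

The same universe yields different output strings depending on which channel and sampling period the program specifies, which is why the channel is part of the program's description length.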

In summary, specifying lawful universes that give rise to consequentialists requires a very simple program. Therefore, the predictions generated by the Solomonoff prior will have some influential components composed of simulated consequentialists.

## How would they in­fluence the Solomonoff prior?

Con­se­quen­tial­ists that find them­selves in uni­verses can rea­son about the fun­da­men­tal laws that gov­ern their uni­verse. If they find that their uni­verse has rel­a­tively sim­ple physics, they will know that their be­hav­ior con­tributes to the Solomonoff prior. To gain ac­cess to more re­sources in other uni­verses, these con­se­quen­tial­ists might seek to act in ways that in­fluence the Solomonoff prior.

A con­trived ex­am­ple of a de­ci­sion other be­ings would want to ma­nipu­late is “what pro­gram should be writ­ten and ex­e­cuted next?” Be­ings in other uni­verses would have an in­cen­tive to get us to write pro­grams that were al­igned with their val­ues. A par­tic­u­larly in­ter­est­ing sce­nario is one in which they write them­selves into ex­is­tence, al­low­ing them to effec­tively “break into” our uni­verse.

For ex­am­ple, some­where in the Solomonoff prior there is a pro­gram that goes some­thing like: “Si­mu­late this uni­verse. Start­ing from the year 2100, ev­ery hour out­put ‘1’ if there’s a cu­bic me­ter of iron on the Moon, else out­put ‘0’.” By con­trol­ling the pres­ence/​ab­sence of a cu­bic me­ter of iron on the Moon, we would be able to in­fluence the out­put of this par­tic­u­lar facet of the Solomonoff prior.

This ex­am­ple is a very com­pli­cated pro­gram and thus will not have much weight in the Solomonoff prior. How­ever, by rea­son­ing over the com­plex­ity of pos­si­ble out­put chan­nels for their uni­verse, con­se­quen­tial­ists would be able to iden­tify out­put chan­nels that weigh heav­ily in the Solomonoff prior. For ex­am­ple, if I was in the Game of Life, I might rea­son that sam­pling cells that were liv­ing in the ini­tial con­di­tions of the uni­verse is sim­pler than sam­pling other cells. Ad­di­tion­ally, sam­pling cells and re­port­ing their out­puts di­rectly is sim­pler than sam­pling cells and re­vers­ing their val­ues. There­fore, I might choose to con­trol re­gions close to the ini­tial live cells of the uni­verse for pur­poses of ex­ert­ing in­fluence over the Solomonoff prior.

Ad­di­tion­ally, con­se­quen­tial­ists can send across many plau­si­ble chan­nels at once, fo­cus efforts on chan­nels that are easy to con­trol, send through chan­nels that would not un­duly de­crease their val­ues for other rea­sons, etc. Since the pro­por­tional weight in the Solomonoff prior drops off ex­po­nen­tially with re­spect to com­plex­ity, it’s pos­si­ble only very small re­gions of space would need to be used to achieve a large frac­tion of the max­i­mum pos­si­ble in­fluence.
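The effect of the exponential drop-off can be sketched numerically. The example below assumes, purely for illustration, that there is exactly one output channel at each complexity from 10 to 29 bits; that assumption and the function names are mine, and a real universe would have many channels per complexity level.

```python
from fractions import Fraction
from typing import Sequence


def channel_weight(complexity_bits: int) -> Fraction:
    """Weight of a channel whose specification costs this many bits."""
    return Fraction(1, 2 ** complexity_bits)


def influence_fraction(complexities: Sequence[int], k: int) -> Fraction:
    """Share of the total channel weight held by the k simplest channels."""
    weights = sorted((channel_weight(c) for c in complexities), reverse=True)
    return sum(weights[:k], Fraction(0)) / sum(weights, Fraction(0))


# Hypothetical universe: one channel at each complexity from 10 to 29 bits.
complexities = list(range(10, 30))
for k in (1, 3, 5):
    print(k, float(influence_fraction(complexities, k)))
```

Under this toy assumption, controlling only the three simplest of twenty channels already captures most of the available weight.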

## How much in­fluence will they have?

A sim­ple ap­pli­ca­tion of the no free lunch the­o­rem shows that there is no way of mak­ing pre­dic­tions that is bet­ter than the Solomonoff prior across all pos­si­ble dis­tri­bu­tions over all pos­si­ble strings. Thus, agents that are in­fluenc­ing the Solomonoff prior can­not be good at pre­dict­ing, and thus gain in­fluence, in all pos­si­ble wor­lds.

In prac­tice, this is ir­rele­vant be­cause con­se­quen­tial­ists do not want to achieve equal pre­dic­tive ac­cu­racy in all wor­lds; they only care about wor­lds in which be­ing pre­dic­tive re­sults in in­creased in­fluence. If I’m ex­ert­ing con­trol over the Solomonoff prior, I only care about in­fluenc­ing it in ways that might af­fect ma­te­rial con­se­quences in other uni­verses. For ex­am­ple, I do not care about gain­ing in­fluence in uni­verses in­hos­pitable to life. Thus I will be able to trade off pre­dic­tive power in uni­verses I don’t care about for pre­dic­tive power in uni­verses I do care about. This means that I should re­strict my at­ten­tion to all uni­verses that have re­sources and the Solomonoff prior is be­ing used to make im­por­tant de­ci­sions, weight­ing ap­pro­pri­ately.

Another interesting thing about the Solomonoff prior is that it is actually a collection of priors. The “length” of a TM is defined relative to some universal TM. For any particular TM, its length with respect to different universal TMs will vary; thus, there are versions of the Solomonoff prior that give more or less weight to any given TM. (Note that you can simulate universal TMs with universal TMs, so the difference across all programs will be bounded by a constant factor. However, this constant factor can be large, so the difference in relative weight between different Solomonoff priors can also be large.)
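The constant-factor bound in the parenthetical can be written out as follows. The symbols $U$, $V$, $\ell_U$, $\ell_V$, and $c_{UV}$ are notation introduced here for illustration: $U$ and $V$ are two universal TMs, $\ell_U(M)$ is the length of the shortest description of a machine $M$ for $U$, and $c_{UV}$ is the length of an interpreter for $V$ written for $U$.

$$
\ell_U(M) \le \ell_V(M) + c_{UV}
\quad\Longrightarrow\quad
2^{-\ell_U(M)} \ge 2^{-c_{UV}} \cdot 2^{-\ell_V(M)}
$$

The constant $c_{UV}$ does not depend on $M$, but nothing forces it to be small. If it is, say, a few hundred bits, two versions of the prior can disagree about the weight of a particular machine by a factor of $2^{\text{hundreds}}$.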

In par­tic­u­lar, this sug­gests a good strat­egy for con­se­quen­tial­ists: find a uni­verse that is us­ing a ver­sion of the Solomonoff prior that has a very short de­scrip­tion of the par­tic­u­lar uni­verse the con­se­quen­tial­ists find them­selves in.

The com­bined strat­egy is thus to take a dis­tri­bu­tion over all de­ci­sions in­formed by the Solomonoff prior, weight them by how much in­fluence can be gained and the ver­sion of the prior be­ing used, and read off a se­quence of bits that will cause some of these de­ci­sions to re­sult in a preferred out­come.

The ques­tion of how much in­fluence any given uni­verse of con­se­quen­tial­ists will have is difficult to an­swer. One way of quan­tify­ing this is to think about how many “uni­verses they don’t care about” they’re trad­ing off for “uni­verses they do care about” (re­ally we should be think­ing in terms of se­quences, but I find rea­son­ing about uni­verses to be eas­ier).

Since the consequentialists care about exerting maximum influence, we can approximate them as not caring about universes that don’t use a version of the Solomonoff prior that gives them a large weight. This can be operationalized as only caring about universes that use a universal TM from a particular set for their Solomonoff prior. What is the probability that a particular universe uses a universal TM from that set? I am not sure, but 1/million to 1/billion seems reasonable. This suggests a universe of consequentialists will only care about roughly one in a million to one in a billion universes, which means they can devote a million to a billion times the predictive power to the universes they care about. This is sometimes called the “anthropic update”. (This post contains more discussion about this particular argument.)

Ad­di­tion­ally, we might think about which de­ci­sions the con­se­quen­tial­ists would care about. If a par­tic­u­lar de­ci­sion us­ing the Solomonoff prior is im­por­tant, con­se­quen­tial­ists are go­ing to care more about that de­ci­sion than other de­ci­sions. Con­ser­va­tively, per­haps 1/​1000 de­ci­sions are “im­por­tant” in this sense, giv­ing an­other 1000x rel­a­tive weight­ing.

After you con­di­tion on a de­ci­sion be­ing im­por­tant and us­ing a par­tic­u­lar ver­sion of the Solomonoff prior, it thus seems quite likely that a non-triv­ial frac­tion of your prior is be­ing con­trol­led by con­se­quen­tial­ists.
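The arithmetic behind these two factors can be sketched in a few lines. The numbers are the rough guesses from the text above, not measurements, and the function name `relative_boost` is hypothetical.

```python
import math


def relative_boost(one_in_tms: int, one_in_decisions: int) -> int:
    """Combined concentration factor from both kinds of selectivity."""
    return one_in_tms * one_in_decisions


ONE_IN_IMPORTANT_DECISIONS = 1000  # "perhaps 1/1000 decisions are important"

for one_in_tms in (10 ** 6, 10 ** 9):  # "1/million to 1/billion"
    boost = relative_boost(one_in_tms, ONE_IN_IMPORTANT_DECISIONS)
    # Express the boost in bits, the unit in which complexity is penalized.
    print(boost, round(math.log2(boost), 1))
```

Expressed in bits, a boost of this size is equivalent to the consequentialists’ universe being described by a program several bytes shorter than it otherwise would be.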

An in­tu­ition pump is that this ar­gu­ment is closer to an ex­is­tence claim than a for-all claim. The Solomonoff prior is ma­lign if there ex­ists a sim­ple uni­verse of con­se­quen­tial­ists that wants to in­fluence our uni­verse. This uni­verse need not be sim­ple in an ab­solute sense, only sim­ple rel­a­tive to the other TMs that could equal it in pre­dic­tive power. Even if most con­se­quen­tial­ists are too com­pli­cated or not in­ter­ested, it seems likely that there is at least one uni­verse that is.

## Example

### Complexity of Consequentialists

How many bits does it take to spec­ify a uni­verse that can give rise to con­se­quen­tial­ists? I do not know, but it seems like Con­way’s Game of Life might provide a rea­son­able lower bound.

Luck­ily, the code golf com­mu­nity has spent some amount of effort op­ti­miz­ing for pro­gram size. How many bytes would you guess it takes to spec­ify Game of Life? Well, it de­pends on the uni­ver­sal TM. Pos­si­ble an­swers in­clude 6, 32, 39, or 96.

Since uni­verses of con­se­quen­tial­ists can “cheat” by con­cen­trat­ing their pre­dic­tive efforts onto uni­ver­sal TMs in which they are par­tic­u­larly sim­ple, we’ll take the min­i­mum. Ad­di­tion­ally, my friend who’s into code golf (he wrote the 96-byte solu­tion!) says that the 6-byte an­swer ac­tu­ally con­tains closer to 4 bytes of in­for­ma­tion.

To spec­ify an ini­tial con­figu­ra­tion that can give rise to con­se­quen­tial­ists we will need to provide more in­for­ma­tion. The small­est in­finite growth pat­tern in Game of Life has been shown to need 10 cells. Another refer­ence point is that a self-repli­ca­tor with 12 cells ex­ists in HighLife, a Game of Life var­i­ant. I’m not an ex­pert, but I think an ini­tial con­figu­ra­tion that gives rise to in­tel­li­gent life can be speci­fied in an 8x8 bound­ing box, giv­ing a to­tal of 8 bytes.

Fi­nally, we need to spec­ify a sam­pling pro­ce­dure that con­se­quen­tial­ists can gain con­trol of. Some­thing like “read <cell> ev­ery <large num­ber> time ticks” suffices. By as­sump­tion, the cell be­ing sam­pled takes al­most no in­for­ma­tion to spec­ify. We can also choose what­ever large num­ber is eas­iest to spec­ify (the busy beaver num­bers come to mind). In to­tal, I don’t think this will take more than 2 bytes.

Sum­ming up, Game of Life + ini­tial con­figu­ra­tion + sam­pling method takes maybe 16 bytes, so a rea­son­able range for the com­plex­ity of a uni­verse of con­se­quen­tial­ists might be 10-1000 bytes. That doesn’t seem like very many, es­pe­cially rel­a­tive to the amount of in­for­ma­tion we’ll be con­di­tion­ing the Solomonoff prior on if we ever use it to make an im­por­tant de­ci­sion.
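The tally above can be reproduced with a short sketch. The byte counts are the estimates given in the preceding paragraphs (taking the 6-byte rule specification at face value), and the dictionary and variable names are hypothetical.

```python
from fractions import Fraction

# Estimates taken from the discussion above; none of these are measured.
COMPONENT_BYTES = {
    "Game of Life rules": 6,      # shortest code-golf answer mentioned
    "initial configuration": 8,   # 8x8 bounding box = 64 bits
    "sampling procedure": 2,      # "read <cell> every <large number> ticks"
}

total_bytes = sum(COMPONENT_BYTES.values())
total_bits = 8 * total_bytes

# Weight of such a program under a prior that assigns 2^-l to l bits.
weight = Fraction(1, 2 ** total_bits)

print(total_bytes, total_bits)
print(float(weight))
```

Using the friend’s estimate of 4 bytes of real information in the rules would bring the total down to 14 bytes; either way the estimate sits at the low end of the 10-1000 byte range.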

### Complexity of Conditioning

When we’re us­ing the Solomonoff prior to make an im­por­tant de­ci­sion, the ob­ser­va­tions we’ll con­di­tion on in­clude in­for­ma­tion that:

1. We’re us­ing the Solomonoff prior

2. We’re mak­ing an im­por­tant decision

3. We’re us­ing some par­tic­u­lar uni­ver­sal TM

How much in­for­ma­tion will this in­clude? Many pro­grams will not simu­late uni­verses. Many uni­verses ex­ist that do not have ob­servers. Among uni­verses with ob­servers, some will not de­velop the Solomonoff prior. Th­ese ob­servers will make many de­ci­sions. Very few of these de­ci­sions will be im­por­tant. Even fewer of these de­ci­sions are made with the Solomonoff prior. Even fewer will use the par­tic­u­lar ver­sion of the Solomonoff prior that gets used.

It seems rea­son­able to say that this is at least a megabyte of raw in­for­ma­tion, or about a mil­lion bytes. (I ac­knowl­edge some cart-horse is­sues here.)

This means that af­ter you con­di­tion your Solomonoff prior, you’ll be left with pro­grams that are at least a mil­lion bytes. As our Game of Life ex­am­ple shows, it only takes maybe 10-1000 of these bytes to spec­ify a uni­verse that gives rise to con­se­quen­tial­ists. You have ap­prox­i­mately a mil­lion bytes left to spec­ify more prop­er­ties of the uni­verse that will make it more likely the con­se­quen­tial­ists will want to ex­ert in­fluence over the Solomonoff prior for the pur­pose of in­fluenc­ing this par­tic­u­lar de­ci­sion.

# Why might this ar­gu­ment be wrong?

## Inac­cessible Channels

### Argument

Most of the uni­verse is out­side of hu­man­ity’s light-cone. This might sug­gest that most “sim­ple” ways to sam­ple from our uni­verse are cur­rently out­side our in­fluence, mean­ing that the only por­tions of the Solomonoff prior we can con­trol are go­ing to have an ex­tremely low weight.

In general, it might be the case that for any universe, consequentialists inside that universe are going to have difficulty controlling simple output channels. For example, in Game of Life, a simple way to read information might sample a particular cell starting at t=0. However, consequentialists in Game of Life will not appear until a much later time and will be unable to control a large initial chunk of that output channel.

### Counter-argument

Paul Christiano points out that the general form of this argument also applies to the other TMs that compose your Solomonoff prior. For example, when predicting what I’ll type next, you would “want” to simulate me and predict what I would type starting at some time T. However, this is a pretty complicated way of sampling. The fact that simple sampling procedures are less predictive doesn’t asymmetrically penalize consequentialists. The consequentialists’ universe and sampling method only have to be simple relative to other programs that are equally good at predicting.

One might also note that large numbers can be produced with relatively few bits, so “sample starting at <large number>” is not much more complicated than “sample starting at 0”.

## Speedy Channels

### Argument

There are many sim­ple ways of sam­pling from uni­verses very quickly. For ex­am­ple, in Game of Life, one can sam­ple a cell ev­ery time-tick. It seems fea­si­ble for con­se­quen­tial­ists to simu­late Earth in the Game of Life, but not fea­si­ble to simu­late Earth such that they can al­ter a spe­cific cell ev­ery time tick per the simu­la­tion.

### Counter-argument

Con­se­quen­tial­ists in the Game of Life could sim­ply simu­late Earth, com­pute the pre­dic­tions, then later broad­cast them along very fast sam­pling chan­nels. How­ever, it might be the case that build­ing a ma­chine that al­ters a cell ar­bi­trar­ily ev­ery time tick is im­pos­si­ble. In our uni­verse, there might be sam­ple pro­ce­dures that physics does not per­mit us to ex­ert ar­bi­trary con­trol over, e.g. due to speed of light limi­ta­tions. If this is the case, con­se­quen­tial­ists will di­rect efforts to­wards the sim­plest chan­nel they can con­trol.

## Com­pu­ta­tional Burden

### Argument

Deter­min­ing how to prop­erly in­fluence the Solomonoff prior re­quires mas­sive com­pu­ta­tion re­sources de­voted to simu­lat­ing other uni­verses and how they’re go­ing to use the Solomonoff prior. While the Solomonoff prior does not pe­nal­ize ex­tremely long run-times, from the per­spec­tive of the con­se­quen­tial­ists do­ing the simu­lat­ing, run-times will mat­ter. In par­tic­u­lar, con­se­quen­tial­ists will likely be able to use com­pute to achieve things they value (like we are ca­pa­ble of do­ing). There­fore, it would be ex­tremely costly to ex­ert in­fluence over the Solomonoff prior, po­ten­tially to the point where con­se­quen­tial­ists will choose not to do so.

### Counter-argument

The computational burden of predicting the use of the Solomonoff prior in other universes is an empirical question. Since it’s a relatively fixed cost and there are many other universes, consequentialists might reason that the marginal influence over these other universes is worth the compute. Issues might arise if the use of the Solomonoff prior in other universes is very sensitive to precise historical data, which would require a very precise simulation to influence, increasing the computational burden.

Ad­di­tion­ally, some uni­verses will find them­selves with more com­put­ing power than other uni­verses. Uni­verses with a lot of com­put­ing power might find it rel­a­tively easy to pre­dict the use of the Solomonoff prior in sim­pler uni­verses and sub­se­quently ex­ert in­fluence over them.

## Mal­ign im­plies complex

### Argument

A pre­dic­tor that cor­rectly pre­dicts the first N bits of a se­quence then switches to be­ing ma­lign will be strictly more com­pli­cated than a pre­dic­tor that doesn’t switch to be­ing ma­lign. There­fore, while con­se­quen­tial­ists in other uni­verses might have some in­fluence over the Solomonoff prior, they will be dom­i­nated by non-ma­lign pre­dic­tors.

### Counter-argument

This ar­gu­ment makes a mis­taken as­sump­tion that the ma­lign in­fluence on the Solomonoff prior is in the form of pro­grams that have their “ma­lign­ness” as part of the pro­gram. The ar­gu­ment given sug­gests that simu­lated con­se­quen­tial­ists will have an in­stru­men­tal rea­son to be pow­er­ful pre­dic­tors. Th­ese simu­lated con­se­quen­tial­ists have rea­soned about the Solomonoff prior and are ex­e­cut­ing the strat­egy of “be good at pre­dict­ing, then ex­ert ma­lign in­fluence”, but this strat­egy is not hard­coded so ex­ert­ing ma­lign in­fluence does not add com­plex­ity.

## Cancel­ing Influence

### Argument

If it’s true that many con­se­quen­tial­ists are try­ing to in­fluence the Solomonoff prior, then one might ex­pect the in­fluence to can­cel out. It’s im­prob­a­ble that all the con­se­quen­tial­ists have the same prefer­ences; on av­er­age, there should be an equal num­ber of con­se­quen­tial­ists try­ing to in­fluence any given de­ci­sion in any given di­rec­tion. Since the con­se­quen­tial­ists them­selves can rea­son thus, they will re­al­ize that the ex­pected amount of in­fluence is ex­tremely low, so they will not at­tempt to ex­ert in­fluence at all. Even if some of the con­se­quen­tial­ists try to ex­ert in­fluence any­way, we should ex­pect the in­fluence of these con­se­quen­tial­ists to can­cel out also.

### Counter-argument

Since the weight of a civ­i­liza­tion of con­se­quen­tial­ists in the Solomonoff prior is pe­nal­ized ex­po­nen­tially with re­spect to com­plex­ity, it might be the case that for any given ver­sion of the Solomonoff prior, most of the in­fluence is dom­i­nated by one sim­ple uni­verse. Differ­ent val­ues of con­se­quen­tial­ists im­ply that they care about differ­ent de­ci­sions, so for any given de­ci­sion, it might be that very few uni­verses of con­se­quen­tial­ists are both sim­ple enough that they have enough in­fluence and care about that de­ci­sion.

Even if for any given decision, there are always 100 universes with equal influence and differing preferences, there are strategies that they might use to exert influence anyway. One simple strategy is for each universe to exert influence with a 1% chance, giving every universe 1/100 of the resources in expectation. If the resources accessible are vast enough, then this might be a good deal for the consequentialists. Consequentialists would not defect against each other for the reasons that motivate functional decision theory.

More ex­otic solu­tions to this co­or­di­na­tion prob­lem in­clude acausal trade amongst uni­verses of differ­ent con­se­quen­tial­ists to form col­lec­tives that ex­ert in­fluence in a par­tic­u­lar di­rec­tion.

Be warned that this leads to much weird­ness.

# Conclusion

The Solomonoff prior is very strange. Agents that make de­ci­sions us­ing the Solomonoff prior are likely to be sub­ject to in­fluence from con­se­quen­tial­ists in simu­lated uni­verses. Since it is difficult to com­pute the Solomonoff prior, this fact might not be rele­vant in the real world.

How­ever, Paul Chris­ti­ano ap­plies roughly the same ar­gu­ment to claim that the im­plicit prior used in neu­ral net­works is also likely to gen­er­al­ize catas­troph­i­cally. (See Learn­ing the prior for a po­ten­tial way to tackle this prob­lem).

Warn­ing: highly ex­per­i­men­tal in­ter­est­ing spec­u­la­tion.

## Un­im­por­tant Decisions

Con­se­quen­tial­ists have a clear mo­tive to ex­ert in­fluence over im­por­tant de­ci­sions. What about unim­por­tant de­ci­sions?

The gen­eral form of the above ar­gu­ment says: “for any given pre­dic­tion task, the pro­grams that do best are dis­pro­por­tionately likely to be con­se­quen­tial­ists that want to do well at the task”. For im­por­tant de­ci­sions, many con­se­quen­tial­ists would in­stru­men­tally want to do well at the task. How­ever, for unim­por­tant de­ci­sions, there might be con­se­quen­tial­ists that want to make good pre­dic­tions. Th­ese con­se­quen­tial­ists would still be able to con­cen­trate efforts on ver­sions of the Solomonoff prior that weighted them es­pe­cially high, so they might out­perform other pro­grams in the long run.

It’s un­clear to me whether or not this be­hav­ior would be ma­lign. One rea­son why it might be ma­lign is that these con­se­quen­tial­ists that care about pre­dic­tions would want to make our uni­verse more pre­dictable. How­ever, while I am rel­a­tively con­fi­dent that ar­gu­ments about in­stru­men­tal con­ver­gence should hold, spec­u­lat­ing about pos­si­ble prefer­ences of simu­lated con­se­quen­tial­ists seems likely to pro­duce er­rors in rea­son­ing.

## Hail mary

Paul Christiano suggests that if humanity were desperate enough to want to throw a “hail mary”, one way to do this would be to use the Solomonoff prior to construct a utility function that will control the entire future. Since this is a very important decision, we expect consequentialists in the Solomonoff prior to care about influencing this decision. Therefore, the resulting utility function is likely to represent some simulated universe.

If ar­gu­ments about acausal trade and value hand­shakes hold, then the re­sult­ing util­ity func­tion might con­tain some frac­tion of hu­man val­ues. Again, this leads to much weird­ness in many ways.

## Speed prior

One rea­son that the Solomonoff prior con­tains simu­lated con­se­quen­tial­ists is that its no­tion of com­plex­ity does not pe­nal­ize run­time com­plex­ity, so very sim­ple pro­grams are al­lowed to perform mas­sive amounts of com­pu­ta­tion. The speed prior at­tempts to re­solve this is­sue by pe­nal­iz­ing pro­grams by an ad­di­tional log­a­r­ithm of the amount of time for which it’s run.
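The penalty described above can be sketched as a cost in bits: program length plus the logarithm of run time. This follows the informal description in this section rather than the formal definition of the speed prior, and the two example programs and their numbers are entirely hypothetical.

```python
import math


def solomonoff_cost_bits(length_bits: int) -> float:
    """Cost under the Solomonoff prior: only description length counts."""
    return float(length_bits)


def speed_cost_bits(length_bits: int, runtime_steps: int) -> float:
    """Cost with an additional log2(run time) penalty."""
    return length_bits + math.log2(runtime_steps)


# Hypothetical programs: (description length in bits, steps before output).
universe_simulation = (128, 2 ** 200)  # tiny program, astronomically slow
direct_predictor = (300, 2 ** 10)      # larger program, fast

for name, (bits, steps) in [
    ("universe simulation", universe_simulation),
    ("direct predictor", direct_predictor),
]:
    print(name, solomonoff_cost_bits(bits), speed_cost_bits(bits, steps))
```

With these made-up numbers the universe simulation is cheaper under the Solomonoff prior but more expensive once run time is penalized, which is the reversal the speed prior is hoping for.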

The speed prior might re­duce the rel­a­tive weight­ing of uni­verses with con­se­quen­tial­ists be­cause such pro­grams have to be run for a very long time be­fore they start pro­duc­ing rea­son­able pre­dic­tions. The con­se­quen­tial­ists have to gain con­trol of their uni­verse, un­der­stand their fun­da­men­tal laws of physics, simu­late other uni­verses, then ma­nipu­late the speed prior. This might all take a very long time, caus­ing con­se­quen­tial­ists to be dom­i­nated by other pro­grams.

In general, penalizing slowness might cause programs to “waste” less time on simulating consequentialists, devoting more computation towards performing the prediction task. However, a universe that has very fast consequentialists might still end up dominating.

Evan Hubinger also provides an argument that even very fast programs are possibly malign. The key insight is that even though your program needs to be fast, it can be running programs that are themselves less fast. For example, one possible fast way to solve a classification problem is to search to find a neural network, then use that network for inference. However, if you wanted your search to find a fast neural network, then the search itself might take longer, resulting in a net decrease in speed. Thus, time “waste” can manifest in programs that were explicitly optimized to not “waste” time. This “wasted” time could potentially be used for malign optimization, in this case for gaining influence over the speed prior.

## Ran­dom­ized prior

A potential way to reduce the influence consequentialists have on a decision made by the Solomonoff prior is to randomize the particular version of the prior that gets used. For example, we might make the particular universal TM we use dependent on very precise historical data. It would then be very costly for consequentialists to simulate this history, and therefore costly to predict which form of the Solomonoff prior we used.

If con­se­quen­tial­ists can’t pre­dict which Solomonoff prior we are go­ing to use, no par­tic­u­lar civ­i­liza­tion of con­se­quen­tial­ists will have the large ad­van­tage con­ferred by the an­thropic up­date. There­fore, one might hope that all civ­i­liza­tions of con­se­quen­tial­ists will not care about that par­tic­u­lar de­ci­sion.

This argument makes a couple of assumptions. First, it assumes that simulating very precise histories is difficult; it might not be difficult for all universes. Second, it assumes that the universes through which influence is spread cannot coordinate, which might be possible through acausal means.

## Sym­me­try considerations

The way that hu­man­ity rea­sons is ev­i­dence for the way that con­se­quen­tial­ists in other uni­verses will rea­son. If hu­man­ity rea­sons that the Solomonoff prior is ma­lign and there­fore is un­will­ing to use it to make de­ci­sions, then con­se­quen­tial­ists in other uni­verses might do like­wise. Th­ese uni­verses would not use the Solomonoff prior to make de­ci­sions.

The re­sult­ing state is that ev­ery­one is wor­ried about the Solomonoff prior be­ing ma­lign, so no one uses it. This means that no uni­verse will want to use re­sources try­ing to in­fluence the Solomonoff prior; they aren’t in­fluenc­ing any­thing.

This sym­me­try ob­vi­ously breaks if there are uni­verses that do not re­al­ize that the Solomonoff prior is ma­lign or can­not co­or­di­nate to avoid its use. One pos­si­ble way this might hap­pen is if a uni­verse had ac­cess to ex­tremely large amounts of com­pute (from the sub­jec­tive ex­pe­rience of the con­se­quen­tial­ists). In this uni­verse, the mo­ment some­one dis­cov­ered the Solomonoff prior, it might be fea­si­ble to start mak­ing de­ci­sions based on a close ap­prox­i­ma­tion.

## Recursion

Uni­verses that use the Solomonoff prior to make im­por­tant de­ci­sions might be taken over by con­se­quen­tial­ists in other uni­verses. A nat­u­ral thing for these con­se­quen­tial­ists to do is to use their po­si­tion in this new uni­verse to also ex­ert in­fluence on the Solomonoff prior. As con­se­quen­tial­ists take over more uni­verses, they have more uni­verses through which to in­fluence the Solomonoff prior, al­low­ing them to take over more uni­verses.

In the limit, it might be that for any fixed ver­sion of the Solomonoff prior, most of the in­fluence is wielded by the sim­plest con­se­quen­tial­ists ac­cord­ing to that prior. How­ever, since com­plex­ity is pe­nal­ized ex­po­nen­tially, gain­ing con­trol of ad­di­tional uni­verses does not in­crease your rel­a­tive in­fluence over the prior by that much. I think this cu­mu­la­tive re­cur­sive effect might be quite strong, or might amount to noth­ing.