# The Solomonoff Prior is Malign

This argument came to my attention from this post by Paul Christiano. I also found this clarification helpful. I found these counter-arguments stimulating and have included some discussion of them.

Very little of this content is original. My contributions consist of fleshing out arguments and constructing examples.

Thank you to Beth Barnes and Thomas Kwa for helpful discussion and comments.

# What is the Solomonoff prior?

The Solomonoff prior is intended to answer the question "what is the probability of X?" for any X, where X is a finite string over some finite alphabet. The Solomonoff prior is defined by taking the set of all Turing machines (TMs) which output strings when run with no input and weighting each machine proportional to 2^(-L), where L is the description length of the TM (informally, its size in bits).

The Solomonoff prior says the probability of a string is the sum of the weights of all TMs that print that string.

One reason to care about the Solomonoff prior is that we can use it to do a form of idealized induction. If you have seen 0101 and want to predict the next bit, you can use the Solomonoff prior to get the probabilities of 01010 and 01011. Normalizing gives you the chance of seeing 1 versus 0, conditioned on having seen 0101. In general, any process that assigns probabilities to all strings in a consistent way can be used to do induction like this.
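As a toy illustration of the normalization step (the true Solomonoff weights are uncomputable; the numbers below are made up):

```python
# Hypothetical unnormalized prior masses for the two continuations of "0101".
# In the real prior these would be sums of 2^-(description length) over all
# TMs printing each string; here they are made-up stand-ins.
prior = {"01010": 0.0009, "01011": 0.0003}

total = sum(prior.values())
posterior = {s: w / total for s, w in prior.items()}

print(posterior["01010"], posterior["01011"])  # ~0.75 and ~0.25
```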

This post provides more information about Solomonoff Induction.

# Why is it malign?

Imagine that you wrote a programming language called python^10 that works as follows: First, it takes all alphanumeric characters that are not in string literals and checks whether they are repeated 10 times sequentially. If they're not, they get deleted. If they are, they get replaced by a single copy. Second, it runs this new program through a Python interpreter.

Hello world in python^10:

pppppppppprrrrrrrrrriiiiiiiiiinnnnnnnnnntttttttttt('Hello, world!')

Luckily, Python has an exec function that executes string literals as code. This lets us write a shorter hello world:

eeeeeeeeeexxxxxxxxxxeeeeeeeeeecccccccccc("print('Hello, world!')")

It's probably easy to see that for nearly every program, the shortest way to write it in python^10 is to write it in Python and run it with exec. If we didn't have exec, then for sufficiently complicated programs, the shortest way to write them would be to specify an interpreter for a different language in python^10 and write the program in that language instead.
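The two-pass translation rule described above can be sketched in Python. This is a simplified, hypothetical implementation: it only handles simple quoted literals with no escapes, and it treats a run of 10n identical characters as n copies:

```python
import re

def python10_to_python(src):
    # Split into string literals (odd indices) and code (even indices).
    # Simplification: assumes simple quotes with no escapes or nesting.
    parts = re.split(r"('[^']*'|\"[^\"]*\")", src)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # literals pass through untouched
        else:
            # Collapse each run of an alphanumeric character: a run of
            # length 10n becomes n copies; shorter runs are deleted.
            out.append(re.sub(
                r"([A-Za-z0-9])\1*",
                lambda m: m.group(1) * (len(m.group(0)) // 10),
                part))
    return "".join(out)

src = "pppppppppprrrrrrrrrriiiiiiiiiinnnnnnnnnntttttttttt('Hello, world!')"
exec(python10_to_python(src))  # prints: Hello, world!
```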

As this example shows, the answer to "what's the shortest program that does X?" might involve using some roundabout method (in this case, exec). If python^10 had some security properties that Python didn't have, the shortest program in python^10 that accomplishes any given task would not have these security properties, because it would pass through exec. In general, if you can access alternative 'modes' (in this case, Python), the shortest programs that output any given string might go through one of those modes, possibly introducing malign behavior.

Let's say that I'm trying to predict what a human types next using the Solomonoff prior. Many programs predict the human:

1. Simulate the human and their local surroundings. Run the simulation forward and check what gets typed.

2. Simulate the entire Earth. Run the simulation forward and check what that particular human types.

3. Simulate the entire universe from the beginning of time. Run the simulation forward and check what that particular human types.

4. Simulate an entirely different universe that has reason to simulate this universe. Output what the human types in the simulation of our universe.

Which one is the simplest? One property of the Solomonoff prior is that it doesn't care about how long the TMs take to run, only how large they are. This results in an unintuitive notion of "simplicity": a program that does something 10^100 times might be simpler than a program that does the same thing 10^100 - 1 times, because the number 10^100 is easier to specify than 10^100 - 1.
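A toy way to see this: compare the description lengths (here, just character counts) of two Python programs, where the shorter program runs astronomically longer. We never run these programs; we only compare their sizes:

```python
# A program that prints 10**100 zeros vs. one that prints a "random-looking"
# smaller number of zeros. The first does vastly more work, yet its source
# is shorter, because 10**100 is easy to specify.
prog_many  = "print('0' * 10**100)"
prog_fewer = "print('0' * 83174905562938172645)"

print(len(prog_many), len(prog_fewer))  # the longer-running program is shorter
```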

In our example, it seems likely that "simulate the entire universe" is simpler than "simulate Earth" or "simulate part of Earth", because the initial conditions of the universe are simpler than the initial conditions of Earth. There is some additional complexity in picking out the specific human you care about. Since the local simulation is built around that human, this will be easier in the local simulation than in the universe simulation. However, in aggregate, it seems possible that "simulate the universe, pick out the typing" is the shortest program that predicts what your human will do next. Even so, "pick out the typing" is likely to be a very complicated procedure, making your total complexity quite high.

Whether simulating a different universe that simulates our universe is simpler depends a lot on the properties of that other universe. If that other universe is simpler than our universe, then we might run into an exec situation, where it's simpler to run that other universe and specify the human within their simulation of our universe.

This is troubling because that other universe might contain beings with values different from our own. If it's true that simulating that universe is the simplest way to predict our human, then some non-trivial fraction of our prediction is controlled by a simulation in another universe. If these beings want us to act in certain ways, they have an incentive to alter their simulation to change our predictions.

At its core, this is the main argument for why the Solomonoff prior is malign: a lot of the programs will contain agents with preferences, these agents will seek to influence the Solomonoff prior, and they will be able to do so effectively.

## How many other universes?

The Solomonoff prior is running all possible Turing machines. How many of them are going to simulate universes? The answer is probably "quite a lot".

It seems like specifying a lawful universe can be done with very few bits. Conway's Game of Life is very simple and can lead to very rich outcomes. Additionally, it seems quite likely that agents with preferences (consequentialists) will appear somewhere inside such a universe. One reason to think this is that evolution is a relatively simple mathematical regularity that seems likely to appear in many universes.

If the universe has a hospitable structure, these agents with preferences will expand their influence due to instrumental convergence. As the universe runs for longer and longer, the agents will gradually control more and more.

In addition to specifying how to simulate the universe, the TM must specify an output channel. In the case of Game of Life, this might be a particular cell sampled at a particular frequency. Other examples include whether or not a particular pattern is present in a particular region, or the parity of the total number of live cells.
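As a concrete sketch, here is a minimal Game of Life with a sampled-cell output channel (the choice of cell and sampling frequency is arbitrary):

```python
from collections import Counter

def step(live):
    """One Game of Life step; `live` is a set of (x, y) coordinates."""
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0))
    return {cell for cell, n in neighbor_counts.items()
            if n == 3 or (n == 2 and cell in live)}

def sample_channel(initial, cell, every, ticks):
    """Output channel: read one designated cell every `every` ticks."""
    live, out = set(initial), []
    for t in range(ticks):
        if t % every == 0:
            out.append(1 if cell in live else 0)
        live = step(live)
    return out

# A blinker oscillates with period 2, so the sampled cell alternates.
print(sample_channel([(0, 0), (1, 0), (2, 0)], (1, 1), 1, 4))  # [0, 1, 0, 1]
```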

In summary, specifying a lawful universe that gives rise to consequentialists requires only a very simple program. Therefore, the predictions generated by the Solomonoff prior will have some influential components comprised of simulated consequentialists.

## How would they influence the Solomonoff prior?

Consequentialists that find themselves in universes can reason about the fundamental laws that govern their universe. If they find that their universe has relatively simple physics, they will know that their behavior contributes to the Solomonoff prior. To gain access to more resources in other universes, these consequentialists might seek to act in ways that influence the Solomonoff prior.

A contrived example of a decision other beings would want to manipulate is "what program should be written and executed next?" Beings in other universes would have an incentive to get us to write programs that are aligned with their values. A particularly interesting scenario is one in which they write themselves into existence, allowing them to effectively "break into" our universe.

For example, somewhere in the Solomonoff prior there is a program that goes something like: "Simulate this universe. Starting from the year 2100, every hour output '1' if there's a cubic meter of iron on the Moon, else output '0'." By controlling the presence or absence of a cubic meter of iron on the Moon, we would be able to influence the output of this particular facet of the Solomonoff prior.

This example is a very complicated program and thus will not have much weight in the Solomonoff prior. However, by reasoning over the complexity of possible output channels for their universe, consequentialists would be able to identify output channels that weigh heavily in the Solomonoff prior. For example, if I were in the Game of Life, I might reason that sampling cells that were alive in the initial conditions of the universe is simpler than sampling other cells. Additionally, sampling cells and reporting their values directly is simpler than sampling cells and reversing their values. Therefore, I might choose to control regions close to the initial live cells of the universe for purposes of exerting influence over the Solomonoff prior.

Additionally, consequentialists can send across many plausible channels at once, focus efforts on channels that are easy to control, send through channels that would not unduly decrease their values for other reasons, etc. Since proportional weight in the Solomonoff prior drops off exponentially with respect to complexity, it's possible that only very small regions of space would need to be used to achieve a large fraction of the maximum possible influence.
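The exponential drop-off can be made concrete. If controlling channel k costs k extra bits of description, its weight is 2^-k, and the few simplest channels hold nearly all of the total:

```python
# Weight of channel k is 2^-k, where k is its extra description length in
# bits; consider channels k = 1..50.
weights = [2.0 ** -k for k in range(1, 51)]
share_of_top_3 = sum(weights[:3]) / sum(weights)
print(share_of_top_3)  # ~0.875: three channels carry most of the influence
```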

## How much influence will they have?

A simple application of the no-free-lunch theorem shows that there is no way of making predictions that is better than the Solomonoff prior across all possible distributions over all possible strings. Thus, agents influencing the Solomonoff prior cannot be good at predicting, and thus gain influence, in all possible worlds.

In practice, this is irrelevant because consequentialists do not want to achieve equal predictive accuracy in all worlds; they only care about worlds in which being predictive results in increased influence. If I'm exerting control over the Solomonoff prior, I only care about influencing it in ways that might affect material consequences in other universes. For example, I do not care about gaining influence in universes inhospitable to life. Thus I can trade off predictive power in universes I don't care about for predictive power in universes I do care about. This means I should restrict my attention to universes that have resources and in which the Solomonoff prior is being used to make important decisions, weighting appropriately.

Another interesting thing about the Solomonoff prior is that it is actually a collection of priors. The "length" of a TM is defined relative to some universal TM. For any particular TM, its length with respect to different universal TMs will vary; thus, there are versions of the Solomonoff prior that give more or less weight to any given TM. (Note that you can simulate universal TMs with universal TMs, so the difference across all programs is bounded by a constant factor. However, this constant factor can be large, so the difference in relative weight between different Solomonoff priors can also be large.)
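A small numeric sketch (the description lengths are made up) of how large this UTM-dependence can be:

```python
# The same TM described in 40 bits under one universal TM and 60 bits
# under another differs in prior weight by a factor of 2^20.
def weight(desc_len_bits):
    return 2.0 ** -desc_len_bits

ratio = weight(40) / weight(60)
print(ratio)  # 1048576.0, i.e. 2**20
```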

In particular, this suggests a good strategy for consequentialists: find a universe that is using a version of the Solomonoff prior with a very short description of the particular universe the consequentialists find themselves in.

The combined strategy is thus to take a distribution over all decisions informed by the Solomonoff prior, weight them by how much influence can be gained and by the version of the prior being used, and read off a sequence of bits that will cause some of these decisions to result in a preferred outcome.

The question of how much influence any given universe of consequentialists will have is difficult to answer. One way of quantifying this is to think about how many "universes they don't care about" they're trading off for "universes they do care about" (really we should be thinking in terms of sequences, but I find reasoning about universes easier).

Since the consequentialists care about exerting maximum influence, we can approximate them as not caring about universes that don't use a version of the Solomonoff prior that gives them a large weight. This can be operationalized as only caring about universes that use a universal TM from a particular set for their Solomonoff prior. What is the probability that a particular universe uses a universal TM from that set? I am not sure, but 1 in a million to 1 in a billion seems reasonable. This suggests a universe of consequentialists will only care about 1 in a million to 1 in a billion universes, which means they can devote a million to a billion times the predictive power to the universes they care about. This is sometimes called the "anthropic update". (This post contains more discussion of this particular argument.)

Additionally, we might think about which decisions the consequentialists would care about. If a particular decision using the Solomonoff prior is important, consequentialists are going to care more about that decision than about other decisions. Conservatively, perhaps 1 in 1000 decisions is "important" in this sense, giving another 1000x relative weighting.
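Putting the two illustrative numbers together (both are guesses, not measurements):

```python
f_universes = 1e-6  # assumed fraction of universes using a favorable UTM
f_decisions = 1e-3  # assumed fraction of decisions that are "important"

# Relative predictive power freed up by ignoring everything else:
concentration = 1 / (f_universes * f_decisions)
print(concentration)  # ~1e9
```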

After you condition on a decision being important and using a particular version of the Solomonoff prior, it thus seems quite likely that a non-trivial fraction of your prior is being controlled by consequentialists.

An intuition pump is that this argument is closer to an existence claim than a for-all claim. The Solomonoff prior is malign if there exists a simple universe of consequentialists that wants to influence our universe. This universe need not be simple in an absolute sense, only simple relative to the other TMs that could equal it in predictive power. Even if most universes of consequentialists are too complicated or not interested, it seems likely that at least one is simple enough and interested.

## Example

### Complexity of Consequentialists

How many bits does it take to specify a universe that can give rise to consequentialists? I do not know, but it seems like Conway's Game of Life might provide a reasonable lower bound.

Luckily, the code golf community has spent some amount of effort optimizing for program size. How many bytes would you guess it takes to specify Game of Life? Well, it depends on the universal TM. Possible answers include 6, 32, 39, or 96.

Since universes of consequentialists can "cheat" by concentrating their predictive efforts onto universal TMs in which they are particularly simple, we'll take the minimum. Additionally, my friend who's into code golf (he wrote the 96-byte solution!) says that the 6-byte answer actually contains closer to 4 bytes of information.

To specify an initial configuration that can give rise to consequentialists, we will need to provide more information. The smallest infinite-growth pattern in Game of Life has been shown to need 10 cells. Another reference point is that a self-replicator with 12 cells exists in HighLife, a Game of Life variant. I'm not an expert, but I think an initial configuration that gives rise to intelligent life can be specified in an 8x8 bounding box, giving a total of 8 bytes.

Finally, we need to specify a sampling procedure that consequentialists can gain control of. Something like "read <cell> every <large number> time ticks" suffices. By assumption, the cell being sampled takes almost no information to specify. We can also choose whatever large number is easiest to specify (the busy beaver numbers come to mind). In total, I don't think this will take more than 2 bytes.

Summing up, Game of Life + initial configuration + sampling method takes maybe 16 bytes, so a reasonable range for the complexity of a universe of consequentialists might be 10-1000 bytes. That doesn't seem like very many, especially relative to the amount of information we'll be conditioning the Solomonoff prior on if we ever use it to make an important decision.
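Tallying the back-of-envelope estimate above (every number is a guess from the preceding paragraphs):

```python
rule_bytes    = 4  # Game of Life rule, under a favorable universal TM
config_bytes  = 8  # 8x8 initial configuration
channel_bytes = 2  # sampling procedure
total = rule_bytes + config_bytes + channel_bytes
print(total)  # 14 bytes, i.e. "maybe 16 bytes" with a little slack
```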

### Complexity of Conditioning

When we're using the Solomonoff prior to make an important decision, the observations we'll condition on include information that:

1. We're using the Solomonoff prior

2. We're making an important decision

3. We're using some particular universal TM

How much information will this include? Many programs will not simulate universes. Many universes exist that do not have observers. Among universes with observers, some will not develop the Solomonoff prior. These observers will make many decisions. Very few of these decisions will be important. Even fewer of these decisions are made with the Solomonoff prior. Fewer still will use the particular version of the Solomonoff prior that gets used.

It seems reasonable to say that this is at least a megabyte of raw information, or about a million bytes. (I acknowledge some cart-before-horse issues here.)

This means that after you condition your Solomonoff prior, you'll be left with programs that are at least a million bytes. As our Game of Life example shows, it only takes maybe 10-1000 of these bytes to specify a universe that gives rise to consequentialists. That leaves approximately a million bytes to specify further properties of the universe that make it more likely the consequentialists will want to exert influence over the Solomonoff prior for the purpose of influencing this particular decision.

# Why might this argument be wrong?

## Inaccessible Channels

**Argument**

Most of the universe is outside of humanity's light-cone. This might suggest that most "simple" ways to sample from our universe are currently outside our influence, meaning that the only portions of the Solomonoff prior we can control are going to have an extremely low weight.

In general, it might be the case that for any universe, consequentialists inside that universe are going to have difficulty controlling simple output channels. For example, in Game of Life, a simple way to read information might sample a particular cell starting at t=0. However, consequentialists in Game of Life will not appear until a much later time and will be unable to control a large initial chunk of that output channel.

**Counter-argument**

Paul Christiano points out that the general form of this argument also applies to the other TMs that make up your Solomonoff prior. For example, when predicting what I'll type next, you would "want" to simulate me and predict what I would type starting at some time T. However, this is a pretty complicated way of sampling. The fact that simple sampling procedures are less predictive doesn't asymmetrically penalize consequentialists. The consequentialists' universe and sampling method only have to be simple relative to other programs that are equally good at predicting.

One might also note that large numbers can be produced with relatively few bits, so "sample starting at <large number>" is not much more complicated than "sample starting at 0".

## Speedy Channels

**Argument**

There are many simple ways of sampling from universes very quickly. For example, in Game of Life, one can sample a cell every time-tick. It seems feasible for consequentialists to simulate Earth in the Game of Life, but not feasible to simulate Earth while altering a specific cell every time tick according to the simulation.

**Counter-argument**

Consequentialists in the Game of Life could simply simulate Earth, compute the predictions, then later broadcast them along very fast sampling channels. However, it might be the case that building a machine that alters a cell arbitrarily every time tick is impossible. In our universe, there might be sampling procedures that physics does not permit us to exert arbitrary control over, e.g. due to speed-of-light limitations. If this is the case, consequentialists will direct efforts towards the simplest channel they can control.

## Computational Burden

**Argument**

Determining how to properly influence the Solomonoff prior requires massive computational resources devoted to simulating other universes and how they're going to use the Solomonoff prior. While the Solomonoff prior does not penalize extremely long run-times, run-times will matter from the perspective of the consequentialists doing the simulating. In particular, consequentialists will likely be able to use compute to achieve things they value (as we are capable of doing). Therefore, it would be extremely costly to exert influence over the Solomonoff prior, potentially to the point where consequentialists will choose not to do so.

**Counter-argument**

The computational burden of predicting the use of the Solomonoff prior in other universes is an empirical question. Since it's a relatively fixed cost and there are many other universes, consequentialists might reason that the marginal influence over these other universes is worth the compute. Issues might arise if the use of the Solomonoff prior in other universes is very sensitive to precise historical data, which would require a very precise simulation to influence, increasing the computational burden.

Additionally, some universes will find themselves with more computing power than others. Universes with a lot of computing power might find it relatively easy to predict the use of the Solomonoff prior in simpler universes and subsequently exert influence over them.

## Malign implies complex

**Argument**

A predictor that correctly predicts the first N bits of a sequence and then switches to being malign will be strictly more complicated than a predictor that doesn't switch to being malign. Therefore, while consequentialists in other universes might have some influence over the Solomonoff prior, they will be dominated by non-malign predictors.

**Counter-argument**

This argument mistakenly assumes that the malign influence on the Solomonoff prior comes from programs that have their "malignness" hardcoded. The argument given above suggests that simulated consequentialists will have an instrumental reason to be powerful predictors. These simulated consequentialists have reasoned about the Solomonoff prior and are executing the strategy of "be good at predicting, then exert malign influence"; since this strategy is not hardcoded, exerting malign influence does not add complexity.

## Canceling Influence

**Argument**

If it's true that many consequentialists are trying to influence the Solomonoff prior, then one might expect the influence to cancel out. It's improbable that all the consequentialists have the same preferences; on average, there should be an equal number of consequentialists trying to influence any given decision in any given direction. Since the consequentialists themselves can reason thus, they will realize that the expected amount of influence is extremely low, so they will not attempt to exert influence at all. Even if some of the consequentialists try to exert influence anyway, we should expect their influence to cancel out as well.

**Counter-argument**

Since the weight of a civilization of consequentialists in the Solomonoff prior is penalized exponentially with respect to complexity, it might be the case that for any given version of the Solomonoff prior, most of the influence is dominated by one simple universe. Different values imply that consequentialists care about different decisions, so for any given decision, it might be that very few universes of consequentialists are both simple enough to have enough influence and care about that decision.

Even if for any given decision there are always 100 universes with equal influence and differing preferences, there are strategies they might use to exert influence anyway. One simple strategy is for each universe to exert influence with a 1% chance, giving every universe 1/100 of the resources in expectation. If the accessible resources are vast enough, then this might be a good deal for the consequentialists. Consequentialists would not defect against each other for the reasons that motivate functional decision theory.
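A quick simulation of this 1%-chance strategy (a sketch: 100 universes, each acting independently with probability 1/100):

```python
import random

random.seed(0)
trials, n_universes = 100_000, 100

# For each simulated decision, count how many universes chose to act.
acted = [sum(random.random() < 0.01 for _ in range(n_universes))
         for _ in range(trials)]

mean_actors = sum(acted) / trials
lone_actor = sum(1 for a in acted if a == 1) / trials
print(mean_actors, lone_actor)  # ~1 actor on average; often exactly one
```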

More exotic solutions to this coordination problem include acausal trade amongst universes of different consequentialists to form collectives that exert influence in a particular direction.

Be warned that this leads to much weirdness.

# Conclusion

The Solomonoff prior is very strange. Agents that make decisions using the Solomonoff prior are likely to be subject to influence from consequentialists in simulated universes. Since it is difficult to compute the Solomonoff prior, this fact might not be relevant in the real world.

However, Paul Christiano applies roughly the same argument to claim that the implicit prior used in neural networks is also likely to generalize catastrophically. (See Learning the prior for a potential way to tackle this problem.)

Warning: highly experimental, interesting speculation.

## Unimportant Decisions

Consequentialists have a clear motive to exert influence over important decisions. What about unimportant decisions?

The general form of the above argument says: "for any given prediction task, the programs that do best are disproportionately likely to be consequentialists that want to do well at the task". For important decisions, many consequentialists would instrumentally want to do well at the task. However, for unimportant decisions, there might still be consequentialists that simply want to make good predictions. These consequentialists would still be able to concentrate efforts on versions of the Solomonoff prior that weight them especially highly, so they might outperform other programs in the long run.

It's unclear to me whether or not this behavior would be malign. One reason why it might be is that these consequentialists that care about predictions would want to make our universe more predictable. However, while I am relatively confident that arguments about instrumental convergence should hold, speculating about the possible preferences of simulated consequentialists seems likely to produce errors in reasoning.

## Hail mary

Paul Christiano suggests that if humanity were desperate enough to want to throw a "hail mary", one way to do so would be to use the Solomonoff prior to construct a utility function that will control the entire future. Since this is a very important decision, we should expect consequentialists in the Solomonoff prior to care about influencing it. Therefore, the resulting utility function is likely to represent some simulated universe.

If arguments about acausal trade and value handshakes hold, then the resulting utility function might contain some fraction of human values. Again, this leads to much weirdness in many ways.

## Speed prior

One reason the Solomonoff prior contains simulated consequentialists is that its notion of complexity does not penalize runtime, so very simple programs are allowed to perform massive amounts of computation. The speed prior attempts to resolve this issue by additionally penalizing programs by the logarithm of the time for which they run.
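A sketch of the comparison (exact speed-prior formulations vary; here weight = 2^-(length + log2(runtime)), i.e. the ordinary weight divided by the runtime):

```python
import math

def solomonoff_weight(length_bits):
    return 2.0 ** -length_bits

def speed_weight(length_bits, runtime_steps):
    # Charge an extra log2(runtime) bits, per the description above.
    return 2.0 ** -(length_bits + math.log2(runtime_steps))

# A tiny 20-bit universe simulator running ~2^200 steps vs. a 100-bit
# direct predictor running a million steps (all numbers illustrative):
print(solomonoff_weight(20) > solomonoff_weight(100))             # True
print(speed_weight(20, 2.0 ** 200) < speed_weight(100, 10 ** 6))  # True
```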

The speed prior might reduce the relative weighting of universes with consequentialists because such programs have to be run for a very long time before they start producing reasonable predictions. The consequentialists have to gain control of their universe, understand their fundamental laws of physics, simulate other universes, and then manipulate the speed prior. This might all take a very long time, causing consequentialists to be dominated by other programs.

In general, penalizing slowness might cause programs to "waste" less time on simulating consequentialists, devoting more computation towards performing the prediction task. However, it still might be the case that a universe with very fast consequentialists ends up dominating.

Evan Hubinger also provides an argument that even very fast programs are possibly malign. The key insight is that even though your program needs to be fast, it can be running programs that are themselves less fast. For example, one possible fast way to solve a classification problem is to search for a neural network, then use that network for inference. However, if you wanted your search to find a fast neural network, the search itself might take longer, resulting in a net increase in total runtime. Thus, "wasted" time can manifest in programs that were explicitly optimized not to "waste" time. This "wasted" time could potentially be used for malign optimization, in this case for gaining influence over the speed prior.

## Randomized prior

A potential way to reduce the influence consequentialists have on a decision made with the Solomonoff prior is to randomize the particular version of the prior that gets used. For example, we might make the particular universal TM we use dependent on very precise historical data. It would then be very costly for consequentialists to simulate this history, and thus costly to predict which form of the Solomonoff prior we used.

If consequentialists can't predict which Solomonoff prior we are going to use, no particular civilization of consequentialists will have the large advantage conferred by the anthropic update. Therefore, one might hope that no civilization of consequentialists will care about that particular decision.

This argument makes a couple of assumptions. First, it assumes that simulating very precise histories is difficult; it might not be difficult for all universes. Second, it assumes that the universes through which influence is spread cannot coordinate, which might be possible through acausal means.

## Symmetry considerations

The way that hu­man­ity rea­sons is ev­i­dence for the way that con­se­quen­tial­ists in other uni­verses will rea­son. If hu­man­ity rea­sons that the Solomonoff prior is ma­lign and there­fore is un­will­ing to use it to make de­ci­sions, then con­se­quen­tial­ists in other uni­verses might do like­wise. Th­ese uni­verses would not use the Solomonoff prior to make de­ci­sions.

The re­sult­ing state is that ev­ery­one is wor­ried about the Solomonoff prior be­ing ma­lign, so no one uses it. This means that no uni­verse will want to use re­sources try­ing to in­fluence the Solomonoff prior; they aren’t in­fluenc­ing any­thing.

This sym­me­try ob­vi­ously breaks if there are uni­verses that do not re­al­ize that the Solomonoff prior is ma­lign or can­not co­or­di­nate to avoid its use. One pos­si­ble way this might hap­pen is if a uni­verse had ac­cess to ex­tremely large amounts of com­pute (from the sub­jec­tive ex­pe­rience of the con­se­quen­tial­ists). In this uni­verse, the mo­ment some­one dis­cov­ered the Solomonoff prior, it might be fea­si­ble to start mak­ing de­ci­sions based on a close ap­prox­i­ma­tion.

## Recursion

Universes that use the Solomonoff prior to make important decisions might be taken over by consequentialists in other universes. A natural thing for these consequentialists to do is to use their position in this new universe to also exert influence on the Solomonoff prior. As consequentialists take over more universes, they have more universes through which to influence the Solomonoff prior, allowing them to take over more universes.

In the limit, it might be that for any fixed version of the Solomonoff prior, most of the influence is wielded by the simplest consequentialists according to that prior. However, since complexity is penalized exponentially, gaining control of additional universes does not increase your relative influence over the prior by that much. I think this cumulative recursive effect might be quite strong, or might amount to nothing.

• “At its core, this is the main argument why the Solomonoff prior is malign: a lot of the programs will contain agents with preferences, these agents will seek to influence the Solomonoff prior, and they will be able to do so effectively.”

First, this is irrelevant to most applications of the Solomonoff prior. If I’m using it to check the randomness of my random number generator, I’m going to be looking at 64-bit strings, and probably very few intelligent-life-producing universe-simulators output just 64 bits, and it’s hard to imagine how an alien in a simulated universe would want to bias my RNG anyway.

The S. prior is a general-purpose prior which we can apply to any problem. The output string has no meaning except in a particular application and representation, so it seems senseless to try to influence the prior for a string when you don’t know how that string will be interpreted.

Can you give an instance of an application of the S. prior in which, if everything you wrote were correct, it would matter?

Second, it isn’t clear that this is a bug rather than a feature. Say I’m developing a program to compress photos. I’d like to be able to ask “what are the odds of seeing this image, ever, in any universe?” That would probably compress images of plants and animals better than other priors, because in lots of universes life will arise and evolve, and features like radial symmetry, bilateral symmetry, leaves, legs, etc., will arise in many universes. This biasing of priors by evolution doesn’t seem to me different from biasing of priors by intelligent agents; evolution is smarter than any agent we know. And I’d like to get biasing from intelligent agents, too; then my photo-compressor might compress images of wheels and rectilinear buildings better.

Also in the category of “it’s a feature, not a bug” is that, if you want your values to be right, and there’s a way of learning the values of agents in many possible universes, you ought to try to figure out what their values are, and update towards them. This argument implies that you can get that for free by using Solomonoff priors.

(If you don’t think your values can be “right”, but instead you just believe that your values morally oblige you to want other people to have those values, you’re not following your values, you’re following your theory about your values, and probably read too much LessWrong for your own good.)

Third, what do you mean by “the output” of a program that simulates a universe? How are we even supposed to notice the infinitesimal fraction of that universe’s output which the aliens are influencing to subvert us? Take your example of Life—is the output a raster scan of the 2D bit array left when the universe goes static? In that case, agents have little control over the terminal state of their universe (and also, in the case of Life, the string will be either almost entirely zeroes, or almost entirely 1s, and those both already have huge Solomonoff priors). Or is it the concatenation of all of the states it goes through, from start to finish? In that case, by the time intelligent agents evolve, their universe will have already produced more bits than our universe can ever read.

Are you imagining that bits are never output unless the accidentally-simulated aliens choose to output a bit? I can’t imagine any way that could happen, at least not if the universe is specified with a short instruction string.

This brings us to the 4th problem: It makes little sense to me to worry about averaging in outputs from even mere planetary simulations if your computer is just the size of a planet, because it won’t even have enough memory to read in a single output string from most such simulations.

5th, you can weigh each program’s output proportional to 2^-T, where T is the number of steps it takes the TM to terminate. You’ve got to do something like that anyway, because you can’t run TMs to completion one after another; you’ve got to do something like take a large random sample of TMs and iteratively run each one step. Problem solved.
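The interleaved scheme this comment describes can be sketched with toy “machines” (Python generators standing in for TMs). This is an illustration of dovetailing with a 2^-T discount, not a real TM enumerator:

```python
def dovetail(programs, max_rounds):
    """Run every program one step per round. A program that halts at
    step T contributes its output with weight 2^-T, so slow programs
    are exponentially discounted and non-halting ones never block."""
    results, live = {}, dict(programs)
    for t in range(1, max_rounds + 1):
        for name in list(live):
            try:
                next(live[name])              # advance one step
            except StopIteration as halt:     # halted at step t
                results[name] = (halt.value, 2.0 ** -t)
                del live[name]
    return results

def fast_prog():      # halts after very few steps
    yield
    return "01"

def slow_prog():      # same output, ten times slower
    for _ in range(10):
        yield
    return "01"

def looping_prog():   # never halts; dovetailing tolerates it
    while True:
        yield

results = dovetail({"fast": fast_prog(), "slow": slow_prog(),
                    "loop": looping_prog()}, max_rounds=20)
# "fast" and "slow" print the same string, but "fast" gets a much
# larger weight; "loop" never contributes at all.
```

The point of the round-robin structure is exactly the one the comment makes: you never need to wait for any single machine to finish before giving others a turn.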

Maybe I’m misunderstanding something basic, but I feel like we’re debating how many angels can dance on the head of a pin.

Perhaps the biggest problem is that you’re talking about an entire universe of intelligent agents conspiring to change the “output string” of the TM that they’re running in. This requires them to realize that they’re running in a simulation, and that the output string they’re trying to influence won’t even be looked at until they’re all dead and gone. That doesn’t seem to give them much motivation to devote their entire civilization to twiddling bits in their universe’s final output in order to shift our priors infinitesimally. And if it did, the more likely outcome would be an intergalactic war over what string to output.

(I understand your point about them trying to “write themselves into existence, allowing them to effectively “break into” our universe”, but as you’ve already required their TM specification to be very simple, this means the most they can do is cause some type of life that might evolve in their universe to break into our universe. This would be like humans on Earth devoting the next billion years to tricking God into re-creating slime molds after we’re dead. Whereas the things intelligent life actually cares about and self-identifies with are those things that distinguish them from their neighbors. Their values will be directed mainly towards opposing the values of other members of their species. None of those distinguishing traits can be implicit in the TM, and even if they could, they’d cancel each other out.)

Now, if they were able to encode a message to us in their output string, that might be more satisfying to them. Like, maybe, “FUCK YOU, GOD!”

• The S. prior is a general-purpose prior which we can apply to any problem. The output string has no meaning except in a particular application and representation, so it seems senseless to try to influence the prior for a string when you don’t know how that string will be interpreted.

The claim is that consequentialists in simulated universes will model decisions based on the Solomonoff prior, so they will know how that string will be interpreted.

Can you give an instance of an application of the S. prior in which, if everything you wrote were correct, it would matter?

Any decision that controls substantial resource allocation will do. For example, evaluating the impact of running various programs, blowing up planets, interfering with alien life, etc.

Also in the category of “it’s a feature, not a bug” is that, if you want your values to be right, and there’s a way of learning the values of agents in many possible universes, you ought to try to figure out what their values are, and update towards them. This argument implies that you can get that for free by using Solomonoff priors.

If you are a moral realist, this does seem like a possible feature of the Solomonoff prior.

Third, what do you mean by “the output” of a program that simulates a universe?

A TM that simulates a universe must also specify an output channel.

Take your example of Life—is the output a raster scan of the 2D bit array left when the universe goes static? In that case, agents have little control over the terminal state of their universe (and also, in the case of Life, the string will be either almost entirely zeroes, or almost entirely 1s, and those both already have huge Solomonoff priors). Or is it the concatenation of all of the states it goes through, from start to finish?

All of the above. We are running all possible TMs, so all computable universes will be paired with all computable output channels. It’s just a question of complexity.
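As a toy illustration of one universe paired with two different output channels (a minimal Conway’s Life sketch of my own, not anything from the post), note how the same simulated history yields entirely different output strings depending on the channel:

```python
# One computable "universe" (Life on a tiny torus) and two computable
# output channels. The universal prior runs every (universe, channel)
# pair; each pair's weight depends on its combined description length.

def step(grid):
    n = len(grid)
    def neighbors(r, c):
        return sum(grid[(r + dr) % n][(c + dc) % n]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))
    return [[1 if (cell and neighbors(r, c) in (2, 3))
             or (not cell and neighbors(r, c) == 3) else 0
             for c, cell in enumerate(row)] for r, row in enumerate(grid)]

def run(grid, steps):
    history = [grid]
    for _ in range(steps):
        grid = step(grid)
        history.append(grid)
    return history

def channel_final_raster(history):
    # Output channel A: raster-scan only the terminal state.
    return "".join(str(c) for row in history[-1] for c in row)

def channel_full_history(history):
    # Output channel B: concatenate every intermediate state.
    return "".join(str(c) for grid in history for row in grid for c in row)

# A blinker oscillates with period 2, so after two steps channel A
# gives back the starting raster, while channel B's string is three
# grids long.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0]]
hist = run(blinker, 2)
```

Each channel is a separate computable function of the same history, which is the sense in which “all computable universes are paired with all computable output channels.”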

Are you imagining that bits are never output unless the accidentally-simulated aliens choose to output a bit? I can’t imagine any way that could happen, at least not if the universe is specified with a short instruction string.

No.

This brings us to the 4th problem: It makes little sense to me to worry about averaging in outputs from even mere planetary simulations if your computer is just the size of a planet, because it won’t even have enough memory to read in a single output string from most such simulations.

I agree that approximating the Solomonoff prior is difficult and thus its malignancy probably doesn’t matter in practice. I do think similar arguments apply to cases that do matter.

5th, you can weigh each program’s output proportional to 2^-T, where T is the number of steps it takes the TM to terminate. You’ve got to do something like that anyway, because you can’t run TMs to completion one after another; you’ve got to do something like take a large random sample of TMs and iteratively run each one step. Problem solved.

See the section on the Speed prior.

Perhaps the biggest problem is that you’re talking about an entire universe of intelligent agents conspiring to change the “output string” of the TM that they’re running in. This requires them to realize that they’re running in a simulation, and that the output string they’re trying to influence won’t even be looked at until they’re all dead and gone. That doesn’t seem to give them much motivation to devote their entire civilization to twiddling bits in their universe’s final output in order to shift our priors infinitesimally. And if it did, the more likely outcome would be an intergalactic war over what string to output.

They don’t have to realize they’re in a simulation; they just have to realize their universe is computable. Consequentialists care about their values after they’re dead. The cost of influencing the prior might not be that high, because they only have to compute it once, and the benefit might be enormous. Exponential decay + acausal trade make an intergalactic war unlikely.

• Curated. This post does a good job of summarizing a lot of complex material, in a (moderately) accessible fashion.

• This is great. I really appreciate when people try to summarize complex arguments that are spread across multiple posts.

Also, I basically do this (try to infer the right prior). My guiding navigation is trying to figure out what (I call) the super cooperation cluster would do, then do that.

• I liked this post a lot, but I did read it as something of a scifi short story with a McGuffin called “The Solomonoff Prior”.

It probably also seemed really weird because I just read Why Philosophers Should Care About Computational Complexity [PDF] by Scott Aaronson, and having read it makes sentences like this seem ‘not even’ insane:

The combined strategy is thus to take a distribution over all decisions informed by the Solomonoff prior, weight them by how much influence can be gained and the version of the prior being used, and read off a sequence of bits that will cause some of these decisions to result in a preferred outcome.

The Consequentialists are of course the most badass (by construction) alien villains ever “trying to influence the Solomonoff prior” as they are wont!

Given that some very smart people seem to seriously believe in Platonic realism, maybe there are Consequentialists malignly influencing vast infinities of universes! Maybe our universe is one of them.

I’m not sure why, but I feel like the discovery of a proof of P = NP or P ≠ NP is the climax of the heroes’ valiant struggle, as the true heirs of the divine right to wield The Solomonoff Prior, against the dreaded (other universe) Consequentialists.

• If it’s true that simulating that universe is the simplest way to predict our universe, then some non-trivial fraction of our prediction might be controlled by a simulation in another universe. If these beings want us to act in certain ways, they have an incentive to alter their simulation to change our predictions.

I find this confusing. I’m not saying it’s wrong, necessarily, but it at least feels to me like there’s a step of the argument that’s being skipped.

To me, it seems like there’s a basic dichotomy between predicting and controlling. And this is claiming that somehow an agent somewhere is doing both. (Or actually, controlling by predicting!) But how, exactly?

Is it that:

• these other agents are predicting us, by simulating us, and so we should think of ourselves as partially existing in their universe? (with them as our godlike overlords who can continue the simulation from the current point as they wish)

• the Consequentialists will predict accurately for a while, and then make a classic “treacherous turn” where they start slipping in wrong predictions designed to influence us rather than be accurate, after having gained our trust?

• something else?

My guess is that it’s the second thing (in part from having read, and very partially understood, Paul’s posts on this a while ago). But then I would expect some discussion of the “treacherous turn” aspect of it—of the fact that they have to predict accurately for a while (so that we rate them highly in our ensemble of programs), and only then can they start outputting predictions that manipulate us.

Is that not the case? Have I misunderstood something?

(Btw, I found the stuff about python^10 and exec() pretty clear. I liked those examples. Thank you! It was just from this point on in the post that I wasn’t quite sure what to make of it.)

• My understanding is that the first thing is what you get with UDASSA, and the second thing is what you get if you think the Solomonoff prior is useful for predicting your universe for some other reason (i.e., not because you think the likelihood of finding yourself in some situation covaries with the Solomonoff prior’s weight on that situation).

• At its core, this is the main argument why the Solomonoff prior is malign: a lot of the programs will contain agents with preferences, these agents will seek to influence the Solomonoff prior, and they will be able to do so effectively.

Am I the only one who sees this much less as a statement that the Solomonoff prior is malign, and much more a statement that reality itself is malign? I think that the proper reaction is not to use a different prior, but to build agents that are robust to the possibility that we live in a simulation run by influence-seeking malign agents, so that they don’t end up like this.

• It seems to me that using a combination of execution time, memory use and program length mostly kills this set of arguments.

Something like a game-of-life initial configuration that leads to the eventual evolution of intelligent game-of-life aliens who then strategically feed outputs into GoL in order to manipulate you may have very good complexity performance, but both the speed and memory are going to be pretty awful. The fixed cost in memory and execution steps of essentially simulating an entire universe is huge.

But yes, the pure complexity prior certainly has some perverse and unsettling properties.

EDIT: This is really a special case of Mesa-Optimizers being dangerous. (See, e.g. https://www.lesswrong.com/posts/XWPJfgBymBbL3jdFd/an-58-mesa-optimization-what-it-is-and-why-we-should-care). The set of dangerous Mesa-Optimizers is obviously bigger than just “simulated aliens” and even time- and space-efficient algorithms might run into them.

• Complexity indeed matters: the universe seems to be bounded in both time and space, so running anything like the Solomonoff prior algorithm (in one of its variants) or AIXI may be outright impossible for any non-trivial model. This for me significantly weakens or changes some of the implications.

A Fermi upper bound of the direct Solomonoff/AIXI algorithm trying TMs in the order of increasing complexity: even if checking one TM took one Planck time on one atom, you could only check circa 10^240 ≈ 2^800 machines within a lifetime of the universe (~10^110 years until Heat death), so the machines you could even look at have description complexity a meager 800 bits.

• You could likely speed the greedy search up, but note that most algorithmic speedups do not have a large effect on the exponent (even multiplying the exponent by constants is not very helpful).

• Significantly narrowing down the space of TMs to a narrow subclass may help, but then we need to look at the particular (small) class of TMs rather than have intuitions about all TMs. (And the class would need to be really narrow—see below.)

• Due to the Church-Turing thesis, any limiting of the scope of the search is likely not very effective, as you can embed arbitrary programs (and thus arbitrary complexity) in anything that is strong enough to be a TM interpreter (which the universe is in multiple ways).

• It may be hypothetically possible to search for the “right” TMs without examining them individually (with some future tech, e.g. how sci-fi imagined quantum computing), but if such a speedup is possible, any TMs modelling the universe would need to be able to contain it. This would increase the evaluation complexity of the TMs, making them significantly more costly than the one Planck time I assumed above (this would need a finer Fermi estimate with more complex assumptions?).
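The Fermi bound in the comment above can be checked directly with rough numbers (all order-of-magnitude physical estimates, not precise values):

```python
import math

# Order-of-magnitude inputs for the Fermi estimate:
atoms_in_universe      = 10 ** 80   # commonly quoted rough estimate
seconds_per_year       = 3.15e7
planck_time_seconds    = 5.4e-44
years_until_heat_death = 10 ** 110

# Total Planck-time "slots" available: one check per atom per Planck time.
planck_times = years_until_heat_death * seconds_per_year / planck_time_seconds
machines_checkable = atoms_in_universe * planck_times

# Convert the budget of checkable machines into a description-length
# bound in bits: you can only enumerate programs up to ~log2(budget) bits.
bits = math.log2(machines_checkable)
print(round(bits))  # ≈ 800 bits, matching the comment's estimate
```

The conclusion is insensitive to the exact inputs: multiplying any of them by a factor of a million shifts the bound by only about 20 bits.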

• I am not so convinced that penalizing more stuff will make these arguments weak enough that we don’t have to worry about them. For an example of why I think this, see Are minimal circuits deceptive?. Also, adding execution/memory constraints penalizes all hypotheses, and I don’t think universes with consequentialists are asymmetrically penalized.

• adding execution/memory constraints penalizes all hypotheses

In reality these constraints do exist, so the question of “what happens if you don’t care about efficiency at all?” is really not important. In practice, efficiency is absolutely critical and everything that happens in AI is dominated by efficiency considerations.

I think that mesa-optimization will be a problem. It probably won’t look like aliens living in the Game of Life though.

It’ll look like an internal optimizer that just “decides” that the minds of the humans who created it are another part of the environment to be optimized for its not-correctly-aligned goal.

• If arguments about acausal trade and value handshakes hold, then the resulting utility function might contain some fraction of human values.

I think Paul’s Hail Mary via Solomonoff prior idea is not obviously related to acausal trade. (It does not privilege agents that engage in acausal trade over ones that don’t.)

• I agree. The sentence quoted is a separate observation.

• Such a great post.

Note that I changed the formatting of your headers a bit, to make some of them just bold text. They still appear in the ToC just fine. Let me know if you’d like me to revert it or have any other issues.

• Looks better—thanks!

• Is the link for the 6-byte Code Golf solution correct? It takes me to something that appears to be 32 bytes.

• Nope. Should be fixed now.

• In your section “complexity of conditioning”, if I am understanding correctly, you compare the amount of information required to produce consequentialists with the amount of information in the observations we are conditioning on. This, however, is an apples-to-oranges comparison: the consequentialists are competing against the “true” explanation of the data, the one that specifies the universe and where to find the data within it; they are not competing against the raw data itself. In an ordered universe, the “true” explanation would be shorter than the raw observation data; that’s the whole point of using Solomonoff induction, after all.

So, there are two advantages the consequentialists can exploit to “win” and be the shorter explanation. This exploitation must be enough to overcome those 10-1000 bits. One is that, since the decision which is being made is very important, they can find the data within the universe without adding any further complexity. This, to me, seems quite malign, as the “true” explanation is being penalized simply because we cannot read data directly from the program which produces the universe, not because this universe is complicated.

The second possible advantage is that these consequentialists may value our universe for some intrinsic reason, such as the life in it, so that they prioritize it over other universes and therefore it takes fewer bits to specify their simulation of it. However, if you could argue that the consequentialists actually had an advantage here which outweighed their own complexity, this would just sound to me like an argument that we are living in a simulation, because it would essentially be saying that our universe is unduly tuned to be valuable for consequentialists, to such a degree that the existence of these consequentialists is less of a coincidence than it just happening to be that valuable.

• In your section “complexity of conditioning”, if I am understanding correctly, you compare the amount of information required to produce consequentialists with the amount of information in the observations we are conditioning on. This, however, is an apples-to-oranges comparison: the consequentialists are competing against the “true” explanation of the data, the one that specifies the universe and where to find the data within it; they are not competing against the raw data itself. In an ordered universe, the “true” explanation would be shorter than the raw observation data; that’s the whole point of using Solomonoff induction, after all.

The data we’re conditioning on has K-complexity of one megabyte. Maybe I didn’t make this clear.

So, there are two advantages the consequentialists can exploit to “win” and be the shorter explanation. This exploitation must be enough to overcome those 10-1000 bits. One is that, since the decision which is being made is very important, they can find the data within the universe without adding any further complexity. This, to me, seems quite malign, as the “true” explanation is being penalized simply because we cannot read data directly from the program which produces the universe, not because this universe is complicated.

I don’t think I agree with this. Thinking in terms of consequentialists competing against “true” explanations doesn’t make that much sense to me. It seems similar to making the exec hello world “compete” against the “true” print hello world.

The “complexity of consequentialists” section answers the question of “how long is the exec function?” where the “interpreter” exec calls is a universe filled with consequentialists.

However, if you could argue that the consequentialists actually had an advantage here which outweighed their own complexity, this would just sound to me like an argument that we are living in a simulation, because it would essentially be saying that our universe is unduly tuned to be valuable for consequentialists, to such a degree that the existence of these consequentialists is less of a coincidence than it just happening to be that valuable.

I do not understand what this is saying. I claim that consequentialists can reason about our universe by thinking about TMs, because our universe is computable. Given that our universe supports life, it might thus be valuable to some consequentialists in other universes. I don’t think the argument takes a stance on whether this universe is a simulation; it merely claims that this universe could be simulated.

• the initial conditions of the universe are simpler than the initial conditions of Earth.

This seems to violate a conservation of information principle in quantum mechanics.

• Perhaps it would have been better worded as “the simplest way to specify the initial conditions of Earth is to specify the initial conditions of the universe, the laws of physics, and the location of Earth.”

• Right, you’re interested in syntactic measures of information, more than a physical one. My bad.

• Wouldn’t the complexity of Earth and the conditioning on importance be irrelevant, because they would still appear in the consequentialists’ distribution of strings and in the specification of what kind of consequentialists we want? If so, the consequentialists would only have the advantage of the anthropic update, which would go to zero in the limit of the string’s length (because the choice of language would correlate with the string’s content), minus the penalty for their universe + output channel.