Some problems with making induction benign, and approaches to them

The universal prior is malign. I’ll talk about a sequence of problems that cause it to be malign and possible solutions.


(this post came out of conversations with Scott, Critch, and Paul)

Here’s the basic setup of how an inductor might be used. At some point, humans use the universal prior to make an important prediction that influences our actions. Say that humans construct an inductor, give it a sequence of bits x from some sensor, and ask it to predict the next bits. Those bits are actually (in a way inferrable from x) going to be generated by a human in a sealed room who thinks for a year. After the human thinks for a year, the bits are fed into the inductor. This way, the humans can use the inductor to predict ahead of time what the human who thinks for a year will say. (This probably isn’t the best way to use an inductor, but it serves as a good demonstration.)
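Here is a minimal sketch of that protocol, assuming a stand-in Inductor interface (the class and method names are hypothetical, only meant to pin down how the pieces fit together; a real universal inductor is uncomputable):

```python
class Inductor:
    """Stand-in for an inductor based on a (conditioned) universal prior."""

    def __init__(self) -> None:
        self.history = b""

    def observe(self, bits: bytes) -> None:
        # Condition on everything observed so far.
        self.history += bits

    def predict(self, n_bits: int) -> bytes:
        # Return the most probable next n_bits under the conditioned prior.
        raise NotImplementedError("uncomputable in general; stand-in only")


def run_protocol(inductor: Inductor, sensor_bits: bytes, n_answer_bits: int) -> bytes:
    # 1. Feed in the sensor prefix x.
    inductor.observe(sensor_bits)
    # 2. Read off a prediction now, a year before the human in the sealed room answers.
    prediction = inductor.predict(n_answer_bits)
    # 3. A year later, the human's actual answer is appended to the sequence,
    #    so the inductor's later predictions are conditioned on it too.
    return prediction
```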

The anthropic update is a problem for Solomonoff induction

The original problem Paul talks about in his post is that consequentialist agents can make an “anthropic update” on the fact that the string is fed into a Solomonoff inductor with a specific prior, while Solomonoff induction can’t make this update on its own. Interestingly, this problem with Solomonoff induction is also a problem with sufficiently large convolutional neural networks (capable of simulating e.g. the game of life).
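For concreteness, recall the rough form of the universal prior (ignoring issues of normalization): the probability of a string x is a sum over programs for a universal machine U, weighted only by their length,

$$M(x) \;=\; \sum_{p \,:\, U(p)\ \text{outputs a string beginning with}\ x} 2^{-|p|}.$$

Nothing in this mixture can condition on the fact that x is being fed into this particular inductor in this particular situation, whereas a consequentialist running inside one of the programs p can condition on exactly that.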

Possibly, this could be solved by inventing a naturalized variant of Solomonoff induction that makes this anthropic update.

Simulation warfare is a problem for naturalized induction

Suppose alien civilizations devote some percentage of their resources to (a) coming up with plausible values for the prefix x and (b) constructing naturalized inductors that are fed the string x followed by bits such that the aliens would be able to take over the humans’ system if these bits were the prediction of the humans’ inductor. (For example, these bits could be a manipulative message.)

Under this supposition, a large percentage of naturalized inductors in the universe are of this form. In particular, if the aliens are good at guessing x, then a large percentage of naturalized inductors that are fed a string starting with the prefix x will be of this form. So the naturalized inductor used by the humans will, under reasonable anthropic theories, predict a message chosen by some alien civilization with high probability.

Simulations can be replaced with zoos

Plausibly, there’s some way to tell that the aliens’ simulations of humans are “not real”. But what if the “simulations” are actually real physical planets? Since there are many galaxies, it would not be hard to use some of them to create “zoos” similar to planets that life naturally evolves on. The aliens can deploy nanobots in the zoos to influence what the human in the box says. This way, on most physical Earths, the naturalized inductor will be fed bits chosen by aliens when these bits are used to make an important decision.

Does importance-weighting rescue naturalized induction?

Maybe we should somehow distinguish “natural” Earths from “zoo” Earths. One way they are distinguished is that an individual “natural” Earth will have more influence over the universe than an individual “zoo” Earth. So maybe there is a sense in which we can make an importance-weighted naturalized inductor, which assigns higher weight to naturalized inductors used to make decisions that influence a large part of the universe.
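Very schematically (this is just one way of writing down the intuition, not a worked-out proposal, and the notation is my own): if H ranges over hypotheses about the world and I_H(x) is the set of inductor instances in H that have observed x, an importance-weighted naturalized inductor might predict

$$\Pr(y \mid x) \;\propto\; \sum_{H} \Pr(H) \sum_{i \in I_H(x)} w(i)\, \Pr_H(\text{instance } i \text{ is fed } y \text{ next}),$$

where w(i) is an importance weight, e.g. something like how much of the universe the decision informed by instance i goes on to influence. The unweighted case w(i) = 1 is ordinary anthropics, where zoo Earths can dominate by sheer count.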

Designing a good importance metric requires knowing which parts of the universe are important a priori. For example, if some regions of space are more human-habitable than other regions, then inductors near the more-habitable regions should be assigned more weight. Unfortunately, if the importance metric is “incorrect”, then the aliens may exploit this fact by e.g. giving zoo Earths access to a region of space that is actually unimportant but which the importance metric thinks is important. (At some level, this is kind of like acausal trade: doing acausal trade with aliens can be a bad idea without knowing ahead of time which parts of space and other resources you value.) Getting the importance metric right before even being able to do induction seems like a pretty hard problem. I’m pessimistic about this, and so was Paul when I talked to him about it.

Does a bound on computational requirements help?

Intuitively, it would be nice if the inductor just focused on modeling the local phenomenon of the human in the box, rather than positing aliens. It shouldn’t be necessary to think about aliens to predict humans well; ideally, all the importance-weighting should just add up to normality.

Maybe we can use something like the speed prior to do this. Paul discusses this idea here. Roughly, this will penalize the “zoo Earth” hypothesis on the basis that it posits a large surrounding context.
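As a reminder, one common simplified form of the speed prior (not exactly Schmidhuber’s original definition) penalizes a program for its running time as well as its length:

$$S(x) \;\propto\; \sum_{p \,:\, U(p)\ \text{outputs}\ x} 2^{-|p|} \cdot \frac{1}{t(p)},$$

where t(p) is the number of steps p runs before producing x. A “zoo Earth” hypothesis that simulates a large surrounding context in order to locate the observations pays for every step of that simulation through t(p).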

I don’t expect the speed prior itself to work for this. In the regular Solomonoff induction case (rather than the naturalized induction case), the “zoo Earth” hypothesis doesn’t have to explicitly represent the universe; it can instead do abstract reasoning about the probability that Earth is in a zoo. Such abstract reasoning can be both compact and efficient (e.g. it just says “observations are produced by running some algorithm similar to logical inductors on this physics”). It’s not clear how “abstract reasoning” hypotheses fit into naturalized inductors, but it would be pretty weird if this combination solved the problem.

Paul points out that neural networks without weight sharing are probably benign; in some sense this is kind of like a severe penalty on additional computation (as each computation step in a neural network without weight sharing requires setting its parameters to the right value). Maybe it’s useful to think about this case more in order to see if the same type of argument could apply to variants of neural networks that have weight sharing.
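As a toy illustration of why removing weight sharing acts like a computation penalty (this is my own example, not anything from Paul’s argument, and the sizes are arbitrary), compare the parameter counts of a convolutional layer with those of a locally connected layer that does the same computation but shares no weights:

```python
# Toy comparison: with weight sharing (convolution), extra spatial computation is
# free in parameters; without it (locally connected), every extra unit of
# computation must be paid for by setting more parameters.

def conv_layer_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    # Convolution: one kernel, reused at every spatial position.
    return out_channels * in_channels * kernel_size * kernel_size

def locally_connected_params(in_channels: int, out_channels: int,
                             kernel_size: int, height: int, width: int) -> int:
    # No weight sharing: a separate kernel at every spatial position.
    return height * width * conv_layer_params(in_channels, out_channels, kernel_size)

if __name__ == "__main__":
    shared = conv_layer_params(in_channels=16, out_channels=16, kernel_size=3)
    unshared = locally_connected_params(16, 16, 3, height=64, width=64)
    print(f"convolution (weight sharing):   {shared:,} parameters")    # 2,304
    print(f"locally connected (no sharing): {unshared:,} parameters")  # 9,437,184
```

Doing 64 × 64 = 4096 spatially distinct computations costs 4096 times as many parameters when there is no weight sharing, which is the sense in which extra computation (e.g. simulating a large surrounding context) is heavily penalized.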