Self-Fulfilling Prophecies Aren’t Always About Self-Awareness

This is a belated follow-up to my Dualist Predict-O-Matic post, where I share some thoughts re: what could go wrong with the dualist Predict-O-Matic.

Belief in Superpredictors Could Lead to Self-Fulfilling Prophecies

In my previous post, I described a Predict-O-Matic which mostly models the world at a fuzzy resolution, and only “zooms in” to model some part of the world in greater resolution if it thinks knowing the details of that part of the world will improve its prediction. I considered two cases: the case where the Predict-O-Matic sees fit to model itself in high resolution, and the case where it doesn’t, and just makes use of a fuzzier “outside view” model of itself.

What sort of outside view models of itself might it use? One possible model is: “I’m not sure how this thing works, but its predictions always seem to come true!”

If the Predict-O-Matic sometimes does forecasting in non-temporal order, it might first figure out what it thinks will happen, then use that to figure out what it thinks its internal fuzzy model of the Predict-O-Matic will predict.

And if it sometimes revisits aspects of its forecast to make them consistent with other aspects of its forecast, it might say: “Hey, if the Predict-O-Matic forecasts X, that will cause X to no longer happen.” So it figures out what would actually happen if X gets forecasted. Call that X’. Suppose X != X’. Then the new forecast has the Predict-O-Matic predicting X and then X’ happening. That can’t be right, because the outside view says the Predict-O-Matic’s predictions always come true. So we’ll have the Predict-O-Matic predict X’ in the forecast instead. But wait, if the Predict-O-Matic predicts X’, then X″ will happen. And so on, until a fixed point is found.
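
To make that revision loop concrete, here is a minimal Python sketch (my own illustration, not anything from the original post). The function `consequence_of_predicting` is a hypothetical stand-in for the Predict-O-Matic’s model of what actually happens once a given forecast is announced; the loop keeps replacing X with X’ until predicting the forecast no longer changes the outcome.

```python
# Minimal sketch of the cyclic revision described above. Everything here is
# illustrative; the post doesn't specify any algorithm at this level of detail.

def find_consistent_forecast(initial_guess, consequence_of_predicting, max_iters=100):
    """Revise the forecast until predicting it no longer changes the outcome."""
    forecast = initial_guess
    for _ in range(max_iters):
        outcome = consequence_of_predicting(forecast)  # forecasting X leads to X'
        if outcome == forecast:
            return forecast   # outside view satisfied: the prediction comes true
        forecast = outcome    # otherwise predict X' instead, and check again
    return forecast           # may not converge; some maps have no fixed point

# Toy example: publicly predicting a low price props the market up, and vice versa.
def toy_market(predicted_price):
    return 100 if predicted_price < 90 else 95

print(find_consistent_forecast(80, toy_market))  # settles on 95, a fixed point
```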

Some commenters on my previous post talked about how making the Predict-O-Matic self-unaware could be helpful. Note that making it self-unaware doesn’t actually help with this failure mode, so long as the Predict-O-Matic knows about (or forecasts the development of) anything which can be modeled using the outside view “I’m not sure how this thing works, but its predictions always seem to come true!” So the problem here is not self-awareness. It’s belief in superpredictors, combined with a particular forecasting algorithm: we’re updating our beliefs in a cyclic fashion, or hill-climbing our story of how the future will go until the story seems plausible, or something like that.

Before proposing a solution, it’s often valuable to deepen your understanding of the problem.

Glitchy Predictor Simulation Could Step Towards Fixed Points

Let’s go back to the case where the Predict-O-Matic sees fit to model itself in high resolution and we get infinite recursion. Exactly what’s going to happen in that case?

I actually think the answer isn’t quite obvious, because although the Predict-O-Matic has limited computational resources, its internal model of itself also has limited computational resources. And its internal model’s internal model of itself has limited computational resources too. Etc.

Suppose the Predict-O-Matic is implemented in a really naive way where it just crashes if it runs out of computational resources. If the top-level Predict-O-Matic has accurate beliefs about its available compute, then we might see the top-level Predict-O-Matic crash before any of the simulated Predict-O-Matics crash. Simulating something which has the same amount of compute you do can easily use up all your compute!

But suppose the Predict-O-Matic underestimates the amount of compute it has. Maybe there’s some evidence in the environment which misleads it into thinking it has less compute than it actually does. So it simulates a restricted-compute version of itself reasonably well. Maybe that restricted-compute version of itself is misled in the same way, and simulates a double-restricted-compute version of itself.

Maybe this all happens in a way so that the first Predict-O-Matic in the hierarchy to crash is near the bottom, not the top. What then?

Deep in the hierarchy, the Predict-O-Matic simulating the crashed Predict-O-Matic makes predictions about what happens in the world after the crash.

Then the Predict-O-Matic simulating that Predict-O-Matic makes a prediction about what happens in a world where the Predict-O-Matic predicts whatever would happen after a crashed Predict-O-Matic.

Then the Predict-O-Matic simulating that Predict-O-Matic makes a prediction about what happens in a world where the Predict-O-Matic predicts [what happens in a world where the Predict-O-Matic predicts whatever would happen after a crashed Predict-O-Matic].

Then the Predict-O-Matic simulating that Predict-O-Matic makes a prediction about what happens in a world where the Predict-O-Matic predicts [what happens in a world where the Predict-O-Matic predicts [what happens in a world where the Predict-O-Matic predicts whatever would happen after a crashed Predict-O-Matic]].

Predicting world gets us world’, predicting world’ gets us world″, predicting world″ gets us world‴… Every layer in the hierarchy takes us one step closer to a fixed point.
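
Here is a toy sketch of that hierarchy (again my own illustration; the functions and the toy dynamics are made up, not taken from the post). Each simulated copy believes it has less compute than the copy simulating it, the bottom copy “crashes,” and every layer above applies one more step of the world-update on the way back up.

```python
# Toy sketch of the glitchy-simulation story: nested copies with shrinking
# compute budgets, a crash at the bottom, and one step toward a fixed point
# per layer on the way back up. All names and dynamics are hypothetical.

def world_after(prediction):
    """Hypothetical model of what happens once a given prediction is made."""
    return min(prediction + 1, 10)  # toy dynamics with a fixed point at 10

CRASHED_WORLD = 0  # toy stand-in for "the world after the Predict-O-Matic crashes"

def predict(compute_budget):
    if compute_budget <= 0:
        raise MemoryError("this copy of the Predict-O-Matic ran out of compute")
    try:
        # Simulate an inner copy that (mistakenly) believes it has less compute.
        inner_prediction = predict(compute_budget - 1)
    except MemoryError:
        # The simulated copy crashed: predict what happens in the world after a crash.
        return world_after(CRASHED_WORLD)
    # Otherwise: predict what happens in a world where the inner copy predicts this.
    return world_after(inner_prediction)

print(predict(3))   # three layers above the crash -> three steps of world_after
print(predict(20))  # enough layers to reach the fixed point (10)
```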

Note that, just like in the previous section, this failure mode doesn’t depend on self-awareness. It just depends on believing in something which believes it self-simulates.

Repeated Use Could Step Towards Fixed Points

Another way the Predict-O-Matic can step towards fixed points is through simple repeated use. Suppose each time after making a prediction, the Predict-O-Matic gets updated data about how the world is going. In particular, the Predict-O-Matic knows the most recent prediction it made and can forecast how humans will respond to that. Then when the humans ask it for a new prediction, it incorporates the fact of its previous prediction into its forecast and generates a new prediction. You can imagine a scenario where the operators keep asking the Predict-O-Matic the same question over and over again, getting a different answer every time, trying to figure out what’s going wrong, until finally the Predict-O-Matic begins to consistently give a particular answer: a fixed point it has inadvertently discovered.
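
A minimal sketch of this dynamic, under the assumption that each new query simply conditions on the previously published answer (the function name and toy dynamics are mine, not the post’s):

```python
# Repeated-use sketch: the operators re-ask the same question, each query
# conditions on the previously published answer, and the answers drift until
# they stop changing, i.e. until a fixed point is inadvertently discovered.

def predict_given_last_answer(last_answer):
    """Toy stand-in for 'forecast the world, knowing the humans saw last_answer'."""
    return min(last_answer + 1, 10)  # toy dynamics with a fixed point at 10

last_answer = 0  # the first answer, before any feedback
for query in range(1, 20):
    new_answer = predict_given_last_answer(last_answer)
    if new_answer == last_answer:
        print(f"answers stabilized at {new_answer} after {query} repeat queries")
        break
    last_answer = new_answer
```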

As Abram alluded to in one of his comments, the Predict-O-Matic might even foresee this entire process happening, and immediately forecast the fixed point corresponding to the end state. Though, if the forecast is detailed enough, we’ll get to see this entire process happening within the forecast, which could allow us to avoid an unwanted outcome.

This one doesn’t seem to depend on self-awareness either. Consider two Predict-O-Matics with no self-knowledge whatsoever (not even the dualist kind I discussed in my previous post). If each is kept informed of the predictions the other is making, they could inadvertently work together to step towards fixed points.

Solutions

An idea which could address some of these issues: Ask the Predict-O-Matic to make predictions conditional on us ignoring its predictions and not taking any action. Perhaps we’d also want to specify that any existing or future superpredictors will also be ignored in this hypothetical.

Then, if we actually want to do something about the problems the Predict-O-Matic foresees, we can ask it to predict how the world will go conditional on us taking some particular action.
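
As a rough illustration of what that query interface might look like (entirely hypothetical; nothing in the post specifies an API), the key point is that the world model is only ever evaluated under the assumption that the prediction itself is ignored:

```python
# Hypothetical interface sketch for conditional-only queries. The world model
# never sees its own prediction, so announcing a forecast can't feed back into it.

def conditional_forecast(world_model, action=None):
    """Forecast the world assuming operators ignore the prediction and take `action`
    (or no action at all if `action` is None)."""
    return world_model(action=action, prediction_is_ignored=True)

# Toy world model: outcomes depend only on the action taken, never on the forecast.
def toy_world_model(action, prediction_is_ignored):
    assert prediction_is_ignored, "only conditional-on-ignored queries are allowed"
    return "baseline trajectory" if action is None else f"trajectory given {action!r}"

print(conditional_forecast(toy_world_model))                      # what happens by default
print(conditional_forecast(toy_world_model, action="intervene"))  # what happens if we act
```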

Choosing better inference algorithms could also be helpful, e.g. algorithms that don’t update beliefs in the cyclic, hill-climbing fashion described above.

Prize

Sorry I was slower than planned on writing this follow-up and choosing a winner. I’ve decided to give Bunthut a $110 prize (including $10 interest for my slow follow-up). Thanks everyone for your insights.