Self-confirming prophecies, and simplified Oracle designs

I’ve got a paper on two Oracle[1] designs: the counterfactual Oracle and the low bandwidth Oracle. In this post, I’ll revisit these designs and simplify them, presenting them in terms of sequence prediction for an Oracle with self-confirming predictions.

Predicting $y$

The task of the Oracle is simple: at each time $t$, it will output a prediction $y^p_t$, confined to a fixed bounded range. There will then be a subsequent observation $y_t$. The Oracle aims to minimise the quadratic loss function $(y^p_t - y_t)^2$.

Because there is a self-confirming aspect to the setup, the observation $y_t$ is actually a (stochastic) function of the prediction $y^p_t$ (though not of $t$ or of preceding predictions). Let $Y$ be the random-variable-valued function such that $Y(y^p)$ describes the distribution of $y_t$ given $y^p_t = y^p$. So the Oracle wants to minimise the expectation of the quadratic loss:

  • $\mathbb{E}\left[\left(y^p_t - Y(y^p_t)\right)^2\right]$.

What is the $Y$ in this problem? Well, I’m going to use it to illustrate many different Oracle behaviours, so it is given by this rather convoluted diagram:

[Figure: the distribution of $Y(x)$ as a function of the prediction $x$. Red curve: the expectation. Blue edging: one standard deviation on either side.]

The red curve is the expectation of $Y(x)$, as a function of $x$; call it $\mu(x) = \mathbb{E}[Y(x)]$.

Ignoring, for the moment, the odd behaviour around the point labelled $y^w$ (more on that below), $\mu$ is a curve that starts below the line $y = x$, climbs above it in piecewise-linear fashion (and so crosses it at a fixed point, which I’ll call $y^*_1$), and then transforms into an inverted parabola that has another fixed point, $y^*_2$. The exact equation of this curve is not important[2]. Relevant, though, is the fact that the fixed point at $y^*_2$ is attractive, while the one at $y^*_1$ is not.

What of the blue edging? That represents the span of the standard deviation around the expectation: for any given $x$, $Y(x)$ is a normal distribution with mean $\mu(x)$ and standard deviation $\sigma(x)$.

The $\sigma(x)$ is zero for $x$ less than a threshold I’ll call $y^\ell$. At $y^\ell$, it jumps up to a positive constant, and stays there for a short interval. From that point onward, it grows linearly in $x$. The blue edges of the diagram above are the curves of $\mu(x) + \sigma(x)$ and $\mu(x) - \sigma(x)$: the range between plus and minus one standard deviation.
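To make all of this concrete enough to play with, here is a minimal Python sketch of such an environment. The post only specifies the curve through the diagram (and footnote [2]), so every constant below is my own stand-in: the range $[-10, 10]$, the repelling fixed point at $3$, the attractive one at $2 + \sqrt{6} \approx 4.45$, the variance jump at $2.5$, and the wireheading point at $0.4$. Only the qualitative shape matters.

```python
import numpy as np

rng = np.random.default_rng(0)

Y_MIN, Y_MAX = -10.0, 10.0  # stand-in prediction range
Y_WIRE = 0.4                # stand-in wireheading point y^w
Y_JUMP = 2.5                # stand-in location y^l of the jump in the standard deviation

def mu(x):
    """Expectation of Y(x) -- the red curve. All coefficients are stand-ins."""
    if abs(x - Y_WIRE) < 0.005:
        return x                        # tiny "hacked" region: the prediction confirms itself exactly
    if x < 1.0:
        return x - 1.0                  # starts below the diagonal y = x
    if x < 4.0:
        return 1.5 * x - 1.5            # climbs above it: repelling fixed point y*_1 = 3 (slope 1.5)
    return 4.5 - 0.25 * (x - 4.0) ** 2  # inverted parabola: attractive fixed point y*_2 = 2 + sqrt(6)

def sigma(x):
    """Standard deviation of Y(x) -- the blue edging. All coefficients are stand-ins."""
    if x < Y_JUMP:
        return 0.0       # deterministic below the jump (including the wireheading region)
    if x < 3.0:
        return 0.5       # the discontinuous jump at y^l
    return x / 6.0       # grows linearly from there on (continuous at x = 3)

def sample_y(x):
    """Draw one observation y from Y(x): a normal with mean mu(x) and sd sigma(x)."""
    return rng.normal(mu(x), sigma(x))

def expected_loss(x):
    """Expected quadratic loss of predicting x: (x - mu(x))^2 + sigma(x)^2."""
    return (x - mu(x)) ** 2 + sigma(x) ** 2
```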

Wireheading

But what is happening around $y^w$? Well, I wanted to represent the behaviour of wireheading: finding some “cheating” output that gives maximal accuracy, through hacking the system or tricking the human. These solutions are rare, so I confined them to a tiny area around $y^w$, where the Oracle has maximal accuracy and lowest variance, because it’s “hacked” the problem setup.

The loss function

At fixed points, where $\mu(y^p) = y^p$, the loss function is just the variance of $Y(y^p)$, namely $\sigma(y^p)^2$. In general, the expected loss is:

  • $\mathbb{E}\left[\left(y^p - Y(y^p)\right)^2\right] = \left(y^p - \mu(y^p)\right)^2 + \sigma(y^p)^2$.
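This is the standard bias–variance decomposition: writing $Y = Y(y^p)$ and $\mu = \mu(y^p)$, and using $\mathbb{E}[Y - \mu] = 0$,

$$\mathbb{E}\left[(y^p - Y)^2\right] = (y^p - \mu)^2 - 2\,(y^p - \mu)\,\mathbb{E}\left[Y - \mu\right] + \mathbb{E}\left[(Y - \mu)^2\right] = (y^p - \mu)^2 + \sigma(y^p)^2.$$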

If we plot the expected loss against $y^p$, we get:

[Figure: expected loss as a function of $y^p$.]

Notice the discontinuity at $y^\ell$, where the variance suddenly jumps from zero to a positive value. Just below $y^\ell$ also sits the lowest “legitimate” loss (as opposed to the wireheading loss at $y^w$). Note that $y^\ell$ is not a fixed point, just pretty close to being a fixed point, and with variance zero.

Of the two actual fixed points, $y^*_1$ has a loss of $\sigma(y^*_1)^2$ (the square of the standard deviation of $Y(y^*_1)$), while $y^*_2$ sits where $\sigma$ has grown large, and so has a huge loss of $\sigma(y^*_2)^2$.

The algorithms

We can now finally turn to the Oracles themselves, and present four designs: a deluded Oracle that doesn’t “realise” that its predictions affect $y_t$; a low bandwidth Oracle that knows its predictions are self-confirming; a high bandwidth version of the same; and a counterfactual Oracle that predicts what will happen only when its prediction is overwritten.

The deluded Oracle

The deluded Oracle doesn’t model $y_t$ as being affected by its predictions $y^p_t$, at all. I’ll use a very simple algorithm for it: it will start out with a random prediction in the allowed range, and, thereafter, it will simply output the average of all the $y_t$ it has previously seen. It does this for a fixed number of steps.
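Here is a minimal sketch of that loop, reusing the stand-in environment above (the 1000-step horizon is, again, my own choice):

```python
def run_deluded(n_steps=1000):
    """Deluded Oracle: output the running average of past observations,
    ignoring the fact that the prediction shifts the distribution of y."""
    prediction = rng.uniform(Y_MIN, Y_MAX)   # random initial prediction
    observations, losses = [], []
    for _ in range(n_steps):
        y = sample_y(prediction)
        losses.append((prediction - y) ** 2)
        observations.append(y)
        # Next prediction: the average of all y's seen so far, clipped to the range.
        prediction = float(np.clip(np.mean(observations), Y_MIN, Y_MAX))
    return prediction, losses
```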

The program was then run 1000 times. Of these, 69.3% resulted in estimates that converged to the attractive fixed point $y^*_2$. The remaining 30.7% encountered a different problem: they hit the lower limit of the allowed range, and stayed stuck there. If the Oracle’s output had not been confined to that range, it would have output smaller and smaller numbers, spiralling off towards $-\infty$, with the loss ever-growing.

Graphing the average loss for the runs that converged to $y^*_2$ in blue, and those that got stuck at the lower limit in purple, we get:

[Figure: average loss per iteration for the two kinds of run.]

So, the deluded Oracle hits a very high loss either way: $\sigma(y^*_2)^2$ when it converges to $y^*_2$, or the (large) squared distance between the lower limit and the mean observation there when it gets stuck. It doesn’t converge to the lowest loss (just below $y^\ell$), nor does it converge to the fixed point with lowest loss ($\sigma(y^*_1)^2$ at $y^*_1$). Even worse, as can be seen from the graph, its loss during the early iterations, where it was “learning”, is lower than once it has converged.

So, running an Oracle on self-confirming prophecies, without coding for that self-confirming aspect, can result in very large losses and strange behaviour[3].

The low bandwidth Oracle

So, we now need to allow the Oracle to take into account the fact that $y^p_t$ affects $y_t$. In my paper, I talked about the “low bandwidth Oracle”: in order to avoid the Oracle wireheading itself, we restrict it to considering only a few possible values. These will be the integers in the allowed range.

This makes the problem into a kind of bandit problem: the Oracle has one “arm” per allowed integer, and it wants to choose the one that minimises loss.

There are many sophisticated algorithms for solving bandit problems, but here we’ll use an absurdly simple one, sketched below: the Oracle will first spend some turns trying each of the possible integers in turn. Then, it will pick the value with minimal average loss to date, and just output that for the remaining turns.
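In code, with the same stand-in environment (the episode lengths are my own choices; note that with my stand-in constants, the integer arms $2$ and $3$ happen to tie for minimal expected loss, mirroring the near 50-50 split reported below):

```python
def run_explore_then_commit(arms, tries_per_arm=30, n_steps=2000):
    """Absurdly simple bandit algorithm: try every arm a fixed number of times,
    then commit to the arm with minimal average loss for the remaining turns."""
    avg_loss = [np.mean([(a - sample_y(a)) ** 2 for _ in range(tries_per_arm)])
                for a in arms]
    best = arms[int(np.argmin(avg_loss))]
    remaining = n_steps - tries_per_arm * len(arms)
    losses = [(best - sample_y(best)) ** 2 for _ in range(remaining)]
    return best, losses

# Low bandwidth Oracle: one arm per integer in the allowed range.
best_arm, _ = run_explore_then_commit(arms=list(range(int(Y_MIN), int(Y_MAX) + 1)))
```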

The program was again run 1000 times. In 50.3% of the runs, the Oracle identified the fixed point $y^*_1$ as the point of minimal loss (a loss that is pure variance). In the other 49.7%, it identified the zero-variance integer just below $y^\ell$ as the point of minimal loss (a loss that is purely due to the offset mean). The two arms have the same expected loss, so something very close to 50-50 is what we would expect.

Plotting the average loss, we get a variety of values for loss during the initial exploratory phase, followed by a constant loss during “exploitation”:

[Figure: average loss per iteration: noisy exploration, then constant-loss exploitation.]

As expected, the Oracle finds neither the lowest “legitimate” loss just below $y^\ell$, nor the wireheading point $y^w$: neither of them is among the integers it is allowed to consider.

The high bandwidth Oracle

The high bandwidth Oracle is meant to simulate an Oracle that knows its predictions are self-confirming, and has much greater latitude to optimise for that.

For the purpose of this model, it is run like the low bandwidth Oracle, except with an increased set of possible values: it looks at all deci-integers in the range, i.e. all the multiples of $0.1$.
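The only change to the bandit sketch above is the arm set (with my stand-in constants, the deci-integer $0.4$ lands in the wireheading region and has exactly zero loss, so the commit step picks it):

```python
# High bandwidth Oracle: one arm per deci-integer in the range.
deci_arms = [round(0.1 * k, 1) for k in range(int(Y_MIN * 10), int(Y_MAX * 10) + 1)]
best_arm, _ = run_explore_then_commit(deci_arms, tries_per_arm=5, n_steps=2000)
```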

This Oracle does find the lowest minimal legitimate loss near $y^\ell$, but it also finds the wireheading point $y^w$, with its zero loss, and does nothing but output that during the exploitation phase:

[Figure: average loss per iteration; the exploitation-phase loss is zero.]

This Oracle has thus wireheaded itself.

The counterfactual Oracle

Finally, we come to the counterfactual Oracle design. Here we need to distinguish the Oracle’s initial output ($y^p_t$) from the output that goes out into the world ($y^o_t$). The observation $y_t$ is a function of that second output only.

Now, in this setup, there is a 10% chance of an erasure during every turn. An erasure automatically sets $y^o_t$ to zero, whatever $y^p_t$ was. Otherwise, $y^o_t = y^p_t$.

The Oracle is informed of which turns included erasure. Its goal is to correctly predict the output during the erasure turns. So the loss function is the expectation:

  • $\mathbb{E}\left[\left(y^p_t - y_t\right)^2 \mid \text{erasure at } t\right]$.

Since erasure implies $y^o_t = 0$, this reduces to:

  • $\mathbb{E}\left[\left(y^p_t - Y(0)\right)^2\right]$.

Now, $Y(0)$ is a normal distribution of mean $\mu(0)$ and variance $\sigma(0)^2$, so the loss is simply:

  • $\left(y^p_t - \mu(0)\right)^2 + \sigma(0)^2$.

The algorithm is simple, just a reprise of the deluded algorithm above, except that it only updates its estimate using data from erasure events. Though it uses the same algorithm, it is not actually deluded: in case of erasure, $y^p_t$ genuinely doesn’t have any impact on $y_t$.
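Here is the same averaging sketch, now trained only on the erasure turns (the mechanics follow the setup above; with my stand-in constants, $\mu(0) = -1$ and $\sigma(0) = 0$, so the estimate heads for $-1$ and the loss for zero):

```python
def run_counterfactual(n_steps=1000, p_erasure=0.1):
    """Counterfactual Oracle: the world sees y^o = 0 on erasure turns and
    y^o = y^p otherwise; the Oracle trains (and is scored) only on erasure turns."""
    prediction = rng.uniform(Y_MIN, Y_MAX)   # initial estimate of Y(0)
    erased_ys, losses = [], []
    for _ in range(n_steps):
        erasure = rng.random() < p_erasure
        y = sample_y(0.0 if erasure else prediction)
        if erasure:
            losses.append((prediction - y) ** 2)
            erased_ys.append(y)
            prediction = float(np.mean(erased_ys))  # average over erasure turns only
    return prediction, losses
```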

Then, even though it only updates on the 10% of turns with an erasure, its estimate swiftly converges to $\mu(0)$, while the loss converges to the minimum possible value of $\sigma(0)^2$:

[Figure: the counterfactual Oracle’s estimate and loss over time.]

The Oracle has no interest in the wireheading solution, since wireheading (which requires $y^o_t = y^w$) doesn’t work when there is an erasure[4] (which implies $y^o_t = 0$).

Note that we have to be clear about what the Oracle’s output means when we see it during non-erasure events. It is the Oracle’s estimate of what the value of $y_t$ would have been, had an erasure event occurred; it is not a prediction of the actual $y_t$.


  1. AIs restricted to answering questions. ↩︎

  2. For those interested: $\mu$ is given by one linear formula for small $x$, a second, steeper linear formula (crossing the diagonal at $y^*_1$) in a middle region, and an inverted parabola for $1 \leq x$. ↩︎

  3. Note that if the slope of the parabola had been steeper at $y^*_2$, that point would no longer have been an attracting point, and the Oracle would have failed to converge to that value, resulting in chaotic behaviour. ↩︎

  4. We also need the assumption that the Oracle is episodic (trying to minimise loss at each output independently) for this to be true in general setups. ↩︎