Self-confirming predictions can be arbitrarily bad

Predicting perverse donors

There is a rich donor who is willing to donate up to £2,000,000 to your cause. They've already written a cheque for £1,000,000, but, before they present it to you, they ask you to predict how much they'll be donating.

The donor is slightly perverse. If you predict any amount £P, they'll erase their cheque and write £(P-1) instead, one pound less than what you predicted.

So if you want your prediction to be accurate, there's only one amount you can predict: £P = £0, and you will indeed get nothing.

Suppose the donor was perverse in a more generous way, and they'd instead write £(P+1), one more than your prediction, up to their maximum. In that case, the only accurate guess is £P = £2,000,000, and you get the whole amount.

If we extend the range above £2,000,000, or below £0 (maybe the donor is also a regulator, who can fine you), then the correct predictions get ever more extreme. It also doesn't matter if the donor subtracts or adds £1, £100, or one penny (£0.01): the only accurate predictions are at the extremes of the range.
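To make this concrete, here is a small Python sketch (my own illustration, not from the post) that treats the donor as a function of the prediction and searches for the self-confirming fixed points:

```python
def donor_response(P, delta, low=0, high=2_000_000):
    """The donor erases the cheque and writes P + delta, clamped to [low, high]."""
    return max(low, min(high, P + delta))

# A prediction is self-confirming when it equals the donor's response to it.
perverse_fixed = [P for P in range(0, 2_000_001) if donor_response(P, -1) == P]
generous_fixed = [P for P in range(0, 2_000_001) if donor_response(P, +1) == P]

# perverse_fixed == [0]: the £(P-1) donor leaves only £0 as an accurate guess.
# generous_fixed == [2_000_000]: the £(P+1) donor leaves only the maximum.
```

Both fixed points sit at the boundary of the feasible range, matching the argument above: the size of the perturbation (£1, £100, or a penny) never matters, only its sign.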

Greek mythology is full of oracular predictions that only happened because people took steps to avoid them. So there is a big difference between "prediction P is true" and "prediction P is true even if P is generally known".

Continuity assumption

A prediction P is self-confirming if, once P is generally known, P will happen (or P is the expectation of what will then happen). The previous section has self-confirming predictions, but these don't always exist. They exist when the outcome is continuous in the prediction P (plus a few technical assumptions, such as the outcome taking values in a closed interval). If that assumption is violated, then there need not be any self-confirming prediction.

For example, the generous donor could give £(P+1), except if you ask for too much (more than £1,999,999), in which case you get nothing. In that case, there is no correct prediction £P (the same goes for the £(P-1) donor who will give you the maximum if you're modest enough to ask for less than £1).
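A sketch of this discontinuous donor (again my own illustration) confirms that no prediction equals its own outcome:

```python
def capped_generous(P):
    """Generous donor with a discontinuity: gives £(P + 1), but nothing
    at all if you predict more than £1,999,999."""
    if P > 1_999_999:
        return 0
    return P + 1  # for P <= 1,999,999 this never exceeds the £2,000,000 cap

# The jump from £2,000,000 down to £0 removes every fixed point:
no_fixed_point = all(capped_generous(P) != P for P in range(0, 2_000_001))
# no_fixed_point == True
```

Below the discontinuity the outcome is always one pound above the prediction; at the discontinuity it crashes past it. A continuous function on a closed interval could not skip over the diagonal like this.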

Prediction feedback loops

But the lack of a self-confirming prediction is not really the big problem. The big problem is that, as you attempt to refine your prediction (maybe you encounter perverse donors regularly), where you end up is not determined by the background facts of the world (the donor's default generosity) but entirely by the feedback loop with your prediction. See here for a similar example in game theory.
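A minimal sketch of such a feedback loop, under one hypothetical refinement procedure (mine, not the post's): each round you predict the donation you last observed, and the perverse donor responds with one pound less.

```python
def refine(initial_prediction, rounds=3_000_000):
    """Naive refinement loop: each round, predict the previous observed
    donation; the perverse donor then gives one pound less (never below £0)."""
    P = initial_prediction
    for _ in range(rounds):
        P = max(0, P - 1)
    return P

# Whatever prior you start from, the loop drags the prediction to £0.
# The donor's £1,000,000 default generosity plays no role in the endpoint.
```

The endpoint is set entirely by the dynamics of prediction-and-response, which is the sense in which the background facts of the world drop out.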

Sloppier predictions are no better

One obvious answer would be to allow sloppier predictions. For example, if we require that the prediction be "within £1 of the true value", then all values between £0 and £2,000,000 are equally valid; averaging those, we get £1,000,000, the same as would have happened without the prediction.

But that's just a coincidence. We could have constructed the example so that only a certain region has "within £1" performance, while all others have "within £2" performance. More damningly, we could have defined "they've already written a cheque for £X" for absolutely any X, and it wouldn't have changed anything. So there is no link between the self-confirming prediction and what would have happened without the prediction. And making the self-confirming aspect weaker won't improve matters.
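One can check directly (a sketch under my reading of the "within £1" requirement) that the sloppier criterion rules nothing out for the perverse donor:

```python
def perverse_outcome(P):
    """Perverse donor: gives £(P - 1), never below £0."""
    return max(0, P - 1)

# Every whole-pound prediction from £0 to £2,000,000 is "within £1" of its
# own outcome, so the sloppier criterion singles nothing out.
within_one = [P for P in range(0, 2_000_001)
              if abs(perverse_outcome(P) - P) <= 1]
# len(within_one) == 2_000_001: the entire range qualifies.
```

Since every prediction passes, "average the valid predictions" carries no information about the donor; that the average lands on the £1,000,000 default is the coincidence the text describes.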

Real-world dangers

How often would scenarios like that happen in the real world? The donor example is convoluted, and feels very implausible; what kind of person is willing to donate around £1,000,000 if no predictions are made, but suddenly changes to £(P±1) if there is a prediction?

Donations normally spring from better thought-out processes, involving multiple actors, for specific purposes (helping the world, promoting a certain subculture or value, PR...). They are not normally so sensitive to predictions. And though there are cases of true self-confirming or self-fulfilling predictions (notably in politics), these tend to be areas which are pretty close to a knife-edge anyway, and could have gone in multiple directions, with the prediction giving them a small nudge in one direction.

So, though in theory there is no connection between a self-confirming prediction and what would have happened if the prediction had not been uttered, it seems that in practice they are not too far apart (for example, no donor can donate more money than they have, and they generally have their donation amount pretty fixed).

But beware predictions like "what's the value of the most undervalued/overvalued stock on this exchange", where knowledge of the prediction will affect behaviour quite extensively. That is a special case of the next section; the "new approach" the prediction suggests is "buy/sell these stocks".

Predictions causing new approaches

There is one area where it is very plausible for a prediction to have a huge effect, though, and that's when the prediction suggests the possibility of new approaches. Suppose I'm running a million-dollar company with a hundred thousand dollars in yearly profit, and ask a smart AI to predict my expected profit next year. The AI answers zero.

At that point, I'd be really tempted to give up and go home (or invest in/start a new company in a different area). The AI has foreseen some major problem, making my work useless. So I'd give up, and the company folds, thus confirming the prediction.

Or maybe the AI would predict ten million dollars of profit. What? Ten times more than the current capitalisation of the company? Something strange is going on. So I sift through the company's projects with great care. Most of them are solid and stolid, but one looks like a massive-risk-massive-reward gamble. I cancel all the other projects and put everything into that one, because that is the only scenario where I can see ten-million-dollar profits being possible. And, with the unexpected new financing, the project takes off.

There are some more exotic scenarios, like an AI that predicts £192,116,518,914.20 of profit. Separating that as 19;21;16;5;18;9;14;20 and replacing numbers with letters, this is SUPERINT: the AI is advising me to build a superintelligence, which, if I do, will grant me exactly the required profit to make that prediction true in expectation (and after that... well, then bad things might happen). Note that the AI need not be malicious; if it's smart enough and has good enough models, it might realise that £192,116,518,914.20 is self-confirming, without "aiming" to construct a superintelligence.
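The decoding step can be checked directly (the digit grouping 19;21;16;5;18;9;14;20 is from the text above):

```python
# Map each number to a letter of the alphabet (A=1, ..., Z=26).
groups = "19 21 16 5 18 9 14 20".split()
word = "".join(chr(ord("A") + int(g) - 1) for g in groups)
# word == "SUPERINT"
```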

All these examples share the feature that the prediction P causes a great change in behaviour. Our intuition that outcome-with-P and outcome-without-P should be similar is based on the idea that P does not change behaviour much.

Exotic corners

Part of the reason that AIs could be so powerful is that they could unlock new corners of strategy space, doing things that are inconceivable to us, achieving objectives in ways we didn't think were possible.

A predicting AI is more constrained than that, because it can't act directly. But it can act indirectly, with its prediction causing us to unlock new corners of strategy space.

Would a purely predictive AI do that? Well, it depends on two things:

  1. How self-confirming the exotic corners are, compared with more mundane predictions, and

  2. Whether the AI could explore these corners sufficiently well to come up with self-confirming predictions in them.

For 1, it's very hard to tell; after all, in the example of this post and in the game-theory example, an arbitrarily tiny misalignment at standard outcomes can push the self-confirming outcome arbitrarily far into the exotic area. I'd be nervous about trusting our intuitions here, because approximations don't help us. And the Quine-like "P causes the production of a superpowered AI that causes P to be true" seems like a perfect and exact exotic self-confirming prediction that works in almost all areas.

What about 2? Well, that's a practical barrier for many designs. If the AI is a simple sequence predictor without a good world-model, it might not be able to realise that there are exotic self-confirming predictions. A predictor that has been giving standard stock-market predictions for all of its existence is unlikely to suddenly hit on a highly manipulative prediction.

But I fear scenarios where the AI gradually learns how to manipulate us. After all, even in standard scenarios, we will change our behaviour a bit based on the prediction. The AI will learn to give the most self-confirming of these standard predictions, and so will gradually build up experience in manipulating us effectively (in particular, I'd expect "zero profit predicted → stockholders close the company" to become quite standard). The amount of manipulation may grow slowly, until the AI has a really good understanding of how to deal with the human part of the environment, and the exotic manipulations are just a continuation of what it's already been doing.