# Predicting perverse donors

There is a rich donor who is willing to donate up to £2,000,000 to your cause. They've already written a cheque for £1,000,000, but, before they present it to you, they ask you to predict how much they'll be donating.

The donor is slightly perverse. If you predict any amount £P, they'll erase their cheque and write £(P-1) instead, one pound less than what you predicted.

So if you want your prediction to be accurate, there's only one amount you can predict: £P=£0, and you will indeed get nothing.

Suppose the donor was perverse in a more generous way, and they'd instead write £(P+1), one more than your prediction, up to their maximum. In that case, the only accurate guess is £P=£2,000,000, and you get the whole amount.

If we extend the range above £2,000,000, or below £0 (maybe the donor is also a regulator, who can fine you), then the correct predictions get ever more extreme. It also doesn't matter whether the donor subtracts or adds £1, £100, or one penny (£0.01): the only accurate predictions are at the extremes of the range.
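The two cases above can be sketched numerically. This is a minimal illustration (the `perverse` and `generous` outcome functions are hypothetical formalisations, assuming whole-pound predictions clamped to the £0–£2,000,000 range), searching for predictions that equal their own outcome:

```python
CAP = 2_000_000  # the donor's maximum

def perverse(p):
    # donor writes £(P-1), floored at £0
    return max(p - 1, 0)

def generous(p):
    # donor writes £(P+1), capped at £2,000,000
    return min(p + 1, CAP)

def self_confirming(outcome):
    """All whole-pound predictions P with outcome(P) == P."""
    return [p for p in range(CAP + 1) if outcome(p) == p]

print(self_confirming(perverse))   # [0]
print(self_confirming(generous))   # [2000000]
```

In both cases the only fixed point sits at an extreme of the range, matching the argument above.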

Greek mythology is full of oracular predictions that only happened because people took steps to avoid them. So there is a big difference between “prediction P is true” and “prediction P is true even if P is generally known”.

## Continuity assumption

A prediction P is self-confirming if, once P is generally known, then P will happen (or P is the expectation of what will then happen). The previous section has self-confirming predictions, but these don’t always exist. They exist when the outcome is continuous in the prediction P (plus a few technical assumptions, like the outcome taking values in a closed interval). If that assumption is violated, then there need not be any self-confirming prediction.

For example, the generous donor could give £(P+1), except if you ask for too much (more than £1,999,999), in which case you get nothing. In that case, there is no correct prediction £P (the same goes for the £(P-1) donor who will give you the maximum if you’re modest enough to ask for less than £1).
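A quick check of that discontinuous variant (again a hypothetical sketch, same clamped whole-pound setup): the outcome jumps to zero above £1,999,999, and no prediction equals its own outcome any more:

```python
def jumpy_generous(p):
    # £(P+1) for modest requests; nothing if you ask for more than £1,999,999
    return p + 1 if p <= 1_999_999 else 0

# search every whole-pound prediction for a fixed point
fixed = [p for p in range(2_000_001) if jumpy_generous(p) == p]
print(fixed)  # [] — no self-confirming prediction exists
```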

## Prediction feedback loops

But the lack of a self-confirming prediction is not really the big problem. The big problem is that, as you attempt to refine your prediction (maybe you encounter perverse donors regularly), where you end up will not be determined by the background facts of the world (the donor’s default generosity) but entirely by the feedback loop with your prediction. See here for a similar example in game theory.
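The feedback loop can be sketched directly (assuming, hypothetically, that each round you predict whatever actually happened last round): the endpoint is set by the loop, not by the donor’s £1,000,000 default:

```python
def perverse(p):
    # donor writes £(P-1), floored at £0
    return max(p - 1, 0)

p = 1_000_000            # start from the donor's default cheque
for _ in range(1_000_001):
    p = perverse(p)      # refine: predict last round's actual outcome
print(p)  # 0 — the background fact (£1,000,000) leaves no trace
```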

## Sloppier predictions are no better

One obvious answer would be to allow sloppier predictions. For example, if we require that the prediction be “within £1 of the true value”, then all values between £0 and £2,000,000 are equally valid; averaging those, we get £1,000,000, the same as would have happened without the prediction.

But that’s just a coincidence. We could have constructed the example so that only a certain region has “within £1” performance, while all others have “within £2” performance. More damningly, we could have defined “they’ve already written a cheque for £X” for absolutely any X, and it wouldn’t have changed anything. So there is no link between the self-confirming prediction and what would have happened without the prediction. And making the self-confirming aspect weaker won’t improve matters.
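One way to see the coincidence (hypothetical sketch, same perverse donor): every whole-pound prediction is “within £1”, so the tolerance rules nothing out, and the average lands on £1,000,000 only because the range happens to be symmetric; the default cheque amount X never enters the calculation:

```python
def perverse(p):
    # donor writes £(P-1), floored at £0; the original cheque X is irrelevant
    return max(p - 1, 0)

# every whole-pound prediction in the range is "accurate to within £1"
valid = [p for p in range(2_000_001) if abs(perverse(p) - p) <= 1]
print(len(valid))               # 2000001 — all of them pass
print(sum(valid) / len(valid))  # 1000000.0, by symmetry of the range alone
```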

# Real-world dangers

How often would scenarios like that happen in the real world? The donor example is convoluted, and feels very implausible; what kind of person is willing to donate around £1,000,000 if no predictions are made, but suddenly changes to £(P±1) if there is a prediction?

Donations normally spring from better thought-out processes, involving multiple actors, for specific purposes (helping the world, promoting a certain subculture or value, PR...). They are not normally so sensitive to predictions. And though there are cases of true self-confirming or self-fulfilling predictions (notably in politics), these tend to be in areas which are pretty close to a knife-edge anyway, and could have gone in multiple directions, with the prediction giving them a small nudge in one direction.

So, though in theory there is no connection between a self-confirming prediction and what would have happened if the prediction had not been uttered, it seems that in practice they are not too far apart (for example, no donor can donate more money than they have, and they generally have their donation amount pretty fixed).

Though beware predictions like “what’s the value of the most undervalued/overvalued stock on this exchange”, where knowing the prediction will affect behaviour quite extensively. That is a special case of the next section; the “new approach” the prediction suggests is “buy/sell these stocks”.

## Predictions causing new approaches

There is one area where it is very plausible for a prediction to cause a huge effect, though, and that’s when the prediction suggests the possibility of new approaches. Suppose I’m running a million-dollar company with a hundred thousand dollars in yearly profit, and I ask a smart AI to predict my expected profit next year. The AI answers zero.

At that point, I’d be really tempted to give up and go home (or invest in/start a new company in a different area). The AI has foreseen some major problem, making my work useless. So I’d give up, and the company folds, thus confirming the prediction.

Or maybe the AI would predict ten million dollars of profit. What? Ten times more than the current capitalisation of the company? Something strange is going on. So I sift through the company’s projects with great care. Most of them are solid and stolid, but one looks like a massive-risk-massive-reward gamble. I cancel all the other projects and put everything into that, because that is the only scenario where I see ten-million-dollar profits being possible. And, with the unexpected new financing, the project takes off.

There are some more exotic scenarios, like an AI that predicts £192,116,518,914.20 profit. Separating that as 19;21;16;5;18;9;14;20 and replacing numbers with letters, this is SUPERINT: the AI is advising me to build a superintelligence, which, if I do, will grant me exactly the required profit to make that prediction true in expectation (and after that… well, then bad things might happen). Note that the AI need not be malicious; if it’s smart enough and has good enough models, it might realise that £192,116,518,914.20 is self-confirming, without “aiming” to construct a superintelligence.
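The decoding step above can be verified mechanically, grouping the digits of the predicted profit and mapping 1→A … 26→Z:

```python
digits = "19211651891420"        # £192,116,518,914.20 with punctuation removed
groups = [19, 21, 16, 5, 18, 9, 14, 20]
# check the grouping really tiles the digit string
assert "".join(str(g) for g in groups) == digits
word = "".join(chr(ord("A") + g - 1) for g in groups)
print(word)  # SUPERINT
```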

All these examples share the feature that the prediction P causes a great change in behaviour. Our intuition that outcome-with-P and outcome-without-P should be similar is based on the idea that P does not change behaviour much.

## Exotic corners

Part of the reason that AIs could be so powerful is that they could unlock new corners of strategy space, doing things that are inconceivable to us, to achieve objectives in ways we didn’t think were possible.

A predicting AI is more constrained than that, because it can’t act directly. But it can act indirectly, with its prediction causing us to unlock new corners of strategy space.

Would a purely predictive AI do that? Well, it depends on two things:

1. How self-confirming the exotic corners are, compared with more mundane predictions, and

2. Whether the AI could explore these corners sufficiently well to come up with self-confirming predictions in them.

For 1, it’s very hard to tell; after all, in the example of this post and in the game-theory example, arbitrarily tiny misalignment at standard outcomes can push the self-confirming outcome arbitrarily far into the exotic area. I’d be nervous about trusting our intuitions here, because approximations don’t help us. And the Quine-like “P causes the production of a superpowered AI that causes P to be true” seems like a perfect and exact exotic self-confirming prediction that works in almost all areas.

What about 2? Well, that’s a practical barrier for many designs. If the AI is a simple sequence predictor without a good world-model, it might not be able to realise that there are exotic self-confirming predictions. A predictor that had been giving standard stock market predictions for all of its existence is unlikely to suddenly hit on a highly manipulative prediction.

But I fear scenarios where the AI gradually learns how to manipulate us. After all, even in standard scenarios, we will change our behaviour a bit, based on the prediction. The AI will learn to give the most self-confirming of these standard predictions, and so will gradually build up experience in manipulating us effectively (in particular, I’d expect “zero profit predicted → stockholders close the company” to become quite standard). The amount of manipulation may grow slowly, until the AI has a really good understanding of how to deal with the human part of the environment, and the exotic manipulations are just a continuation of what it’s already been doing.

• It seems that what we want is usually going to be a counterfactual prediction: what would happen if the AI gave no output, or gave some boring default prediction. This is computationally simpler, but philosophically trickier. It also requires that we be the sort of agents who won’t act too strangely if we find ourselves in the counterfactual world instead of the real one.

• I think I’m missing something about your perverse donor example. What makes your number a prediction rather than just a preference? If they’re going to erase the (meaningless) 1M and give you P-1, you just maximize P, right? A prediction is just a stated belief, and if it’s not paying rent in conditional future experience, it’s probably not worth having.

More generally, is the self-confirming prediction just the same as a conditional probability on a not-revealed-by-the-oracle condition? In what cases will the oracle NOT want to reveal the condition? In this case, the nature of adversarial goals needs to be examined—why wouldn’t the oracle just falsify the prediction in addition to hiding the conditions?

Also, I’m not sure where the “continuous” requirement comes from. Your example isn’t continuous, only whole pennies are allowed. Even if only prime multiples of 3 were allowed, it would seem the same lesson holds.

Separately (and minor), I’m not enjoying the “can be arbitrarily bad” titles. They don’t convey information, and confuse me into thinking the posts are about something more fundamental than they seem to be. _ANY_ arbitrary scenario can be arbitrarily bad, why are these topics special on that front?

• A self-confirming prediction is what an oracle that was a naive sequence predictor (or that was rewarded on results) would give. https://www.lesswrong.com/posts/i2dNFgbjnqZBfeitT/oracles-sequence-predictors-and-self-confirming-predictions

The donor example was to show how such a predictor could end up moving you far in the positive or negative direction. If you were optimising for income rather than accuracy, the choice is obvious.

The £(P±1) is a continuous model of a discontinuous reality. The model has a self-confirming prediction, and it turns out “reality” (the discretised version) has one too. Unless derivatives get extremely high, a self-confirming prediction in the continuous model implies a close-to-self-confirming prediction in the discretised model.

• I think I’m still confused—a naive sequence predictor is _OF COURSE_ broken by perverse or adversarial unmodelled (because of the naivety of the predictor) behaviors. And such a predictor cannot unlock new corners of strategy space, or generate self-reinforcing predictions, because the past sequence on which it’s trained won’t have those features.

• And such a predictor cannot unlock new corners of strategy space, or generate self-reinforcing predictions, because the past sequence on which it’s trained won’t have those features.

See my last paragraph above; I don’t think we can rely on predictors not unlocking new corners of strategy space, because they may be able to learn gradually how to do so.

• There’s a cool name for this donor’s action: blindspotting (yeah, it’s written like this) - after a Roy Sorensen book from 1988.

• In that case, there is no correct prediction £P

But there is a distance between predictions and results, which is greater for some predictions.

• If you want to avoid changing distances, set the outcome as £P+1 for P less than a million, and £P-1 for P greater than or equal to a million (for example).

• I am not sure what exactly you mean by predicting. You can tell the donor a different amount than you are internally expecting to obtain.

• The post concerns self-confirming predictions. The donor asked for a prediction of how much money they’ll give you...after they hear your prediction. A prediction you give them would be “self-confirming” if they gave you the amount you specified. Here “prediction” refers to “the amount you tell them”, as opposed to the amount

you are internally expecting to obtain.

which no one other than you actually knows.