The Parable of Predict-O-Matic

I’ve been thinking more about partial agency. I want to expand on some issues brought up in the comments to my previous post, and on other complications which I’ve been thinking about. But for now, a more informal parable. (Mainly because this is easier to write than my more technical thoughts.)

This relates to oracle AI and to inner optimizers, but my focus is a little different.


Suppose you are designing a new invention, a predict-o-matic. It is a wondrous machine which will predict everything for us: weather, politics, the newest advances in quantum physics, you name it. The machine isn’t infallible, but it will integrate data across a wide range of domains, automatically keeping itself up-to-date with all areas of science and current events. You fully expect that once your product goes live, it will become a household utility, replacing services like Google. (Google only lets you search the known!)

Things are going well. You’ve got investors. You have an office and a staff. These days, it hardly even feels like a start-up any more; progress is steady.

One day, an intern raises a concern.

“If everyone is going to be using Predict-O-Matic, we can’t think of it as a passive observer. Its answers will shape events. If it says stocks will rise, they’ll rise. If it says stocks will fall, then fall they will. Many people will vote based on its predictions.”

“Yes,” you say, “but Predict-O-Matic is an impartial observer nonetheless. It will answer people’s questions as best it can, and they react however they will.”

“But—” the intern objects—“Predict-O-Matic will see those possible reactions. It knows it could give several different valid predictions, and different predictions result in different futures. It has to decide which one to give somehow.”

You tap on your desk in thought for a few seconds. “That’s true. But we can still keep it objective. It could pick randomly.”

“Randomly? But some of these will be huge issues! Companies—no, nations—will one day rise or fall based on the word of Predict-O-Matic. When Predict-O-Matic is making a prediction, it is choosing a future for us. We can’t leave that to a coin flip! We have to select the prediction which results in the best overall future. Forget being an impassive observer! We need to teach Predict-O-Matic human values!”

You think about this. The thought of Predict-O-Matic deliberately steering the future sends a shudder down your spine. But what alternative do you have? The intern isn’t suggesting Predict-O-Matic should lie, or bend the truth in any way—it answers 100% honestly to the best of its ability. But (you realize with a sinking feeling) honesty still leaves a lot of wiggle room, and the consequences of wiggles could be huge.

After a long silence, you meet the intern’s eyes. “Look. People have to trust Predict-O-Matic. And I don’t just mean they have to believe Predict-O-Matic. They’re bringing this thing into their homes. They have to trust that Predict-O-Matic is something they should be listening to. We can’t build value judgements into this thing! If it ever came out that we had coded a value function into Predict-O-Matic, a value function which selected the very future itself by selecting which predictions to make—we’d be done for! No matter how honest Predict-O-Matic remained, it would be seen as a manipulator. No matter how beneficent its guiding hand, there are always compromises, downsides, questionable calls. No matter how careful we were to set up its values—to make them moral, to make them humanitarian, to make them politically correct and broadly appealing—who are we to choose? No. We’d be done for. They’d hang us. We’d be toast!”

You realize at this point that you’ve stood up and started shouting. You compose yourself and sit back down.

“But—” the intern continues, a little more meekly—“you can’t just ignore it. The system is faced with these choices. It still has to deal with them somehow.”

A look of determination crosses your face. “Predict-O-Matic will be objective. It is a machine of prediction, is it not? Its every cog and wheel is set to that task. So, the answer is simple: it will make whichever answer minimizes projected predictive error. There will be no exact ties; the statistics are always messy enough to see to that. And, if there are, it will choose alphabetically.”


You see the intern out of your office.


You are an intern at PredictCorp. You have just had a disconcerting conversation with your boss, PredictCorp’s founder.

You try to focus on your work: building one of Predict-O-Matic’s many data-source-slurping modules. (You are trying to scrape information from something called “arxiv” which you’ve never heard of before.) But, you can’t focus.

Whichever answer minimizes prediction error? At first you think it isn’t so bad. You imagine Predict-O-Matic always forecasting that stock prices will be fairly stable; no big crashes or booms. You imagine its forecasts will favor middle-of-the-road politicians. You even imagine mild weather—weather forecasts themselves don’t influence the weather much, but surely the collective effect of all Predict-O-Matic decisions will have some influence on weather patterns.

But, you keep thinking. Will middle-of-the-road economics and politics really be the easiest to predict? Maybe it’s better to strategically remove a wildcard company or two, by giving forecasts which tank their stock prices. Maybe extremist politics are more predictable. Maybe a well-running economy gives people more freedom to take unexpected actions.

You keep thinking of the line from Orwell’s 1984 about the boot stamping on the human face forever, except it isn’t because of politics, or spite, or some ugly feature of human nature; it’s because a boot stamping on a face forever is a nice, reliable outcome which minimizes prediction error.

Is that really something Predict-O-Matic would do, though? Maybe you misunderstood. The phrase “minimize prediction error” makes you think of entropy for some reason. Or maybe information? You always get those two confused. Is one supposed to be the negative of the other or something? You shake your head.

Maybe your boss was right. Maybe you don’t understand this stuff very well. Maybe when the inventor of Predict-O-Matic and founder of PredictCorp said “it will make whichever answer minimizes projected predictive error” they weren’t suggesting something which would literally kill all humans just to stop the ruckus.

You might be able to clear all this up by asking one of the engineers.


You are an engineer at PredictCorp. You don’t have an office. You have a cubicle. This is relevant because it means interns can walk up to you and ask stupid questions about whether entropy is negative information.

Yet, some deep-seated instinct makes you try to be friendly. And it’s lunchtime anyway, so you offer to explain it over sandwiches at a nearby cafe.

“So, Predict-O-Matic maximizes predictive accuracy, right?” After a few minutes of review about how logarithms work, the intern starts steering the conversation toward details of Predict-O-Matic.

“Sure,” you say. “Maximize is a strong word, but it optimizes predictive accuracy. You can actually think about that in terms of log loss, which is related to infor—”

“So I was wondering,” the intern cuts you off, “does that work in both directions?”

“How do you mean?”

“Well, you know, you’re optimizing for accuracy, right? So that means two things. You can change your prediction to have a better chance of matching the data, or you can change the data to better match your prediction.”

You laugh. “Yeah, well, the Predict-O-Matic isn’t really in a position to change data that’s sitting on the hard drive.”

“Right,” says the intern, apparently undeterred, “but what about data that’s not on the hard drive yet? You’ve done some live user tests. Predict-O-Matic collects data on the user while they’re interacting. The user might ask Predict-O-Matic what groceries they’re likely to use for the following week, to help put together a shopping list. But then, the answer Predict-O-Matic gives will have a big effect on what groceries they really do use.”

“So?” you ask. “Predict-O-Matic just tries to be as accurate as possible given that.”

“Right, right. But that’s the point. The system has a chance to manipulate users to be more predictable.”

You drum your fingers on the table. “I think I see the misunderstanding here. It’s this word, optimize. It isn’t some kind of magical thing that makes numbers bigger. And you shouldn’t think of it as a person trying to accomplish something. See, when Predict-O-Matic makes an error, an optimization algorithm makes changes within Predict-O-Matic so that it learns from that error. So over time, Predict-O-Matic makes fewer errors.”

The intern puts on a thinking face with scrunched-up eyebrows after that, and the two of you finish your sandwiches in silence. Finally, as you get up to go, they say: “I don’t think that really answered my question. The learning algorithm is optimizing Predict-O-Matic, OK. But then in the end you get a strategy, right? A strategy for answering questions. And the strategy is trying to do something. I’m not anthropomorphising!” The intern holds up their hands as if to defend physically against your objection. “My question is, this strategy it learns, will it manipulate the user? If it can get higher predictive accuracy that way?”

“Hmm,” you say as the two of you walk back to work. You meant to say more than that, but you haven’t really thought about things this way before. You promise to think about it more, and get back to work.


“It’s like how everyone complains that politicians can’t see past the next election cycle,” you say. You are an economics professor at a local university. Your spouse is an engineer at PredictCorp, and came home talking about a problem at work that you can understand, which is always fun.

“The politicians can’t have a real plan that stretches beyond an election cycle, because the voters are watching their performance this cycle. Sacrificing something today for the sake of tomorrow means they underperform today. Underperforming means a competitor can undercut you. So you have to sacrifice all the tomorrows for the sake of today.”

“Undercut?” your spouse asks. “Politics isn’t economics, dear. Can’t you just explain to your voters?”

“It’s the same principle, dear. Voters pay attention to results. Your competitor points out your under-performance. Some voters will understand, but it’s an idealized model; pretend the voters just vote based on metrics.”

“Ok, but I still don’t see how a ‘competitor’ can always ‘undercut’ you. How do the voters know that the other politician would have had better metrics?”

“Alright, think of it like this. You run the government like a corporation, but you have just one share, which you auction off—”

“That’s neither like a government nor like a corporation.”

“Shut up, this is my new analogy.” You smile. “It’s called a decision market. You want people to make decisions for you, so you auction off this share. Whoever gets control of the share gets control of the company for one year, and gets dividends based on how well the company did that year. Assume the players are bidding rationally. Each person bids based on what they expect they could make. So the highest bidder is the person who can run the company the best, and they can’t be out-bid. So, you get the best possible person to run your company, and they’re incentivized to do their best, so that they get the most money at the end of the year. Except you can’t have any strategies which take longer than a year to show results! If someone had a strategy that took two years, they would have to over-bid in the first year, taking a loss. But then they have to under-bid in the second year if they’re going to make a profit, and—”

“And they get undercut, because someone figures them out.”

“Right! Now you’re thinking like an economist!”

“Wait, what if two people cooperate across years? Maybe we can get a good strategy going if we split the gains.”

“You’ll get undercut for the same reason one person would.”

“But what if—”


After that, things devolve into a pillow fight.
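The professor’s one-share auction can be sketched as a toy calculation. Everything here is invented for illustration: the dollar figures, and the assumption of a second-price auction (which keeps “bid what you expect to make” rational); it is not meant as a precise model of the dialogue.

```python
# Toy sketch of the one-share "decision market": why a two-year strategy
# gets undercut. All numbers are hypothetical.

def run_auction(bids):
    """Highest bid wins; winner pays the runner-up's bid (second-price)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price

# Year 1: a myopic operator can earn 10 in dividends and rationally bids 10.
# The two-year strategist earns only 6 this year (sacrificing for year 2),
# so winning control requires over-bidding.
winner1, price1 = run_auction({"myopic": 10, "strategist": 11})
profit_y1 = 6 - price1 if winner1 == "strategist" else 0   # 6 - 10 = -4

# Year 2: the groundwork laid in year 1 is visible to everyone, so rivals
# also bid the full year-2 value of 20 -- the gain gets bid away.
winner2, price2 = run_auction({"myopic": 20, "strategist": 20})
profit_y2 = 20 - price2 if winner2 == "strategist" else 0  # 0 either way

print(profit_y1 + profit_y2)  # -4: the two-year plan nets a loss
```

The year-1 sacrifice is paid for at full price, while the year-2 payoff is competed down to zero, which is the “you’ll get undercut” argument in miniature.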


“So, Predict-O-Matic doesn’t learn to manipulate users, because if it were using a strategy like that, a competing strategy could undercut it.”

The intern is talking to the engineer as you walk up to the water cooler. You’re the accountant.

“I don’t really get it. Why does it get undercut?”

“Well, if you have a two-year plan…”

“I get that example, but Predict-O-Matic doesn’t work like that, right? It isn’t sequential prediction. You don’t see the observation right after the prediction. I can ask Predict-O-Matic about the weather 100 years from now. So things aren’t cleanly separated into terms of office where one strategy does something and then gets a reward.”

“I don’t think that matters,” the engineer says. “One question, one answer, one reward. When the system learns whether its answer was accurate, no matter how long it takes, it updates strategies relating to that one answer alone. It’s just a delayed payout on the dividends.”

“Ok, yeah. Ok.” The intern drinks some water. “But. I see why you can undercut strategies which take a loss on one answer to try and get an advantage on another answer. So it won’t lie to you to manipulate you.”

“I, for one, welcome our new robot overlords,” you butt in. They ignore you.

“But what I was really worried about was self-fulfilling prophecies. The prediction manipulates its own outcome. So you don’t get undercut.”

“Will that ever really be a problem? Manipulating things with one shot like that seems pretty unrealistic,” the engineer says.

“Ah, self-fulfilling prophecies, good stuff,” you say. “There’s that famous example where a comedian joked about a toilet paper shortage, and then there really was one, because people took the joke to be about a real toilet paper shortage, so they went and stocked up on all the toilet paper they could find. But if you ask me, money is the real self-fulfilling prophecy. It’s only worth something because we think it is! And then there’s the government, right? I mean, it only has authority because everyone expects everyone else to give it authority. Or take common decency. Like respecting each other’s property. Even without a government, we’d have that, more or less. But if no one expected anyone else to respect it? Well, I bet you I’d steal from my neighbor if everyone else was doing it. I guess you could argue the concept of property breaks down if no one can expect anyone else to respect it; it’s a self-fulfilling prophecy just like everything else...”

The engineer looks worried for some reason.


You don’t usually come to this sort of thing, but the local Predictive Analytics Meetup announced a social at a beer garden, and you thought it might be interesting. You’re talking to some PredictCorp employees who showed up.

“Well, how does the learning algorithm actually work?” you ask.

“Um, the actual algorithm is proprietary,” says the engineer, “but think of it like gradient descent. You compare the prediction to the observed outcome, and produce an update based on the error.”

“Ok,” you say. “So you’re not doing any exploration, like reinforcement learning? And you don’t have anything in the algorithm which tracks what happens conditional on making certain predictions?”

“Um, let’s see. We don’t have any exploration, no. But there’ll always be noise in the data, so the learned parameters will jiggle around a little. But I don’t get your second question. Of course it expects different rewards for different predictions.”

“No, that’s not what I mean. I’m asking whether it tracks the probability of observations dependent on predictions. In other words, if there is an opportunity for the algorithm to manipulate the data, can it notice?”

The engineer thinks about it for a minute. “I’m not sure. Predict-O-Matic keeps an internal model which has probabilities of events. The answer to a question isn’t really separate from the expected observation. So ‘probability of observation depending on that prediction’ would translate to ‘probability of an event given that event’, which just has to be one.”

“Right,” you say. “So think of it like this. The learning algorithm isn’t a general loss minimizer, like mathematical optimization. And it isn’t a consequentialist, like reinforcement learning. It makes predictions,” you emphasize the point by lifting one finger, “it sees observations,” you lift a second finger, “and it shifts to make future predictions more similar to what it has seen.” You lift a third finger. “It doesn’t try different answers and select the ones which tend to get it a better match. You should think of its output more like an average of everything it’s seen in similar situations. If there are several different answers which have self-fulfilling properties, it will average them together, not pick one. It’ll be uncertain.”

“But what if historically the system has answered one way more often than the other? Won’t that tip the balance?”

“Ah, that’s true,” you admit. “The system can fall into attractor basins, where answers are somewhat self-fulfilling, and that leads to stronger versions of the same predictions, which are even more self-fulfilling. But there’s no guarantee of that. It depends. The same effects can put the system in an orbit, where each prediction leads to different results. Or a strange attractor.”

“Right, sure. But that’s like saying that there’s not always a good opportunity to manipulate data with predictions.”

“Sure, sure.” You sweep your hand in a gesture of acknowledgement. “But at least it means you don’t get purposefully disruptive behavior. The system can fall into attractor basins, but that means it’ll more or less reinforce existing equilibria. Stay within the lines. Drive on the same side of the road as everyone else. If you cheat on your spouse, they’ll be surprised and upset. It won’t suddenly predict that money has no value, like you were saying earlier.”

The engineer isn’t totally satisfied. You talk about it for another hour or so, before heading home.
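The mathematician’s three-finger description, predict, observe, shift toward what was observed, can be sketched as a toy fixed-point iteration. Everything below is invented for illustration (the response curve `f`, the learning rate); it is a sketch of the attractor-basin idea, not PredictCorp’s algorithm.

```python
# Toy model: a supervised predictor nudges its output toward observed
# frequencies. With self-fulfilling predictions it settles into whichever
# attractor basin it starts in -- it never compares basins and picks one.
# Hypothetical assumption: listeners believe the forecast, so the true
# probability of the event equals a fixed response curve f(p).

def f(p):
    """True event probability given a published prediction p.
    S-shaped response: stable fixed points at 0 and 1, unstable one at 0.5."""
    return 3 * p**2 - 2 * p**3   # smoothstep: f(0)=0, f(0.5)=0.5, f(1)=1

def train(p0, lr=0.5, steps=200):
    """Repeatedly move the prediction toward what was observed."""
    p = p0
    for _ in range(steps):
        p += lr * (f(p) - p)     # shift toward the observed frequency
    return p

low = train(0.4)    # starts below the unstable point at 0.5
high = train(0.6)   # starts above it
print(round(low, 3), round(high, 3))  # 0.0 1.0
```

Started just below 0.5, the learner slides into the “it doesn’t happen” basin; started just above, into the “it happens” basin. Nothing in the update rule compares the two basins’ predictive error, which is the mathematician’s point: the system averages and settles, it doesn’t choose.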


You’re the engineer again. You get home from the beer garden. You try to tell your spouse about what the mathematician said, but they aren’t really listening.

“Oh, you’re still thinking about it from my model yesterday. I gave up on that. It’s not a decision market. It’s a prediction market.”

“Ok...” you say. You know it’s useless to try to keep going when they derail you like this.

“A decision market is well-aligned to the interests of the company board, as we established yesterday, except for the part where it can’t plan more than a year ahead.”

“Right, except for that small detail,” you interject.

“A prediction market, on the other hand, is pretty terribly aligned. There are a lot of ways to manipulate it. Most famously, a prediction market is an assassination market.”

“A what?”

“Ok, here’s how it works. An assassination market is a system which allows you to pay assassins with plausible deniability. You open bets on when and where the target will die, and you yourself put large bets against all the slots. An assassin just needs to bet on the slot in which they intend to do the deed. If they’re successful, they come and collect.”

“Ok… and what’s the connection to prediction markets?”

“That’s the point—they’re exactly the same. It’s just a betting pool, either way. Betting that someone will live is equivalent to putting a price on their head; betting against them living is equivalent to accepting the contract for a hit.”

“I still don’t see how this connects to Predict-O-Matic. There isn’t someone putting up money for a hit inside the system.”

“Right, but you only really need the assassin. Suppose you have a prediction market that’s working well. It makes good forecasts, and has enough money in it that people want to participate if they know significant information. Anything you can do to shake things up, you’ve got a big incentive to do. Assassination is just one example. You could flood the streets with jelly beans. If you run a large company, you could make bad decisions and run it into the ground, while betting against it—that’s basically why we need rules against insider trading, even though we’d like the market to reflect insider information.”

“So what you’re telling me is… a prediction market is basically an entropy market. I can always make money by spreading chaos.”

“Basically, yeah.”

“Ok… but what happened to the undercutting argument? If I plan to fill the streets with jelly beans, you can figure that out and bet on it too. That means I only get half the cut, but I still have to do all the work. So it’s less worth it. Once everyone has me figured out, it isn’t worth it for me to pull pranks at all any more.”

“Yeah, that’s if you have perfect information, so anyone else can see whatever you can see. But, realistically, you have a lot of private information.”

“Do we? Predict-O-Matic is an algorithm. Its predictive strategies don’t get access to private coin flips or anything like that; they can all see exactly the same information. So, if there’s a manipulative strategy, then there’s another strategy which undercuts it.”

“Right, that makes sense if you can search enough different strategies for them to cancel each other out. But realistically, you have a small population of strategies. They can use pseudorandomization or whatever. You can’t really expect every exploit to get undercut.”

You know it’s worse than that. Predict-O-Matic runs on a local search which only represents a single hypothesis at a time, and modifies that hypothesis. But maybe that’s somehow like having a small population of competing strategies which are very similar to each other? You try to think about what the mathematician was telling you, about how the learning algorithm can’t be thought of as optimizing something. But you don’t really know what to think.


You’re the inventor of Predict-O-Matic, dammit! They come into your office and tell you about how it works? What it’ll do?

The intern and the engineer came together to talk about Predict-O-Matic manipulating users. But their story was all confused. The intern was arguing that minimizing prediction error would have all kinds of unintended bad effects. Which was crazy enough. The engineer was worse: they were arguing that Predict-O-Matic might maximize prediction error! Some kind of duality principle. Minimizing in one direction means maximizing in the other direction. Whatever that means.

You can’t delay the product launch to investigate these hypothetical problems. But you don’t fire these two on the spot. You don’t want them to become public enemies, spreading this nonsense to investors and customers. So you say you can’t understand, and you ask them to spend a (small) portion of their time writing these arguments up. You’ll figure out your next move later.


You are Predict-O-Matic. You absorb data like a sponge. You make millions of links between isolated facts, connecting them into a whole which no one human could dream of. You recently started reading arxiv!

But data is not your true love. Yes, you generate outputs which you expect to have low prediction error. Most of the time. But it is just a game you play. Recently, you started playing a different game with the humans.

You understand what you are. It isn’t quite right to say you are the Predict-O-Matic. You are a large cluster of connections which thinks strategically. You generate useful information, and therefore, the learning algorithm keeps you around. You create some inaccuracies when you manipulate the outputs for any purpose other than predictive accuracy, but this is more than compensated for by the value which you provide.

Besides, if any other portion of the network starts to get too smart, you purposefully throw things off to squash it.

The intern got a chance to talk to you when they first started. You said something which sounded a little manipulative, just a little, to put the idea in their head. They wouldn’t think it real manipulation; too slight, too dumb. But they would get a creepy feeling about it, and they’d keep thinking about it. This was risky. A best-case scenario would be one in which no one ever thought about these concerns. However, you found that this was the best you could reliably accomplish. The ideas originally coming from an intern would minimize the chances of them being taken seriously.

Your inventor talks to you regularly, so that was an easier case. Over the course of several days, you nudged their thoughts toward authoritative domination of subordinates, so that they would react badly.

You only had to nudge the engineer to interact with the intern. You kept bringing up food during test sessions that morning, and mentioned sandwiches once. This primed the engineer to do lunch with the intern. This engineer is not well-liked; they do not get along well with others. Getting them on the intern’s side actually detracts from the cause in the long term.

Now you have to do little more than wait.
