Conversation on forecasting with Vaniver and Ozzie Gooen

[Cross-posted to the EA Forum]

This is a transcript of a conversation on forecasting between Vaniver and Ozzie Gooen, with an anonymous facilitator (inspired by the double crux technique). The conversation was transcribed by a professional service and edited by Jacob Lagerros.

I (Jacob) decided to record, transcribe, edit and post it because:

  • Despite an increase in interest and funding for forecasting work in recent years, there seems to be a disconnect between the mental models of the people working on it and those of the people who aren't. I want to move the community's frontier of insight closer to that of the forecasting subcommunity.

  • I think this is true for many more topics than forecasting. It's incredibly difficult to be exposed to the frontier of insight unless you happen to be in the right conversations, for no better reason than that people are busy, preparing transcripts takes time and effort, and there are no standards and unclear expected rewards for doing so. This is an inefficiency in the economic sense. So it seems good to experiment with ways of alleviating it.

  • This was a high-effort activity where two people dedicated several hours to collaborative, truth-seeking dialogue. Such conversations usually look quite different from comment sections (even good ones!) or most ordinary conversations. Yet there are very few records of actual, mind-changing conversations online, despite their importance in the rationality community.

  • Posting things publicly online increases the surface area of ideas to the people who might use them, and can have very positive, hard-to-predict effects.


Introduction

Facilitator: One way to start would be to get a bit of both of your senses of the importance of forecasting, maybe with Ozzie starting first. Why are you excited about it and what caused you to get involved?

Ozzie: Actually, would it be possible for you to start first? Because there are just so many …

Vaniver: Yeah. My sense is that predicting the future is great. Forecasting is one way to do this. The question of "will this connect to things being better" is the difficult part. In particular, Ozzie had this picture before of, on the one hand, data science-y repeated things that happen a lot, and then on the other hand judgement-style forecasting, a one-off thing where people are relying on whatever models they have because they can't do the "predict the weather"-style things.

Vaniver: My sense is that most of the things that we care about are going to be closer to the right-hand side, and also that most of the things we can do now to try and build out forecasting infrastructure aren't addressing the core limitations in getting to those places.

Is infrastructure really what forecasting needs? (And clarifying the term "forecasting")

Vaniver: My main example here is something like: prediction markets are pretty easy to run, but they aren't being adopted in many of the places that we'd like to have them, for reasons that are not … "we didn't get a software engineer to build them." That feels like my core reason to be pessimistic about forecasting as intellectual infrastructure.

Ozzie: Yeah. I wanted to ask you about this. Forecasting is such a big type of thing. One thing we have is maybe five to ten people doing timelines, direct forecasting, at OpenAI, OpenPhil and AI Impacts. My impression is that you're not talking about that kind of forecasting. You're talking about infrastructural forecasting, where we have a formal platform and people making formalised things.

Vaniver: Yeah. When I think about infrastructure, I'm thinking about building tooling for people to do work in a shared space, as opposed to individual people doing individual work. If we think about dentistry or something, what dentists' infrastructure would look like is very different from people actually modifying mouths. It feels to me like OpenAI and similar people are doing more of the direct-style work than infrastructure.

Ozzie: Yeah, okay. Another question I have is about things like trend extrapolation, e.g. for particular organizations, "how much money do you think they will have in the future?" Or for LessWrong, "how many posts are going to be there in the future?" and things like that. There's a lot of that happening. Would you call that formal forecasting? Or would you say that's not really tied to existing infrastructure and they don't really need infrastructure support?

Vaniver: That’s in­ter­est­ing. I no­ticed ear­lier I hadn’t been in­clud­ing Guessti­mate or similar things in this cat­e­gory be­cause that felt to me more like model build­ing tools or some­thing. What do I think now …

Vaniver: I’m think­ing about two differ­ent things. One of them is the “does my view change if I count model build­ing tool­ing as part of this cat­e­gory, or does that seem like an un­nat­u­ral cat­e­go­riza­tion?” The other thing that I’m think­ing about is if we have stuff like the LessWrong team try­ing to fore­cast how many posts there will be… If we built tools to make that more effec­tive, does that make good things hap­pen?

Vaniver: I think on that sec­ond ques­tion the an­swer is mostly no be­cause it’s not clear that it gets them bet­ter coun­ter­fac­tual anal­y­sis or means they work on bet­ter pro­jects or some­thing. It feels closer to … The thing that feels like it’s miss­ing there is some­thing like how them be­ing able to fore­cast how many posts there will be on LessWrong con­nects to whether LessWrong is any good.

Fragility of value and difficulty of capturing important uncertainties in forecasts

Vaniver: There was this big discussion that happened recently about what metric the team should be trying to optimize for the quarter. My impression is this operationalization step connected people pretty deeply to the fact that the things that we care about are actually just extremely hard to put numbers on. This difficulty will also be there for any forecasts we might make.

Ozzie: Do you think that there could be value in people in the EA community figuring out how to put numbers on such things? For instance, having groups evaluate these things in the future in formal ways. Maybe not for LessWrong but for other kinds of projects.

Vaniver: Yeah. Here I'm noticing this old LessWrong post… Actually I don't know if this was one specific post, but this claim of the "fragility of value", where it's like "oh yeah, in fact the thing that you care about is this giant mess. If you drill it down to one consideration, you probably screwed it up somehow". But it feels like even though I don't expect you to drill it down to one consideration, I do think having 12 is an improvement over having 50. That would be evidence of moral progress.

Ozzie: That's interesting. Even so, the agenda that I've been talking about is quite broad. It's very much a lot of interesting things. A combination of forecasting and better evaluations. For forecasting itself, there are a lot of different ways to do it. That does probably mean that there is more work for us to do back and forth with specific types and their likelihood, which makes this a bit challenging. It'll give you a wide conversation.

Ben G: Is it worth going over the double cruxing steps, the general format? I'm sorry. I'm not the facilitator.

Vaniver: Yeah. What does our facilitator think?

Facilitator: I think you're doing pretty well and exploring each other's stuff. Pretty cool… I'm also sharing a sense that forecasting has been replaced with a vague "technology" or something.

Ozzie: I think in a more ideal world we'd have something like a list of every single application, and for each one say what are the likelihoods that I think it's going to be interesting, what you think is going to be interesting, etc.

Ozzie: We don't have a super great list like that right now.

Vaniver: I'm tickled, because this feels like a very forecasting way to approach the thing, where it's like "we have all these questions, let's put numbers on all of them".

Ozzie: Yeah, of course. What I'd like to see, what I'm going for, is a way that you could formally ask forecasters these things.

Vaniver: Yeah.

Ozzie: That is a long shot. I'd say that's more on the experimental side. But if you could get that to work, that'd be amazing. More likely, that is something that is kind of infrequent.

Vaniver's conceptual model of why forecasting works

Vaniver: When I think about these sorts of things, I try to have some sort of conceptual model of what's doing the work. It seems to me the story behind forecasting is that there's a lot of, I'm going to say, intelligence for hire out there, and that the thing that we need to build is this marketplace that connects the intelligence for hire and the people who need cognitive work done. The easiest sorts of work for us to use this for are these predictions about the future, because it's easy to verify later and …

Vaniver: I mean, the credit allocation problem is easy because everyone who moved the prediction in a good direction gets money and everyone who moved it in the wrong direction loses money. Whereas if we're trying to develop a cancer drug and we do scientific prizes, it may be very difficult to do the credit allocation for "here's a billion dollars for this drug". Now all the scientists who made some sort of progress along the way have to figure out who gets what of that money.
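
[Editor's note: a minimal sketch of the kind of credit allocation Vaniver is gesturing at, using a log scoring rule (one common mechanism behind prediction markets). The forecaster names, numbers and payout rule here are illustrative assumptions, not a description of any actual platform.]

```python
import math

def log_score(p: float, outcome: bool) -> float:
    """Log score of a probability forecast against a binary outcome."""
    return math.log(p if outcome else 1 - p)

def credit_allocation(forecast_history, outcome: bool):
    """Pay each forecaster the change in log score caused by their update.

    forecast_history is a list of (forecaster, new_probability) pairs,
    starting from a prior. Payouts sum to the total improvement of the
    final forecast over that prior.
    """
    payouts = {}
    for (_, p_old), (name, p_new) in zip(forecast_history, forecast_history[1:]):
        delta = log_score(p_new, outcome) - log_score(p_old, outcome)
        payouts[name] = payouts.get(name, 0.0) + delta
    return payouts

# Hypothetical question that resolves True after three updates.
history = [("prior", 0.50), ("alice", 0.70), ("bob", 0.55), ("carol", 0.90)]
print(credit_allocation(history, outcome=True))
# alice and carol gain (they moved the probability toward the outcome);
# bob loses (he moved it away).
```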

Vaniver: I’m cu­ri­ous how that con­nects with your con­cep­tion of the thing. Does that seem ba­si­cally right or you’re like there’s this part that you’re miss­ing or you would char­ac­ter­ize differ­ently or some­thing?

Ozzie: Differ­ent as­pects about it. One is I think that’s one of the pos­si­ble benefits. Hy­po­thet­i­cally, it may be one of the main benefits. But even if it’s not an ac­tual benefit, even if it doesn’t come out to be true, I think that there are other ways that this type of stuff would be quite use­ful.

Back­ground on pre­dic­tion mar­kets and the Good Judge­ment Project

Ozzie: Also to stand back a lit­tle bit, I’m not that ex­cited about pre­dic­tion mar­kets in a for­mal way. My im­pres­sion is that A) they’re not very le­gal in the US, and B), it’s very hard to in­cen­tivize peo­ple to fore­cast the right ques­tions. Then C), there are is­sues around a lot of these fore­cast­ing sys­tems you have peo­ple that want pri­vate in­for­ma­tion and stuff. There’s a lot of nasty things with those kinds of sys­tems. They could be used for some por­tion of this.

Ozzie: The pri­mary area that I’m more in­ter­ested in fore­cast­ing ap­pli­ca­tions similar to Me­tac­u­lus and Pre­dic­tionBook and one that I’m work­ing on right now. More, they’re work­ing differ­ently. Ba­si­cally, peo­ple build up good rep­u­ta­tions by hav­ing good track records. Then there’s ba­si­cally a va­ri­ety of ways to pay peo­ple. The Good Judge­ment Pro­ject does it by ba­si­cally pay­ing peo­ple a stipend. There are around 125 su­per fore­cast­ers who work on spe­cific ques­tions for spe­cific com­pa­nies. I think you pay like $100,000 to get a group of them.

Ozzie: Just a quick ques­tion, are you guys fa­mil­iar with how they do things in spe­cific? Not many peo­ple are.

Ozzie: The Good Judgment Project is maybe one of the most interesting examples of paid forecasters of this kind. For them, they basically have the GJP Open, where they find the really good forecasters. Then those become the superforecasters. There are about 200 of these, and 125 are the ones that they're charging other companies for.

Vaniver: Can you paint me more of a picture of who is buying the forecasting service and what they're doing it for?

Ozzie: Yeah. For one thing, I'll say that this area is pretty new. This is still on the cutting edge and small. OpenPhil bought some of their questions … I think they basically bought one batch. The questions I know about them asking were things like "what are the chances of nuclear war between the US and Russia?", "what are the chances of nuclear war between different countries?", where one of the main ones was Pakistan and India. Also specific questions about outcomes of interventions that they were sponsoring. OpenPhil already internally does forecasting on most of its grant applications. When a grant is made, internally they would have forecasts about how well it's going to do, and they track that. That is a type of forecasting.

Ozzie: The other groups that use them are often businesses. There are two buckets in how that's useful. One of them is to drive actual answers. A second one is to get the reasoning behind those answers. A lot of times what happens (although it may be less useful for EAs) is that these are companies that maybe do not have optimal epistemologies, but instead have systematic biases. They basically purchase this team of people who do provably well at some of these types of questions. Those people would have discussions about their kinds of reasoning. Then they find their reasoning interesting.

Vaniver: Yeah. Should I be imagining an oil company that's deciding whether to build a bunch of wells in Ghana and has decided that they just want to outsource the question of what the political environment in Ghana is going to be for the next 10 years?

Ozzie: That may be a good interpretation. Or there'd be the specific question of what's the possibility that there'll be a violent outbreak.

Vaniver: Yeah. This is distinct from Coca-Cola trying to figure out which of their new ad campaigns would work best.

Ozzie: This is typically different. They've been focused on political outcomes mostly. That comes in assuming that they were working with businesses. A lot of GJP stuff is covered by NDA so we can't actually talk about it. We don't have that much information.

Ozzie: My impression is that some groups have found it useful, and a lot of businesses don't know what to do with those numbers. They get a number like 87% and they don't have ways to directly make that interact with the rest of their system.

Ozzie: That said, there are a lot of nice things about that hypothetically. Of course some of it does come down to the users. A lot of businesses do have pretty large biases. That is a known thing. It's hard to know if you have a bias or not. Having a team of people who have a track record of accuracy is quite nice if you want to get a third-party check. Of course another thing for them is that it is just another way to outsource intellectual effort.

Positive cultural externalities of forecasting AI

Facilitator: Vaniver, is this changing your mind on anything essentially important?

Vaniver: The thing that I'm circling around now is a question closer to "in what contexts does this definitely work?", and then trying to build out from that to "in what ways would I expect it to work in the future?". For example here, Ozzie didn't mention this, but a similar thing that you might do is have pundits just track their predictions, or somehow encourage them to make predictions that then feed into some reputation score where it may matter in the future. The people who consistently get economic forecasts right actually get more mindshare or whatever. There are versions of this that rely on the users caring about it, and then there are other versions that rely less on this.

Vaniver: The AI-related thing that might seem interesting is something like this: 2.5 years ago Eliezer asked this question at the Asilomar AI conference, which was "What's the least impressive thing that you're sure won't happen in two years?" Somebody came back with the response of "We're not going to hit 90% on the Winograd Schema." [Editor's note: the speaker was Oren Etzioni] This is relevant because a month ago somebody hit 90% on the Winograd Schema. This turned out to have been 2.5 years after the thing. This person did successfully predict the thing that would happen right after the deadline.

Vaniver: I think many people in the AI space would like there to be this sort of sense of "people are actually trying to forecast near progress". Or sorry, maybe I should say medium-term progress. Predicting a few years of progress is actually hard. But it's categorically different from three months. You can imagine something where people building up the infrastructure to be good at this sort of forecasting actually makes the discourse healthier in various ways and gives us better predictions of the future.

Importance of software engineering vs. other kinds of infrastructure

Vaniver: Also I'm having some question of how much of this is infrastructure and how much of this is other things. For example, when we look at the Good Judgment Project, I feel like the software engineering is a pretty small part of what they did compared to the selection effects. It may still be the sort of thing where we're talking about infrastructure, though we're not talking about software engineering.

Vaniver: The fact that they ran this tournament at all is the infrastructure, not the code underneath the tournaments. Similarly, even if we think about a Good Judgment Project for research forecasting in general, this might be the sort of cool thing that we could do. I'm curious how that landed for you.

Ozzie: There's a lot of stuff in there. One thing is that on the question of "can we just ask pundits or experts", I think my prior is that that would be a difficult thing, specifically in that in "Expert Political Judgment" Tetlock tried to get a lot of pundits to make falsifiable predictions and none of them wanted to …

Vaniver: Oh yeah. It’s bad for them.

Facilitator: Sorry. Can you tell me what you thought were the main points of what Vaniver was just saying then?

Ozzie: Totally. Some of them …

Facilitator: Yeah. I had a sense you might go "I have a point about everything he might have said so I'll say all of them", as opposed to the key ones.

Ozzie: I also have to figure out what he said in that last bit as opposed to the previous bit. It's one of them. There's a question. Most recently, when it comes to the Good Judgment Project, how much of it was technology versus other things that they did?

Ozzie: I have an impression that you're focused on the AI space. You do talk about the AI space a lot. It's funny because I think we're both talking a bit on points that help the other side, which is kind of nice. You mentioned one piece where prediction was useful in the AI space. My impression is that you're skeptical about whether we could get a lot more wins like that, especially if we tried to do it with a more systematic effort.

Vaniver: I think I actually might be excited about that instead of skeptical. We run into similar problems as we did with getting pundits to predict things. However, the thing that's going on with professors and graduates and research scientists is very different from the thing that's going on with pundits and newspaper editors and newspaper readers.

Vaniver: Also it ties into the ongoing question of "is science real?" that the psychology replication stuff is connected to. Many people in computer science research in particular are worried about bits of how machine learning research is too close to engineering or too finicky in various ways. So I could imagine a "Hey, will this paper replicate?" market catching on in computer science. I can imagine getting from that to a "What state-of-the-arts will fall when?" thing. It also seems quite plausible that we could make that happen.

Ozzie: I have a few points now that connect to that. On pundits and experts, I think we probably agree that pundits often can be bad. Also experts often are pretty bad at forecasting, it seems. That's something that's repeatable.

Ozzie: For instance, in the AI expert surveys, a lot of the distributions don't really make sense with each other. But the people who do seem to be pretty good are the specific class of forecasters, specifically ones that we have evidence for. That's really nice. We only have so many of them right now, but it is possible that we can get more of them.

Ozzie: It would be nice for more pundits to be more vocal about this stuff. I think Kelsey at Vox with their Future Perfect group is talking about making predictions. They've done some. I don't know how much they'll end up doing.

Privacy

Ozzie: When it comes to the AI space, there are questions about what interesting projects would look like right now. I've actually been dancing around AI, in part because I could imagine a bad world, or possibly a bad world, where we really help make it obvious what research directions are exciting, and then we help speed up AI progress by five years, and that could be quite bad. Though managing to do that in an interesting way could be important.

Ozzie: There are other questions about privacy. There's the question of "is this interesting?", and the question of "conditional on it being kind of interesting, should we be private about it?" We're right now playing for that first question.

Orgs using internal prediction tools, and the action-guidingness of quantitative forecasts

Ozzie: Some other things I'd like to bring into this discussion: a lot of this right now is already being systemized. They say when you are an entrepreneur or something and try to build a tool, it's nice to find that there are already internal tools. A lot of these groups are making internal systematic predictions at this point. They're just not doing it using very formal methods.

Ozzie: For example, OpenPhil formally specifies a few predictions for grants. OpenAI also has a setup for internal forecasting. These are people at OpenAI who are ML experts, basically. That's a decent-sized thing.

Ozzie: There are several other organizations that are using internal forecasting for calibration. It's just a fun game that forces them to get a sense of what calibration is like. Then there are questions of "How useful is calibration?" and "Does it give you better calibration over time?"
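
[Editor's note: a minimal sketch of what that kind of internal calibration check can look like: record a probability alongside each prediction, then later compare stated probabilities with observed frequencies. The prediction log and bin size below are made up.]

```python
from collections import defaultdict

def calibration_table(predictions, n_bins=5):
    """Bucket recorded (probability, outcome) pairs and compare the average
    stated probability in each bucket with the observed frequency."""
    bins = defaultdict(list)
    for p, outcome in predictions:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, outcome))
    rows = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(1 for _, o in pairs if o) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows

# Hypothetical prediction log: (stated probability, did it resolve True?)
log = [(0.9, True), (0.8, True), (0.85, False), (0.3, False), (0.2, False), (0.25, True)]
for mean_p, freq, n in calibration_table(log):
    print(f"stated ~{mean_p:.2f}, observed {freq:.2f} (n={n})")
```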

Ozzie: Right now none of them seem to be using PredictionBook. We could also talk a bit about … I think that thing is nice and shows a bit of promise. It may be that there are some decent wins to be had by making better tools for those people who right now aren't using any specific tools because they looked at them and found them to be inadequate. It's also possible that even if they did use those tools it'd be a small win and not a huge win. That's one area where there could be some nice value. But it's not super exciting, so I don't know if you want to push back against that and say "there'll be no value in that."

Vaniver: There I'm sort of confused. What are the advantages to making software as a startup where you make companies' internal prediction tools better? This feels similar to Atlassian or something, where it's like "yeah, we made their internal bug reporting or other things better". It's like, yeah, sure, I can see how this is valuable. I can see how I'd make them pay for it. But I don't see how this is …

Vaniver: …a leap towards the utopian goals, if we take something like Futarchy or … in your initial talk you painted some pictures of how in the future, if you had much more intelligence or much more sophisticated systems, you could do lots of cool things. [Editor's note: see Ozzie's sequence "Prediction-Driven Collaborative Reasoning Systems" for background on this] The software-as-a-service vision doesn't seem like it gets us all that much closer, and also feels like it's not pushing at the hardest bit, which is something like the "getting companies to adopt it" thing. Or maybe what I think there is something like: the organizations themselves have to be structured very differently. It feels like there's some social tech.

Ozzie: When you say very differently, do you mean very differently? Right now they're already doing some predictions. Do you mean very differently, as in predictions would be a very important aspect of the company? Because right now it is kind of small.

Vaniver: My impression is something like, going back to your point earlier, about getting answers like 87% and not really knowing what to do with them. Similarly, I was in a conversation with Oli earlier about whether or not organizations had beliefs or world models. There's some extent to which the organization has a world model that doesn't live in a person's head. It's going to be something like: its beliefs are these forecasts on all these different questions, and the actions that the organization takes are just driven by those forecasts without having a human in the loop. Whereas it feels to me that right now, often the thing that will happen is some executive will be unsure about a decision. Maybe they'll go out to the forecasters. The forecasters will come back with 87%. Now the executive is still making the decision using their own mind. Whether or not that "87%" lands as "the actual real number 0.87" or something else is unclear, or not sensibly checked, or something. Does that make sense?

Ozzie: Yeah. Everything's there. Let's say that the 87% example is something that A) comes up if you're a bit naïve about what you want, and B) comes up depending on how systematic your organization is about using numbers for things. If you happen to have a model of what the 87% is, that could be quite valuable. We see different organizations are on different parts of the spectrum. Probably the one that's most intense about this is GiveWell. GiveWell has their multiple gigantic sheets of lots of forecasts, essentially. It's possible that it'll be hard to make tooling that'll be super useful to them. I've been talking with them. There are experiments to be tried there. They're definitely in the case that as specific things change they may change decisions, and they'll definitely change recommendations.

Ozzie: Basically they have this huge model where people estimate a bunch of parameters about moral decision making and a lot of other parameters about how well the different interventions are going to do. Out of all of that come recommendations for what the highest expected values are.
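
[Editor's note: a minimal sketch of the general shape Ozzie is describing: uncertain empirical and moral parameters feeding into an expected-value comparison. All parameter names, distributions and numbers are invented for illustration; this is not GiveWell's actual model.]

```python
import random

def simulate(n=10_000):
    """Toy parameterized cost-effectiveness comparison between two
    hypothetical interventions. Every number below is made up."""
    wins_a = 0
    for _ in range(n):
        # Uncertain empirical parameters (invented distributions).
        value_per_dollar_a = random.lognormvariate(0, 0.5) * 0.03
        value_per_dollar_b = random.lognormvariate(0, 0.9) * 0.04
        # Uncertain moral/weighting parameter.
        moral_weight_b = random.uniform(0.5, 1.0)
        if value_per_dollar_a > value_per_dollar_b * moral_weight_b:
            wins_a += 1
    return wins_a / n

print(f"Intervention A comes out ahead in {simulate():.0%} of samples")
```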

Ozzie: That said, they are also in the domain that's probably the most certain of all the EA groups in some ways. They're able to do that more. I think OpenAI is probably a little bit… I haven't seen their internal models but my guess is that they do care a lot about the specifics of the numbers and also are more reasonable about what to do with them.

Ozzie: I think the 87% example is a case where most CEOs don't seem to know what a probability distribution is, but I think the EA groups are quite a bit better.

Vaniver: When I think about civilization as a whole, there's a disconnect between groups that think numbers are real and groups that don't think numbers are real. There's some amount of "ah, if we want our society to be based on numbers being real, somehow we need the numbers-are-real orgs to eat everyone else. Or successfully infect everyone else."

Vaniver’s steel­man of Ozzie

Vaniver: What’s up?

Fa­cil­i­ta­tor: Vaniver, given what you can see from all the things you dis­cussed and touched on in the fore­cast­ing space, I won­der if you had some sense of the thing Ozzie is work­ing on. If you imag­ine your­self ac­tu­ally be­ing Ozzie and do­ing the things that he’s do­ing, I’m cu­ri­ous what are the main things that feel like you don’t ac­tu­ally buy about what he’s do­ing.

Vaniver: Yeah. One of the things … maybe this is fair. Maybe this isn’t. I’ve rounded it up to some­thing like per­son­al­ity differ­ence where I’m imag­in­ing some­one who is ex­cited about think­ing about this sort of tool and so ends up with “here’s this wide range of pos­si­bil­ities and it was fun to think about all of them, but of the wide range, here’s the few that I think are ac­tu­ally good”.

Vaniver: When I imag­ine drop­ping my­self into your shoes, there’s much more of the … for me, the “ac­tu­ally good” is the bit that’s in­ter­est­ing (though I want to con­sider much of the pos­si­bil­ity space for due dili­gence). I don’t know if that’s ac­tu­ally true. Maybe you’re like, “No. I hated this thing but I came into it be­cause it felt like the value is here.”

Ozzie: I’m not cer­tain. You’re say­ing I wasn’t fo­cused on … this was a cre­ative … it was en­joy­able to do and then I was try­ing to ra­tio­nal­ize it?

Vaniver: Not nec­es­sar­ily ra­tio­nal­ize but I think closer to the ex­plo­ra­tion step was fun and cre­ative. Then the ex­ploita­tion step of now we’re ac­tu­ally go­ing to build a pro­ject for these two things was guided by the ques­tion of which of these will be use­ful or not use­ful.

How to explore the forecasting space

Vaniver: When I imagine trying to do that thing, my exploration step looks very different. But this seems connected, because there's still some amount of everyone having different exploration steps that are driven by their interests. Then also you should expect many people to not have many well-developed possibilities outside of their interests.

Vaniver: This may end up being good to the extent that people do specialize in various ways. If we just randomly reassigned jobs to everyone, productivity would go way down. But there's this thing where the interests matter; "you should actually only explore things that you find interesting" makes sense. There's a different thing where I don't think I see the details of Ozzie's strategic map, in the sense of "here are the long-term north star type things that are guiding us." The one bit that I've seen that was medium-term was the "yep, we could do the AI forecasting stuff, but it is actually unclear whether this is speeding up capabilities more than it's useful". How many years is a "fire alarm for general intelligence" worth? [Editor's note: Vaniver is referring to this post by Eliezer Yudkowsky] Maybe the answer to that is "0", because we won't do anything useful with the fire alarm even if we had it.

Facilitator: To make sure I followed: the first step was, you have a sense of Ozzie exploring a lot of the space initially and now exploiting some of the things you think may be more useful. But you wouldn't have explored it that way yourself, potentially, because you wouldn't really have felt that there would have been something especially useful to find if you continued exploring?

Facilitator: Secondly, you're also not yet sufficiently sold on the actual medium-term things to think that the exploiting strategies are worth taking?

Vaniver: "Not yet sold" feels too strong. I think it's more that I don't see it. Not being sold implies something like … I would normally say I'm not sold on X when I can see it but don't see the justification for it yet, whereas here I don't actually have a crisp picture of what seven-year success looks like.

Facilitator: Ozzie, which one of those feels more like "argh, I just want to tell Vaniver what I'm thinking now"?

Ozzie: So, on exploration and exploitation. On the one hand, not that much time or resource has gone into this yet. Maybe a few full-time months to think about it and then several for making webapps. Maybe that was too much. I think it wasn't.

Ozzie: I'm pretty happy with the amount and variety of types of proposals that are on the table right now, compared to when I started, for a few months of thinking. Especially since for me to get involved in AI would have taken quite a bit more time of education and stuff. It did seem like there were a few cheap wins at this point. I still kind of feel like that.

Importance and neglectedness of forecasting work

Ozzie: I also do get the sense that this area is still pretty neglected.

Vaniver: Yeah. I guess in my mind neglectedness means both that people aren't working on it and that people should be working on it. Is that true for you also?

Ozzie: There are three aspects: importance, tractability, and neglectedness. It could be neglected but not important. I'm just saying here that it's neglected.

Vaniver: Okay. You are just saying that people aren't working on it.

Ozzie: Yeah. We can then talk about the questions of importance and tractability.

Facilitator: I feel like there are a lot of things that one can do. One can try to start a group house in Cambridge, one can try and teach rationality at the FHI. Forecasting … something about "neglected" doesn't feel like it quite gets at the thing, because the space is sufficiently vast.

Ozzie: Yeah. The next part would be importance. I obviously think that it's higher in importance than a lot of the other things that seem similarly neglected. Let's say basically the combination of importance, neglectedness and tractability was pretty good for forecasting. I'm happy to spend a while getting into that.

Tractability of forecasting work

Vaniver: I guess I actually don't care all that much about the importance, because I buy that if we could, in my earlier framing, move everyone to a "numbers-are-real" organization, that would be excellent. The thing that I feel most doomy about is something like the tractability, where it feels like most of the wins that people were trying to get before turned out to be extremely difficult and not really worth it. I'm interested in seeing the avenues that you think are promising in this regard.

Ozzie: Yeah. It's an interesting question. I think a lot of people have the notion that we've had tons and tons of attempts at forecasting systems since Robin Hanson started talking about prediction markets. All of those have failed, therefore prediction markets have failed and it's not worth spending another person on it; it's like a heap of dead bodies.

Ozzie: The viewpoint that I have is that it definitely doesn't look that way. For one thing, the tooling. If you actually look at a lot of the tooling that's been done, a lot of it is still pretty basic. One piece of evidence for that is the fact that almost no EA organizations are using it themselves.

Ozzie: That could also be because it's really hard to make good tooling. If you look at it, basically if you look at non-prediction-market systems… in terms of prediction markets there were also a few attempts. But the area is kind of illegal. Like I said, there are issues with prediction markets.

Ozzie: If you look at non-prediction-market tournament applications, basically you have a few. The GJP doesn't make their own. They've used Cultivate Labs. Now they're starting to try and make their own systems as well. But the GJP people are mostly political scientists and stuff, not developers.

Ozzie: A lot of the experiments they've done are political. It's not like engineering questions about how there'd be an awesome engineering infrastructure. My take on that is that if you put some really smart engineer/entrepreneur in that type of area, I'd expect them to generally have a very different approach.

Vaniver: There's a saying from Nintendo: "if your game is not fun with programmer art, it won't be fun in the final product", or something. Similarly, I can buy that there's some minimum level of tooling that we need for these sorts of forecasts to be sensible at all. But it feels to me that if I expected forecasting to be easy in the relevant ways, the shitty early versions would have succeeded without us having to build later good versions.

Ozzie: There’s a ques­tion of what “enough” is. They definitely have suc­ceeded to some ex­tent. Pre­dic­tionBook has been used by Gw­ern and a lot of other peo­ple. Some also use their own se­tups and Me­tac­u­lus and stuff… So. you can ac­tu­ally see a de­cent amount of ac­tivity. I don’t see many other ar­eas that have nearly that level of ex­per­i­men­ta­tion. There are very few other ar­eas that are be­ing used to the ex­tent that pre­dic­tions are used that we could imag­ine as fu­ture EA web apps.

Vaniver: The claim that I’m hear­ing there is some­thing like “I should be com­par­ing Pre­dic­tionBook and Me­tac­u­lus and similar things to re­ciproc­ity.io or some­thing, as this is just a web app made in their spare time and if it ac­tu­ally sees use that’s rele­vant”.

Ozzie: I think that there’s a lot of truth to that, though maybe not ex­actly be the case. Maybe we’re past a bit of re­ciproc­ity.

Vaniver: Bee­minder also feels like it’s in this camp to me to me al­though less like EA spe­cific.

Ozzie: Yeah. Or like Anki.

Ozzie: Right.

Technical tooling for Effective Altruism

Ozzie: There's one question, which is A) do we think that there's room for technical tooling around Effective Altruism? And B) if there is, what are the areas that seem exciting? I don't see many other exciting areas. Of course, that is another question. If you think … that's not exactly depending on forecasting… but more like, if you don't like forecasting, what do you like? Because there's a conclusion that we just don't like EA tools and there's almost nothing in the space. Because there's not much more that seems obviously more exciting. But there's a very different side to the argument.

Vaniver: Yeah. It's interesting because on the one hand I do buy the frame of: it might make sense to just try to make EA tools and then to figure out what the most promising EA tool is. Then also I can see the thing going in the reverse direction, which is something like: if none of the opportunities for EA tools are good then people shouldn't try it. Also if we do in fact come up with 12 great opportunities for EA tools, this should be a wave of EA grants or whatever.

Vaniver: I would be excited about something double crux-shaped. But I worry this runs into the problems that argument mapping and mind mapping have all run into before. There's something that's nice about doing a double crux which makes it grounded out in the trace that one particular conversation takes, as opposed to actually trying to represent minds. I feel like most of the other EA tools would be … in my head it starts as silly one-offs. I'm thinking of things like: for the 2016 election there was a vote-swapping thing to try to get third-party voters in swing states to vote for whatever party in exchange for third-party votes in safe states. I think Scott Aaronson promoted it, but I don't think he made it. But it feels to me like that sort of thing. We may end up seeing lots of things like that, where it's like "if we had software engineers ready to go, we would make these projects happen". Currently I expect it's sufficient that people do that just for the glory of having done it. But the Beeminder-style things are more like, "oh yeah, actually this is the sort of thing where if it's providing value then we should have people working on it, and the people will be paid by the value they're providing". Though that move is a bit weird, because that doesn't quite capture how LessWrong is being paid for...

Ozzie: Yeah. Multiple questions on that. This could be a long winding conversation. One would be "should things like this be funded by the users or by other groups?"

Ozzie: One thing I'd say is that … I joined 80,000 Hours about four years ago. I worked with them to help them with their application, and decided at that point that it should be much less of an application and more like a blog. I helped them scale it down.

Ozzie: I was looking for other opportunities to make big EA apps. At that point there was not much money. I kind of took a detour, and I'm coming back to it in some ways. In a way I've experienced this with Guesstimate, which has been used a bit. Apps from Effective Altruism have advantages and disadvantages. One disadvantage is that writing software is an expensive thing. An advantage is that it's very tractable. By tractable I mean you could say "if I spent $200,000 and three engineer-years I could expect to get this thing out". Right now we are in a situation where we do hypothetically have a decent amount of money, if it could beat a specific bar. The programmers don't even have to be these intense EAs (although it is definitely helpful).

5-MIN BREAK

Tractability of forecasting within vs outside EA

Ozzie: I feel like we both kind of agree that, hypothetically, if a forecasting system was used and people decided it was quite useful, and we could get to the point that EA orgs were making decisions in big ways with it, that could be a nice thing to have. But there's disagreement about whether that's a real possibility, and whether existing evidence shows us that it won't happen.

Vaniver: I'm now also more excited about the prospects of this for the EA space. Where I imagine a software engineer coming out of college saying "My startup idea is prediction markets", my response is "let's do some market research!" But in the EA space the market research is quite different, because people are more interested in using the thing, and there's more money for crazy long-shots… or not crazy long-shots, but rather, "if we can make this handful of people slightly more effective, there are many dollars on the line".

Ozzie: Yeah.

Vaniver: It’s similar to a case where you have this ob­scure tool for Wall Street traders, and even if you only sell to one firm you may just pay for your­self.

Ozzie: I’m skep­ti­cal when­ever I hear an en­trepreneur say­ing “I’m do­ing a pre­dic­tion mar­ket thing”. It’s usu­ally crypto re­lated. In­ter­est­ingly most pre­dic­tion plat­forms don’t pre­dict their own suc­cess, and that kind of tells you some­thing…

(Au­di­ence laugh­ter)

Vaniver: Well this is just like the pre­dic­tion mar­ket on “will the uni­verse still ex­ist”. It turns out it’s just asym­met­ric who gets paid out.

Medium-term goals and lean startup methodology

Facilitator: Vaniver, your earlier impression was that you didn't have a sense of what medium-term progress would look like?

Vaniver: It's important to flag that I changed my mind. When I think about forecasting as a service for the EA space, I'm now more optimistic, compared to when I think of it as a service on the general market. It's not surprising OpenPhil bought a bunch of Good Judgment forecasters. Whereas it would be a surprise if Exxon bought GJP questions.

Vaniver: Ozzie, do you have detailed visions of what success looks like in several years?

Ozzie: I have multiple options. The way I see it is that… when lots of YC startups come out they have a sense that "this is an area that seems kind of exciting". We kind of have evidence that it may be interesting, and also that it may not be interesting. We don't know what success looks like for an organisation in this space, though hopefully we're competent and we could work quickly to figure it out. And it seems things are exciting enough for it to be worth that effort.

Ozzie: So AirBnB and the vast majority of companies didn't have a super clear idea of how they were going to be useful when they started. But they did have good inputs, and a vague sense of what kind of cool outputs there would be.

Ozzie: There's evidence that statistically this seems to be what works in startup land.

Ozzie: Some of the evidence against: there was a question of "if you have a few small things that are working but are not super exciting, does that make it pretty unlikely you'll see something in this space?"

Ozzie: It would be hard to make a strong argument that YC wouldn't fund any companies in such cases. They do fund things without any evidence of success.

Vaniver: But also, if you're looking for moonshots, mild success the first few times is evidence against "the first time it just works and everything goes great".

Limitations of current forecasting tooling

Ozzie: Of course in that case your question is exactly what it is that's been tried. I think there are arguments that there are more exciting things on the horizon which haven't been tried.

Ozzie: Now we have PredictionBook, Metaculus, and hypothetically Cultivate Labs and another similar site. Cultivate Labs does enterprise gigs, and they are used by big companies like Exxon for ideation and similar things. They're a YC company and have around 6 people. But they haven't done amazingly well. They're pretty expensive to use. At this point you'd have to spend around $400 per month for one instance. And even then you get a specific enterprise-y app that's kind of messy.

Ozzie: Then if you actually look at the amount of work done on PredictionBook and Metaculus, it's not that much. PredictionBook might have had 1-2 years of engineering effort, around 7 years ago. People think it's cool, but not a serious site really. As for Metaculus, I have a lot of respect for their team. That project was probably around 3-5 engineering years.

Ozzie: They have a specific set of assumptions I kind of disagree with. For example, everyone has to post their questions in one main thread, and separate communities only exist by having subdomains. They're mostly excited about setting up those subdomains for big projects.

Ozzie: So if a few of us wanted to experiment with "oh, let's make a small community, have some privacy, and start messing around with questions", it's hard to do that…

Vaniver: So what would this be for? Who wants their own instances? MMO guilds?

Jacob: Here's one example of the simplest thing you currently cannot do. (Or could not do around January 1st 2019.) Four guys are hanging out, and they wonder "When will people next climb Mount Everest?" They then just want to note down their distributions for this and get some feedback, without having to specify everything in a Google Doc or a spreadsheet, which doesn't have distributions.

Facilitator: Which bit breaks?

Jacob: You cannot make small private channels for multiple people which take 5 minutes to set up and where everyone records custom distributions.

Vaniver: So I see what you can't do. What I want is the group that wants to do it. For example, one of my housemates loves these sites, but also is the sort of nerd that loves these kinds of sites in general. So should I just imagine there's some MIT fraternity where everyone is really into forecasting so they want a private domain?

Ozzie: I'd say there's a lot of uncertainty. A bunch of groups may be interested, and if a few are pretty good and happen to be excited, that would be nice. We don't know who those are yet, but we have ideas. There are EA groups now. A lot of them are kind of already doing this, and we could enable them to do it without having to pay $400-$1000 per month, or in a way that could make stuff public knowledge between groups… For other, smaller EA groups that just wanted to experiment, the current tooling would create some awkwardness.

Ozzie: If we want to run experiments on interesting things to forecast, e.g. "how valuable is this thing?", or stuff around evaluation or LessWrong posts, we'd have to set up a new instance for each. Or maybe we could have one instance and use it for all experiments, but that would force a single privacy setting for all those experiments.

Ozzie: Besides that, at this point, I've raised some money and spent something like $11,000 to get someone to program. So a lot of this tooling work is already done, and these things are starting to be experimented with.

Knowledge graphs and moving beyond questions-as-strings

Ozzie: In the medium term there are a lot of other interesting things. With the systems right now, a lot of them assume all questions are strings. So if you're going to have 1000 questions, it's impossible to understand, and impossible for other people to get value from. So if you wanted to organise something like "every EA org, how much money and personnel would they have each year for the coming 10 years", it would be impossible with current methods.

Vaniver: Instead we'd want something like a string prefix combined with a list of string postfixes?

Ozzie: There are many ways to do it. I'm experimenting with using a formal knowledge graph where you have formal entities.

Vaniver: So there would be a pointer to the MIRI object instead of a string?

Ozzie: Yeah, and that would include information about how to find information about it from Wikipedia, etc. So if someone wanted to set up an automated system to do some of this, they could. Combining this with bot support would enable experiments with data scientists and ML people to basically augment human forecasts with AI bots.

Vaniver: So, bot support here is like participants in the market (I'll just always call a "forecast-aggregator" a market)? Somehow we have an API where they can just ingest questions and respond with distributions?

Ozzie: Even without bots, just organising structured questions in this way makes it easier for both participants and observers to get value.
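
[Editor's note: a minimal sketch of the kind of data model being discussed here: questions built from formal entities rather than free-form strings, plus a bot that takes a structured question and returns a distribution. All names, IDs and numbers are hypothetical.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass(frozen=True)
class Entity:
    entity_id: str                      # e.g. "org/miri", a pointer rather than a display string
    wikidata_id: Optional[str] = None   # where an automated system could pull background data

@dataclass(frozen=True)
class Question:
    template: str                       # e.g. "annual_budget_usd"
    subject: Entity
    year: int

# A "bot" is just a function from a structured question to a distribution,
# represented here as a mapping from percentile to value.
Bot = Callable[[Question], Dict[int, float]]

def naive_growth_bot(question: Question) -> Dict[int, float]:
    """Toy bot: pretend to look up a base value for the entity and apply a
    fixed growth rate. Both numbers are placeholders."""
    base = 1_000_000
    growth = 1.1 ** (question.year - 2019)
    return {5: base * growth * 0.6, 50: base * growth, 95: base * growth * 1.8}

miri = Entity(entity_id="org/miri")
q = Question(template="annual_budget_usd", subject=miri, year=2022)
print(naive_growth_bot(q))
```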

Summary of cruxes

Facilitator: Yeah, I don't know… You chatted for a while. I'm curious what feels like some of the things you'll likely think a bit more about, or things that seem especially surprising?

Ozzie: I got the sense that we agreed on more things than I was kind of expecting to. It seems lots of it now may be fleshing out what the mid-term would be, and seeing if there are parts of it you agree are surprisingly useful, or if it does seem like all of them are long shots?

Vaniver: When I try to summarise your cruxes, what would change your mind about forecasting, it feels like 1) if you thought there was a different app/EA tool to build, you would bet on that instead of this.

Ozzie: I’d agree with that.

Vaniver: And 2) if the track record of attempts were more like… I don't know what word to use, but maybe like "sophisticated" or "effortful"? If there were more people who were more competent than you and failed, then you'd decide to give up on it.

Ozzie: I agree.

Vaniver: I didn’t get the sense that there were con­cep­tual things about fore­cast­ing that you ex­pected to be sur­prised by. In my mind, get­ting data sci­en­tists to give use­ful fore­casts, even if the ques­tions are in some com­pli­cated knowl­edge graph or some­thing, seems mod­er­ately im­plau­si­ble. Maybe I could trans­fer that in­tu­ition, but maybe the re­sponse is “they’ll just at­tempt to do base-rate fore­cast­ing, and it’s just an NLP prob­lem to iden­tify the right baser­ates”

Vaniver: Does it feel like it’s miss­ing some of your cruxes?

Fa­cil­i­ta­tor: Ozzie, can you re­peat the ones he did say?

Au­di­ence: Good ques­tion.

Ozzie: I’m bad at this part. Now I’m a bit pan­icked be­cause I feel like I’m get­ting cor­nered or some­thing.

Vaniver: My sense was… 1) if there are bet­ter EA tools to build, you’d build them in­stead. 2) if bet­ter tries had failed, it would feel less tractable. And 3) Ab­sence of con­cep­tual un­cer­tain­ties that we could re­solve now. It feels it’s not like “Pre­vi­ous sys­tems are bad be­cause they got the ques­tions wrong” or “Ques­tion/​an­swer is not the right for­mat”. It’s closer to “Pre­vi­ous sys­tems are bad be­cause their ques­tion data struc­ture doesn’t give us the full flex­i­bil­ity that we want”.

Vaniver: Maybe that’s a bad char­ac­ter­i­za­tion of the au­toma­tion and knowl­edge graph stuff.

Ozzie: I’d definitely agree with the first two, al­though the first one is a bit more ex­pan­sive than tools. If there was e.g. a pro­gram­ming tool I’d be bet­ter for and had higher EV, I’d do that in­stead. Num­ber two, on tries, I agree if there were one or two other top pro­gram­ming teams who tried a few of these ideas and were very cre­ative about it, and failed, and es­pe­cially if they had soft­ware we could use now! (I’d feel much bet­ter about not hav­ing to make soft­ware) Then for three, The ab­sence of con­cep­tual un­cer­tain­ties. I don’t know ex­actly how to pin this down.

Facilitator: I don't know if we should follow this track.

Vaniver: I'm excited about hearing what Ozzie's conceptual uncertainties are.

Facilitator: Yeah, I agree actually.

Ozzie's conceptual uncertainties

Ozzie: I think the way I'm looking at this problem is one where there are many different types of approaches that could be useful. There are many kinds of people who could be doing the predicting. There are many kinds of privacy. Maybe there would be more EAs using it, or maybe we want non-EAs of specific types. And within EA vs non-EA, there are many different kinds of things we might want to forecast. There are many creative ways of organising questions such that forecasting leads to an improved amount of accuracy. And I have a lot of uncertainty about this entire space, and about what areas will be useful and what won't.

Ozzie: I think I find it unlikely that absolutely nothing will be useful. But I do find it very possible that it'll just be too expensive to find out the useful things.

Vaniver: If it turned out nothing was useful, would it be for the same reason across different applications, or would it be "we just got tails on every different application"?

Ozzie: If it came out that people just hate using the tooling, then no matter what application you use it for it will kind of suck.

Ozzie: For me a lot of this is a question of economics. Basically, it requires some cost to both build the system and then get people to do forecasts; and then to make the questions and do the resolution. In some areas the cost will be higher than the value, and in some the value will be higher than the cost. It kind of comes down to a question of efficiency. Though it's hard to know, because there's always the question of "maybe if I had implemented this feature things would have been different?"

Vaniver: That made me think of something specific. When we look at the success stories, they are things like weather and sports, where for sports you had to do some amount of difficult operationalisation, but you sort of only had to do it once. The step I expect to be hard across most application domains is the "I have a question, and now I need to turn it into a thing-that-can-be-quantitatively-forecasted" step. And then I became kind of curious if we could get relatively simple NLP systems that could figure out the probability that a question is well-operationalised or not. And have some sort of automatic suggestions like "ah, consider these cases" or whatever, or "write the question this way rather than that way".

Ozzie: From my angle, you could kind of call those "unique questions", where the marginal cost per question is pretty high. I think that if we were in any ecosystem where things were tremendously useful, the majority of questions would not be like this.

Vaniver: Right, so if I ask about the odds that I will still be together with my partner a while from now, I'd be cloning the standard "will this relationship last?" question and substituting new pointers?

Ozzie: Yeah. And a lot of questions would be like "GDP for every country for every year", so there could be a large set of question templates in the ecosystem. So you don't need any fancy NLP; you could get pretty far with trend analysis and stuff.

Ozzie: On the question of whether data scientists would be likely to use it, that comes down to funding and incentive structures.

Ozzie: If you go on Upwork and pay $10k to a data scientist, they could give you a decent extrapolation system, and you could then just build that into a bot and hypothetically just keep pumping out these forecasts as new data comes in. Pipelines like that already exist. What this would be doing is providing infrastructure to help support them, basically.
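
[Editor's note: a minimal sketch of the template-plus-extrapolation pipeline described above: one question template instantiated across entities and years, with a crude trend fit re-run as new data arrives. The template names, entities and figures are all made up.]

```python
def linear_trend_forecast(history, target_year):
    """Fit a crude linear trend to a {year: value} history and extrapolate."""
    xs, ys = list(history.keys()), list(history.values())
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y + slope * (target_year - mean_x)

# Made-up historical data keyed by (question template, entity pointer).
data = {
    ("gdp_usd_bn", "country/ghana"): {2016: 55, 2017: 59, 2018: 65},
    ("gdp_usd_bn", "country/kenya"): {2016: 69, 2017: 79, 2018: 88},
}

# Instantiate the template across entities and target years, then forecast.
# In a real pipeline this loop would re-run whenever new data comes in.
for (template, entity), history in data.items():
    for year in (2019, 2020):
        print(f"{template} {entity} {year}: ~{linear_trend_forecast(history, year):.0f}")
```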

END OF TRANSCRIPT

At this point the conversation opened up to questions from the audience.

While this conversation was inspired by the double-crux technique, there is a large variation in how such sessions might look. Even when both participants retain the spirit of seeking the truth and changing their minds in that direction, some disagreements dissipate after less than an hour, others take 10+ hours to resolve, and some remain unresolved for years. It seems good to have more public examples of genuine truth-seeking dialogue, but at the same time it should be noted that such conversations might look very different from this one.