Blind Empiricism

Follow-up to: Living in an Inadequate World


The thesis that needs to be contrasted with modesty is not the assertion that everyone can beat their civilization all the time. It’s not that we should be the sort of person who sees the world as mad and pursues the strategy of believing a hot stock tip and investing everything.

It’s just that it’s okay to reason about the particulars of where civilization might be inadequate, okay to end up believing that you can state a better monetary policy than the Bank of Japan is implementing, okay to check that against observation whenever you get the chance, and okay to update on the results in either direction. It’s okay to act on a model of what you think the rest of the world is good at, and for this model to be sensitive to the specifics of different cases.

Why might this not be okay?

It could be that “acting on a model” is suspect, at least when it comes to complicated macrophenomena. Consider Isaiah Berlin’s distinction between “hedgehogs” (who rely more on theories, models, global beliefs) and “foxes” (who rely more on data, observations, local beliefs). Many people I know see the fox’s mindset as more admirable than the hedgehog’s, on the basis that it has greater immunity to fantasy and dogmatism. And Philip Tetlock’s research has shown that political experts who rely heavily on simple overarching theories—the kind of people who use the word “moreover” more often than “however”—perform substantially worse on average in forecasting tasks.1

Or perhaps the suspect part is when models are “sensitive to the specifics of different cases.” In a 2002 study, Buehler, Griffin, and Ross asked a group of experimental subjects to provide lots of details about their Christmas shopping plans: where, when, and how. On average, this experimental group expected to finish shopping more than a week before Christmas. Another group was simply asked when they expected to finish their Christmas shopping, with an average response of 4 days before Christmas. Both groups finished an average of 3 days before Christmas. Similarly, students who expected to finish their assignments 10 days before deadline actually finished one day before deadline; and when asked when they had previously completed similar tasks, replied, “one day before deadline.” This suggests that taking the outside view is an effective response to the planning fallacy: rather than trying to predict how many hiccups and delays your plans will run into by reflecting in detail on each plan’s particulars (the “inside view”), you can do better by just guessing that your future plans will work out roughly as well as your past plans.
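To make the contrast mechanical, here is a minimal sketch of the two estimation styles. It is purely illustrative: the task names and numbers are invented, and only the arithmetic is meant literally.

```python
# Illustrative sketch only: the step names and numbers are made up.
# "Inside view": estimate a project by summing best-guess times for each planned step.
# "Outside view": estimate it from how long similar past projects actually took.

planned_steps = {           # hypothetical Christmas-shopping plan
    "make gift list": 1.0,  # days
    "order online": 0.5,
    "visit stores": 1.5,
}

past_projects_actual_days = [9.0, 7.5, 12.0]  # how long similar errands really took

inside_view_estimate = sum(planned_steps.values())
outside_view_estimate = sum(past_projects_actual_days) / len(past_projects_actual_days)

print(f"Inside view:  {inside_view_estimate:.1f} days")   # optimistic: 3.0 days
print(f"Outside view: {outside_view_estimate:.1f} days")  # anchored on history: 9.5 days
```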

As stated, these can be perfectly good debiasing measures. I worry, however, that many people end up misusing and overapplying the “outside view” concept very soon after they learn about it, and that a lot of people tie too much of their mental conception of what good reasoning looks like to the stereotype of the humble empiricist fox. I recently noticed this as a common thread running through three conversations I had.

I am not able to recount these conversations in a way that does justice to the people I spoke to, so please treat my recounting as an unfair and biased illustration of relevant ideas, rather than as a neutral recitation of the facts. My goal is to illustrate the kinds of reasoning patterns I think are causing epistemic harm: to point to some canaries in the coal mine, and to be clear that when I talk about modesty I’m not just talking about Hal Finney’s majoritarianism or the explicit belief in civilizational adequacy.

i.

Conversation 1 was about the importance of writing code to test AI ideas. I suggested that when people tried writing code to test an idea I considered important, I wanted to see the code in advance of the experiment, or without being told the result, to see if I could predict the outcome correctly.

I got pushback against this, which surprised me; so I replied that my having a chance to make advance experimental predictions was important, for two reasons.

First, I thought it was important to develop a skill and methodology of predicting “these sorts of things” in advance, because past a certain level of development when working with smarter-than-human AI, if you can’t see the bullets coming in advance of the experiment, the experiment kills you. This being the case, I needed to test this skill as much as possible, which meant trying to make experimental predictions in advance so I could put myself on trial.

Second, if I could predict the results correctly, it meant that the experiments weren’t saying anything I hadn’t figured out through past experience and theorizing. I was worried that somebody might take a result I considered an obvious prediction under my current views and say that it was evidence against my theory or methodology, since both often get misunderstood.2 If you want to use experiment to show that a certain theory or methodology fails, you need to give advocates of the theory/methodology a chance to say beforehand what they predict, so the prediction is on the record and neither side can move the goalposts.

And I still got pushback, from a MIRI supporter with a strong technical background; so I conversed further.

I now suspect that—at least this is what I think was going on—their mental contrast between empiricism and theoreticism was so strong that they thought it was unsafe to have a theory at all. That having a theory made you a bad hedgehog with one big idea instead of a good fox who has lots of little observations. That the dichotomy was between making an advance prediction instead of doing the experiment, versus doing the experiment without any advance prediction. Like, I suspect that every time I talked about “making a prediction” they heard “making a prediction instead of doing an experiment” or “clinging to what you predict will happen and ignoring the experiment.”

I can see how this kind of outlook would develop. The policy of making predictions to test your understanding, to put it on trial, presupposes that you can execute the “quickly say oops and abandon your old belief” technique, so that you can employ it if the prediction turns out to be wrong. To the extent that “quickly say oops and abandon your old belief” is something the vast majority of people fail at, maybe on an individual level it’s better for people to try to be pure foxes and only collect observations and try not to have any big theories. Maybe the average cognitive use case is that if you have a big theory and observation contradicts it, you will find some way to keep the big theory and thereby doom yourself. (The “Mistakes Were Made, But Not By Me” effect.)

But from my perspective, there’s no choice. You just have to master “say oops” so that you can have theories and make experimental predictions. Even on a strictly empiricist level, if you aren’t allowed to have models and you don’t make your predictions in advance, you learn less. An empiricist of that sort can only learn surface generalizations about whether this phenomenon superficially “looks like” that phenomenon, rather than building causal models and putting them on trial.

ii.

Conversation 2 was about a web application under development, and it went something like this.


Startup Founder 1: I want to get (primitive version of product) in front of users as fast as possible, to see whether they want to use it or not.

Eliezer: I predict users will not want to use this version.

Founder 1: Well, from the things I’ve read about startups, it’s important to test as early as possible whether users like your product, and not to overengineer things.

Eliezer: The concept of a “minimum viable product” isn’t the minimum product that compiles. It’s the least product that is the best tool in the world for some particular task or workflow. If you don’t have an MVP in that sense, of course the users won’t switch. So you don’t have a testable hypothesis. So you’re not really learning anything when the users don’t want to use your product.3

Founder 1: No battle plan survives contact with reality. The important thing is just to get the product in front of users as quickly as possible, so you can see what they think. That’s why I’m disheartened that (group of users) did not want to use (early version of product).

Eliezer: This reminds me of a conversation I had about AI twice in the last month. Two separate people were claiming that we would only learn things empirically by experimenting, and I said that in cases like that, I wanted to see the experiment description in advance so I could make advance predictions and put on trial my ability to foresee things without being hit over the head by them.

In both of those conversations I had a very hard time conveying the idea, “Just because I have a theory does not mean I have to be insensitive to evidence; the evidence tests the theory, potentially falsifies the theory, but for that to work you need to make experimental predictions in advance.” I think I could have told you in advance that (group of users) would not want to use (early version of product), because (group of users) is trying to accomplish (task 1) and this version of the product is not the best available tool they’ll have seen for doing (task 1).


I can’t convey it very well with all the details redacted, but the impression I got was that the message of “distrust theorizing” had become so strong that Founder 1 had stopped trying to model users in detail and thought it was futile to make an advance prediction. But if you can’t model users in detail, you can’t think in terms of workflows and tasks that users are trying to accomplish, or at what point you become visibly the best tool the user has ever encountered to accomplish some particular workflow (the minimum viable product). The alternative, from what I could see, was to think in terms of “features,” and to show the product to the user as soon as possible to see if they wanted that subset of features.

There’s a version of this hypothesis which does make sense, which is that when you have the minimum compilable product that it is physically possible for a user to interact with, you can ask one of your friends to sit down in front of it, you can make a prediction about what parts they will dislike or find difficult, and then you can see if your prediction is correct. Maybe your product actually fails much earlier than you expect.

But this is not like getting early users to voluntarily adopt your product. This is about observing, as early as possible, how volunteers react to unviable versions of your product, so you know what needs fixing earliest or whether the exposed parts of your theory are holding up so far.

It really looks to me like the modest reactions to certain types of overconfidence or error are taken by many believers in modesty to mean, in practice, that theories just get you into trouble; that you can either make predictions or look at reality, but not both.

iii.

Conversation 3 was with Startup Founder 2, a member of the effective altruism community who was making Material Objects—I’ll call them “Snowshoes”—and who had remarked that modern venture capital was only interested in 1000x returns and not 20x returns.

I asked why he wasn’t trying for 1000x returns with his current company selling Snowshoes—was that more annoyance/work than he wanted to undertake?

He replied that most companies in a related industry, Flippers, weren’t that large, and it seemed to him that based on the outside view, he shouldn’t expect his company to become larger than the average company in the Flippers industry. He asked if I was telling him to try being more confident.

I responded that, no, the thing I wanted him to think was orthogonal to modesty versus confidence. I observed that the customer use case for Flippers was actually quite different from Snowshoes, and asked him if he’d considered how many uses of Previous Snowshoes in the world would, in fact, benefit from being replaced by the more developed version of Snowshoes he was making.

He said that this seemed to him too much like optimism or fantasy, compared to asking what his company had to do next.

I had asked about how customers would benefit from new and improved Snowshoes because my background model says that startups are more likely to succeed if they provide real economic value—value of the kind that Danslist would provide over Craigslist if Danslist succeeded, and of the kind that Craigslist provides over newspaper classifieds. Getting people to actually buy your product, of course, is a separate question from whether it would provide real value of that kind. And there’s an obvious failure mode where you’re in love with your product and you overestimate the product’s value or underestimate the costs to the user. There’s an obvious failure mode where you just look at the real economic value and get all cheerful about that, without asking the further necessary question of how many decisionmakers will choose to use your product; or whether your marketing message is either opaque or easily faked; or whether any competitors will get there first if they see you being successful early on; or whether you could defend a price premium in the face of competition. But the question of real economic value seems to me to be one of the factors going into a startup’s odds of succeeding—Craigslist’s success is in part explained by the actual benefit buyers and sellers derive from the existence of Craigslist—and worth factoring out before discussing purchaser decisionmaking and value-capturing questions.4

It wasn’t that I was trying to get Founder 2 to be more optimistic (though I did think, given his Snowshoes product, that he ought to at least try to be more ambitious). It was that it looked to me like the outside view was shutting down his causal model of how and why people might use his product, and substituting, “Just try to build your Snowshoes and see what happens, and at best don’t expect to succeed more than the average company in a related industry.” But I don’t think you can get so far as even the average surviving company, unless you have a causal model (the dreaded inside view) of where your company is supposed to go and what resources are required to get there.

I was asking, “What level do you want to grow to? What needs to be done for your company to grow that much? What’s the obstacle to taking the next step?” And… I think it felt immodest to him to claim that his company could grow to a given level; so he thought only in terms of things he knew he could try, forward-chaining from where he was rather than backward-chaining from where he wanted to go, because that way he didn’t need to immodestly think about succeeding at a particular level, or endorse an inside view of a particular pathway.

I think the details of his business plan had the same outside-view problem. In the Flippers industry, two common versions of Flippers that were sold were Deluxe Flippers and Basic Flippers. Deluxe Flippers were basically preassembled Basic Flippers, and Deluxe Flippers sold at a substantial premium over Basic Flippers even though it was easy to assemble Basic into Deluxe.

We were talking about a potential variation of his Snowshoes, and he said that it would be too expensive to ship a Deluxe version, but not worth it to ship a Basic version, given the average premiums the outside view said these products could command.

I asked him why, in the Flippers industry, Deluxe sold for such a premium over Basic when it was so easy to assemble Basic into Deluxe. Why was this price premium being maintained?

He suggested that maybe people really valued the last little bit of convenience from buying Deluxe instead of Basic.

I suggested that in this large industry of slightly differentiated Flippers, maybe a lot of price-sensitive consumers bought only Basic versions, meaning that the few Deluxe buyers were price-insensitive. I then observed again that the best use case for his product was quite different from the standard use case in the Flippers industry, and that he didn’t have much direct competition. I suggested that, for his customers that weren’t otherwise customers in the Flippers industry, it wouldn’t make much of a difference to his pricing power whether he sold Deluxe or the much-easier-to-ship Basic version.

And I remarked that it seemed to me unwise in general to look at a mysterious pricing premium and assume that you could get that premium. You couldn’t just look at average Deluxe prices and assume you could get them. Generally speaking, a premium like that indicates some sort of rent or market barrier; and where there is a stream of rent, there will be walls built to exclude other people from drinking from the stream. Maybe the high Deluxe prices meant that Deluxe consumers were hard to market to, or very unlikely to switch providers. You couldn’t just take the outside view of what Deluxe products tended to sell like.

He replied that he didn’t think it was wise to say that you had to fully understand every part of the market before you could do anything; especially because, if you had to understand why Deluxe products sold at a premium, it would be so easy to just make up an explanation.

Again, I understand where he was coming from, in terms of the average cognitive use case. When I try to explain a phenomenon, I’m also implicitly relying on my ability to use a technique like “don’t even start to rationalize,” which is a skill that I started practicing at age 15 and that took me a decade to hone to a reliable and productive form. I also used the “notice when you’re confused about something” technique to ask the question, and a number of other mental habits and techniques for explaining mysterious phenomena—for starters, “detecting goodness of fit” (see whether the explanation feels “forced”) and “try further critiquing the answer.” Maybe there’s no point in trying to explain why Deluxe products sell at a premium to Basic products, if you don’t already have a lot of cognitive technique for not coming up with terrible explanations for mysteries, along with enough economics background to know which things are important mysteries in the first place, which explanations are plausible, and so on.

But at the same time, it seems to me that there is a learnable skill here, one that entrepreneurs and venture capitalists at least have to learn if they want to succeed on purpose instead of by luck.

One needs to be able to identify mysterious pricing and sales phenomena, read enough economics to speak the right simplicity language for one’s hypotheses, and then not come up with terrible rationalizations. One needs to learn the key answers for how the challenged industry works, which means that one needs to have explicit hypotheses that one can test as early as possible.

Otherwise you’re… not quite doomed per se, but from the perspective of somebody like me, there will be ten of you with bad ideas for every one of you that happens to have a good idea. And the people that do have good ideas will not really understand what human problems they are addressing, what their potential users’ relevant motivations are, or what their critical obstacles to success are.

Given that analysis of ideas takes place on the level it does, I can understand why people would say that it’s futile to try to analyze ideas, or that teams rather than ideas are important. I’m not saying that either entrepreneurs or venture capitalists could, by an effort of will, suddenly become great at analyzing ideas. But it seems to me that the outside view concept, together with the Fox=Good/Hedgehog=Bad and Observation=Good/Theory=Bad messages—including the related misunderstanding of MVP as “just build something and show it to users”—is preventing people from even starting to develop those skills. At least, my observation is that some people go too far in their skepticism of model-building.5

Maybe there’s a valley of bad rationality here, and the injunction not to try to have theories or causal models or preconceived predictions is protective against entering it. But first, if it came down to only those alternatives, I’d frankly rather see twenty aspiring rationalists fail painfully until one of them develops the required skills, rather than have nobody with those skills. And second, god damn it, there has to be a better way.

iv.

In situations that are drawn from a barrel of causally similar situations, where human optimism runs rampant and unforeseen troubles are common, the outside view beats the inside view. But in novel situations where causal mechanisms differ, the outside view fails—there may not be relevantly similar cases, or it may be ambiguous which similar-looking cases are the right ones to look at.

Where two sides disagree, this can lead to reference class tennis—both parties get stuck insisting that their own “outside view” is the correct one, based on diverging intuitions about what similarities are relevant. If it isn’t clear what the set of “similar historical cases” is, or what conclusions we should draw from those cases, then we’re forced to use an inside view—thinking about the causal process to distinguish relevant similarities from irrelevant ones.

You shouldn’t avoid outside-view-style reasoning in cases where it looks likely to work, like when planning your Christmas shopping. But in many contexts, the outside view simply can’t compete with a good theory.

Intellectual progress on the whole has usually been the process of moving from surface-level resemblances to more technical understandings of particulars. Extreme examples of this are common in science and engineering: the deep causal models of the world that allowed humans to plot the trajectory of the first moon rocket before launch, for example, or that allow us to verify that a computer chip will work before it’s ever manufactured.

Where items in a reference class differ causally in more ways than two Christmas shopping trips you’ve planned or two university essays you’ve written, or where there’s temptation to cherry-pick the reference class of things you consider “similar” to the phenomenon in question, or where the particular biases underlying the planning fallacy just aren’t a factor, you’re often better off doing the hard cognitive labor of building, testing, and acting on models of how phenomena actually work, even if those models are very rough and very uncertain, or admit of many exceptions and nuances. And, of course, during and after the construction of the model, you have to look at the data. You still need fox-style attention to detail—and you certainly need empiricism.

The idea isn’t, “Be a hedgehog, not a fox.” The idea is rather: developing accurate beliefs requires both observation of the data and the development of models and theories that can be tested by the data. In most cases, there’s no real alternative to sticking your neck out, even knowing that reality might surprise you and chop off your head.


Next: Against Modest Epistemology.

The full book will be available November 16th. You can go to equilibriabook.com to pre-order the book, or sign up for notifications about new chapters and other developments.


  1. See Philip Tetlock, “Why Foxes Are Better Forecasters Than Hedgehogs.”

  2. As an example, my conception of the reward hacking problem for reinforcement learning systems is that below certain capability thresholds, making the system smarter will often produce increasingly helpful behavior, assuming the rewards are a moderately good proxy for the actual objectives we want the system to achieve. The problem of the system exploiting loopholes and finding ways to maximize rewards in undesirable ways is mainly introduced when the system’s resourcefulness is great enough, and its policy search space large enough, that operators can’t foresee even in broad strokes what the reward-maximizing strategies are likely to look like. If this idea gets rounded off to just “making an RL system smarter will always reduce its alignment with the operator’s goal,” however, then a researcher will misconstrue what counts as evidence for or against prioritizing reward hacking research.

    And there are many other cases where ideas in AI alignment tend to be misunderstood, largely because “AI” calls to mind present-day applications. It’s certainly possible to run useful experiments with present-day software to learn things about future AGI systems, but “see, this hill-climbing algorithm doesn’t exhibit the behavior you predicted for highly capable Bayesian reasoners” will usually reflect a misconception about what the concept of Bayesian reasoning is doing in AGI alignment theory.

  3. I did not say this then, but I should have: Overengineering is when you try to make everything look pretty, or add additional cool features that you think the users will like… not when you try to put in the key core features that are necessary for your product to be the best tool the user has ever seen for at least one workflow.

  4. And a startup founder definitely needs to ask that question and answer it before they go out and try to raise venture capital from investors who are looking for 1000x returns. Don’t discount your company’s case before it starts. They’ll do that for you.

  5. As Tetlock puts it in a discussion of the limitations of the fox/hedgehog model in the book Superforecasting: “Models are supposed to simplify things, which is why even the best are flawed. But they’re necessary. Our minds are full of models. We couldn’t function without them. And we often function pretty well because some of our models are decent approximations of reality.”