# nostalgebraist

Karma: 281 (LW), 19 (AF)
• But it seems like the core strategy—be both doing object-level cognition and meta-level cognition about how you’re doing object-level cognition—is basically the same.
It remains unclear to me whether the right way to find these meta-strategies is something like “start at the impractical ideal and rescue what you can” or “start with something that works and build new features”; it seems like modern computational Bayesian methods look more like the former than the latter.

I’d ar­gue that there’s usu­ally a causal ar­row from prac­ti­cal lore to im­prac­ti­cal ideals first, even if the ideals also in­fluence prac­tice at a later stage. Oc­cam’s Ra­zor came be­fore Solomonoff; “change your mind when you see sur­pris­ing new ev­i­dence” came be­fore for­mal Bayes. The “core strat­egy” you re­fer to sounds like “do both ex­plo­ra­tion and ex­ploita­tion,” which is the sort of idea I’d imag­ine goes back mil­len­nia (albeit not in those ex­act terms).

One of my goals in writ­ing this post was to for­mal­ize the feel­ing I get, when I think about an ideal­ized the­ory of this kind, that it’s a “re­dun­dant step” added on top of some­thing that already does all the work by it­self—like tak­ing a de­ci­sion the­ory and ap­pend­ing the rule “take the ac­tions this the­ory says to take.” But rather than be­ing trans­par­ently vac­u­ous, like that ex­am­ple, they are vac­u­ous in a more hid­den way, and the re­dun­dant steps they add tend to re­sem­ble le­gi­t­i­mately good ideas fa­mil­iar from prac­ti­cal ex­pe­rience.

Con­sider the fol­low­ing (ridicu­lous) the­ory of ra­tio­nal­ity: “do the most ra­tio­nal thing, and also, re­mem­ber to stay hy­drated :)”. In a cer­tain inane sense, most ra­tio­nal be­hav­ior “con­forms to” this the­ory, since the the­ory par­a­sitizes on what­ever ex­ist­ing no­tion of ra­tio­nal­ity you had, and stay­ing hy­drated is gen­er­ally a good idea and thus does not tend to con­flict with ra­tio­nal­ity. And when­ever stay­ing hy­drated is a good idea, one could imag­ine point­ing to this the­ory and say­ing “see, there’s the hy­dra­tion the­ory of ra­tio­nal­ity at work again.” But, of course, none of this should ac­tu­ally count in the “hy­dra­tion the­ory’s” fa­vor: all the real work is hid­den in the first step (“do the most ra­tio­nal thing”), and in­so­far as hy­dra­tion is ra­tio­nal, there’s no need to spec­ify it ex­plic­itly. This doesn’t quite map onto the schema, but cap­tures the way in which I think these the­o­ries tend to con­fuse peo­ple.

If the more se­ri­ous ideals we’re talk­ing about are like the “hy­dra­tion the­ory,” we’d ex­pect them to have the ap­pear­ance of ex­plain­ing ex­ist­ing prac­ti­cal meth­ods, and of ret­ro­spec­tively ex­plain­ing the suc­cess of new meth­ods, while not be­ing very use­ful for gen­er­at­ing any new meth­ods. And this seems gen­er­ally true to me: there’s a lot of en­sem­ble-like or reg­u­lariza­tion-like stuff in ML that can be in­ter­preted as Bayesian av­er­ag­ing/​up­dat­ing over some base space of mod­els, but most of the ex­cite­ment in ML is in these base spaces. We didn’t get neu­ral net­works from Bayesian first prin­ci­ples.

• Does “sub­sys­tem al­ign­ment” cover ev­ery in­stance of a Good­hart prob­lem in agent de­sign, or just a spe­cial class of prob­lems that arises when the sub-sys­tems are suffi­ciently in­tel­li­gent?

As stated, that’s a purely se­man­tic ques­tion, but I’m con­cerned with a more-than-se­man­tic is­sue here. When we’re talk­ing about all Good­hart prob­lems in agent de­sign, we’re talk­ing about a class of prob­lems that already comes up in all sorts of prac­ti­cal en­g­ineer­ing, and which can be satis­fac­to­rily han­dled in many real cases with­out need­ing any philo­soph­i­cal ad­vances. When I make ML mod­els at work, I worry about overfit­ting and about mis­al­ign­ments be­tween the loss func­tion and my true goals, but it’s usu­ally easy to place bounds on how much trou­ble these things can cause. Un­like hu­mans in­ter­act­ing with “evolu­tion,” my mod­els don’t live in a messy phys­i­cal world with porous bound­aries; they can only con­trol their out­put chan­nel, and it’s easy to place safety re­stric­tions on the out­put of that chan­nel, out­side the model. This is like “box­ing the AI,” but my “AI” is so dumb that this is clearly safe. (We could get even clearer ex­am­ples by look­ing at non-ML en­g­ineers build­ing com­po­nents that no one would call AI.)

Now, once the sub­sys­tem is “in­tel­li­gent enough,” maybe we have some­thing like a boxed AGI, with the usual boxed AGI wor­ries. But it doesn’t seem ob­vi­ous to me that “the usual boxed AGI wor­ries” have to carry over to this case. Mak­ing a sub­sys­tem strikes me as a more fa­vor­able case for “tool AI” ar­gu­ments than mak­ing some­thing with a di­rect in­ter­face to phys­i­cal re­al­ity, since you have more con­trol over what the out­put chan­nel does and does not in­fluence, and the task may be achiev­able even with a very limited in­put chan­nel. (As an ex­am­ple, one of the ML mod­els I work on has an out­put chan­nel that just looks like “show a sub­set of these things to the user”; if you re­placed it with a literal su­per­hu­man AGI, but kept the out­put chan­nel the same, not much could go wrong. This isn’t the kind of out­put chan­nel we’d ex­pect to hook up to a real AGI, but that’s my point: some­times what you want out of your sub­sys­tem just isn’t rich enough to make box­ing fail, and maybe that’s enough.)

• I was not aware of these re­sults—thanks. I’d glanced at the pa­pers on re­flec­tive or­a­cles but men­tally filed them as just about game the­ory, when of course they are re­ally very rele­vant to the sort of thing I am con­cerned with here.

We have a re­main­ing se­man­tic dis­agree­ment. I think you’re us­ing “em­bed­ded­ness” quite differ­ently than it’s used in the “Embed­ded World-Models” post. For ex­am­ple, in that post (text ver­sion):

In a tra­di­tional Bayesian frame­work, “learn­ing” means Bayesian up­dat­ing. But as we noted, Bayesian up­dat­ing re­quires that the agent start out large enough to con­sider a bunch of ways the world can be, and learn by rul­ing some of these out.

Embed­ded agents need re­source-limited, log­i­cally un­cer­tain up­dates, which don’t work like this.

Un­for­tu­nately, Bayesian up­dat­ing is the main way we know how to think about an agent pro­gress­ing through time as one unified agent. The Dutch book jus­tifi­ca­tion for Bayesian rea­son­ing is ba­si­cally say­ing this kind of up­dat­ing is the only way to not have the agent’s ac­tions on Mon­day work at cross pur­poses, at least a lit­tle, to the agent’s ac­tions on Tues­day.

Embed­ded agents are non-Bayesian. And non-Bayesian agents tend to get into wars with their fu­ture selves.

The 2nd and 4th paragraphs here are clearly false for reflective AIXI. And the 2nd paragraph implies that embedded agents are definitionally resource-limited. There is a true and important sense in which reflective AIXI can be “embedded”—that was the point of coming up with it!—but the Embedded Agency sequence seems to be excluding this kind of case when it talks about embedded agents. This strikes me as something I’d like to see clarified by the authors of the sequence, actually.

I think the difference may be that when we talk about “a theory of rationality for embedded agents,” we could mean “a theory that has consequences for agents equally powerful to it,” or we could mean something more like “a theory that has consequences for agents of arbitrarily low power.” Reflective AIXI (as a theory of rationality) explains why reflective AIXI (as an agent) is optimally designed, but it can’t explain why a real-world robot might or might not be optimally designed.

• My ar­gu­ment isn’t spe­cial­ized to AIXI — note that I also used LIA as an ex­am­ple, which has a weaker R along with a weaker S.

Likewise, if you put AIXI in a world whose parts can do uncomputable things (like AIXI), you have the same pattern one level up. Your S is stronger, with uncomputable strategies, but by the same token, you lose AIXI’s optimality. It’s only searching over computable strategies, and you have to look at all strategies (including the uncomputable ones) to make sure you’re optimal. This leads to a rule R distinct from AIXI, just as AIXI is distinct from a Turing machine.

I guess it’s con­ceiv­able that this hits a fixed point at this level or some higher level? That would be ab­stractly in­ter­est­ing but not very rele­vant to em­bed­ded­ness in the kind of world I think I in­habit.

• OTOH, do­ing a min­i­max search of the game tree for some bounded num­ber of moves, then ap­ply­ing a sim­ple board-eval­u­a­tion heuris­tic at the leaf nodes, is a pretty de­cent al­gorithm in prac­tice.

I’ve written previously about this kind of argument—see here (scroll down to the non-blockquoted text). tl;dr we can often describe the same optimum in multiple ways, with each way giving us a different series that approximates the optimum in the limit. Whether any one series does well or poorly when truncated to N terms can’t be explained by saying “it’s a truncation of the optimum,” since they all are; these truncation properties are facts about the different series, not about the optimum. I illustrate with different series expansions of the same function.
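As an illustration of that point (my own toy example, not the one from the linked post): two textbook series that both converge to π, truncated to the same number of terms, have very different truncation errors.

```python
import math

def leibniz(n):
    # pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...), truncated to n terms
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n))

def nilakantha(n):
    # pi = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ..., truncated to n terms
    total = 3.0
    for k in range(1, n):
        a = 2 * k
        total += (-1) ** (k + 1) * 4 / (a * (a + 1) * (a + 2))
    return total

for n in (5, 20):
    print(n, abs(leibniz(n) - math.pi), abs(nilakantha(n) - math.pi))
# With 20 terms, the Leibniz error is still ~0.05 while the Nilakantha error is
# below 1e-4: same limit, very different truncations.
```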

Fur­ther­more, it seems like there’s a pat­tern where, the more gen­eral the al­gorith­mic prob­lem you want to solve is, the more your solu­tion is com­pel­led to re­sem­ble some sort of brute-force search.

You may be right, and there are in­ter­est­ing con­ver­sa­tions to be had about when solu­tions will tend to look like search and when they won’t. But this doesn’t feel like it re­ally ad­dresses my ar­gu­ment, which is not about “what kind of al­gorithm should you use” but about the weird­ness of the in­junc­tion to op­ti­mize over a space con­tain­ing ev­ery pro­ce­dure you could ever do, in­clud­ing all of the op­ti­miza­tion pro­ce­dures you could ever do. There is a log­i­cal /​ defi­ni­tional weird­ness here that can’t be re­solved by ar­gu­ments about what sorts of (log­i­cally /​ defi­ni­tion­ally un­prob­le­matic) al­gorithms are good or bad in what do­mains.

# When does ra­tio­nal­ity-as-search have non­triv­ial im­pli­ca­tions?

4 Nov 2018 22:42 UTC
64 points
• This post feels quite similar to things I have writ­ten in the past to jus­tify my lack of en­thu­si­asm about ideal­iza­tions like AIXI and log­i­cally-om­ni­scient Bayes. But I would go fur­ther: I think that grap­pling with em­bed­ded­ness prop­erly will in­evitably make the­o­ries of this gen­eral type ir­rele­vant or use­less, so that “a the­ory like this, ex­cept for em­bed­ded agents” is not a thing that we can rea­son­ably want. To spec­ify what I mean, I’ll use this para­graph as a jump­ing-off point:

Embed­ded agents don’t have the lux­ury of step­ping out­side of the uni­verse to think about how to think. What we would like would be a the­ory of ra­tio­nal be­lief for situ­ated agents which pro­vides foun­da­tions that are similarly as strong as the foun­da­tions Bayesi­anism pro­vides for du­al­is­tic agents.

Most “the­o­ries of ra­tio­nal be­lief” I have en­coun­tered—in­clud­ing Bayesi­anism in the sense I think is meant here—are framed at the level of an eval­u­a­tor out­side the uni­verse, and have es­sen­tially no con­tent when we try to trans­fer them to in­di­vi­d­ual em­bed­ded agents. This is be­cause these the­o­ries tend to be de­rived in the fol­low­ing way:

• We want a the­ory of the best pos­si­ble be­hav­ior for agents.

• We have some class of “practically achievable” strategies S, which can actually be implemented by agents. We note that an agent’s observations provide some information about the quality of different strategies in S. So if it were possible to follow a rule like “find the best s in S given your observations, and then follow that s,” this rule would spit out very good agent behavior.

• Usually we soften this to a performance-weighted average rather than a hard argmax, but the principle is the same: if we could search over all of S, the rule that says “do the search and then follow what it says” can be competitive with the very best s in S. (Trivially so, since it has access to the best strategies, along with all the others.)

• But usually R is not itself in S. That is, the strategy “search over all practical strategies and follow the best ones” is not a practical strategy (one way to write the schema down is sketched just below). But we argue that this is fine, since we are constructing a theory of ideal behavior. It doesn’t have to be practically implementable.
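To make the schema explicit (the R and S notation is used elsewhere in this thread; the scoring function Q and weights w are my own placeholders, not anything from the original):

```latex
% R is the "ideal" rule built on top of the practical class S.
% Q(s | o) = how good strategy s looks given observations o (placeholder);
% w(s | o) = a performance-derived weight, for the softened version.
R_{\mathrm{argmax}}(o) \;=\; \operatorname*{arg\,max}_{s \in S} Q(s \mid o)
\qquad \text{or} \qquad
R_{\mathrm{avg}}(o) \;=\; \sum_{s \in S} w(s \mid o)\, s(o),
\qquad \text{with } R \notin S.
```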

For example, in Solomonoff, S is defined by computability while R is allowed to be uncomputable. In the LIA construction, S is defined by polytime complexity while R is allowed to run slower than polytime. In logically-omniscient Bayes, finite sets of hypotheses can be manipulated in a finite universe but the full Boolean algebra over hypotheses generally cannot.

I hope the framework I’ve just introduced helps clarify what I find unpromising about these theories. By construction, any agent you can actually design and run is a single element of S (a “practical strategy”), so every fact about rationality that can be incorporated into agent design gets “hidden inside” the individual s, and the only things you can learn from the “ideal theory” are things which can’t fit into a practical strategy.

For ex­am­ple, sup­pose (rea­son­ably) that model av­er­ag­ing and com­plex­ity penalties are broadly good ideas that lead to good re­sults. But all of the model av­er­ag­ing and com­plex­ity pe­nal­iza­tion that can be done com­putably hap­pens in­side some Tur­ing ma­chine or other, at the level “be­low” Solomonoff. Thus Solomonoff only tells you about the ex­tra ad­van­tage you can get by do­ing these things un­com­putably. Any kind of nice Bayesian av­er­age over Tur­ing ma­chines that can hap­pen com­putably is (of course) just an­other Tur­ing ma­chine.

This also explains why I find it misleading to say that good practical strategies constitute “approximations to” an ideal theory of this type. Of course, since R just says to follow the best strategies in S, if you are following a very good strategy in S your behavior will tend to be close to that of R. But this cannot be attributed to any of the searching over S that R does, since you are not doing a search over S; you are executing a single member of S and ignoring the others. Any searching that can be done practically collapses down to a single practical strategy, and any that doesn’t is not practical. Concretely, this talk of approximations is like saying that a very successful chess player “approximates” the rule “consult all possible chess players, then weight their moves by past performance.” Yes, the skilled player will play similarly to this rule, but they are not following it, not even approximately! They are only themselves, not any other player.

Any the­ory of ideal ra­tio­nal­ity that wants to be a guide for em­bed­ded agents will have to be con­strained in the same ways the agents are. But the­o­ries of ideal ra­tio­nal­ity usu­ally get all of their con­tent by go­ing to a level above the agents they judge. So this new the­ory would have to be a very differ­ent sort of thing.

• This prior isn’t trollable in the original sense, but it is trollable in a weaker sense that still strikes me as important. Since the probabilities must sum to 1, only finitely many sentences can have probability above ε for any given ε > 0 (at most 1/ε of them). So we can choose some finite set of “important sentences” and control their oscillations in a practical sense, but if there’s any ε such that we think oscillations across the range (ε, 1 − ε) are a bad thing, all but finitely many sentences can exhibit this bad behavior.

It seems especially bad that we can only prevent “up-to-ε trolling” for finite sets of sentences, since in PA (or whatever) there are plenty of countable sets of sentences that seem “essentially the same” (like the ones you get from an induction argument), and it feels very unnatural to choose finite subsets of these and distinguish them from the others, even (or especially?) if we pretend we have no prior knowledge beyond the axioms.

• To quote Abram Dem­ski in “All Math­e­mat­i­ci­ans are Trol­lable”:

The main con­cern is not so much whether GLS-co­her­ent math­e­mat­i­ci­ans are trol­lable as whether they are trol­ling them­selves. Vuln­er­a­bil­ity to an ex­ter­nal agent is some­what con­cern­ing, but the ex­is­tence of mis­lead­ing proof-or­der­ings brings up the ques­tion: are there prin­ci­ples we need to fol­low when de­cid­ing what proofs to look at next, to avoid mis­lead­ing our­selves?

My con­cern is not with the dan­gers of an ac­tual ad­ver­sary, it’s with the wild os­cilla­tions and ex­treme con­fi­dences that can arise even when log­i­cal facts ar­rive in a “fair” way, so long as it is still pos­si­ble to get un­lucky and ex­pe­rience a “clump” of suc­ces­sive ob­ser­va­tions that push P(A) way up or down.

We should ex­pect such clumps some­times un­less the ob­ser­va­tion or­der is some­how spe­cially cho­sen to dis­cour­age them, say via the kind of “prin­ci­ples” Dem­ski won­ders about.

One can also pre­vent ob­ser­va­tion or­der from mat­ter­ing by do­ing what the Eisen­stat prior does: adopt an ob­ser­va­tion model that does not treat log­i­cal ob­ser­va­tions as com­ing from some fixed un­der­ly­ing re­al­ity (so that learn­ing “B or ~A” rules out some ways A could have been true), but as con­sis­tency-con­strained sam­ples from a fixed dis­tri­bu­tion. This works as far as it goes, but is hard to rec­on­cile with com­mon in­tu­itions about how e.g. P=NP is un­likely be­cause so many “ways it could have been true” have failed (Scott Aaron­son has a post about this some­where, ar­gu­ing against Lu­bos Motl who seems to think like the Eisen­stat prior), and more gen­er­ally with any kind of math­e­mat­i­cal in­tu­ition — or with the sim­ple fact that the im­pli­ca­tions of ax­ioms are fixed in ad­vance and not de­ter­mined dy­nam­i­cally as we ob­serve them. More­over, I don’t know of any way to (ap­prox­i­mately) ap­ply this model in real-world de­ci­sions, al­though maybe some­one will come up with one.

This is all to say that I don’t think there is (yet) any stan­dard Bayesian an­swer to the prob­lem of self-trol­la­bil­ity. It’s a se­ri­ous prob­lem and one at the very edge of cur­rent un­der­stand­ing, with only some par­tial stabs at solu­tions available.

• Ah, yeah, you’re right that it’s pos­si­ble to do this. I’m used to think­ing in the Kol­mogorov pic­ture, and keep for­get­ting that in the Jay­ne­sian propo­si­tional logic pic­ture you can treat ma­te­rial con­di­tion­als as con­tin­gent facts. In fact, I went through the pro­cess of re­al­iz­ing this in a similar ar­gu­ment about the same post a while ago, and then for­got about it in the mean­time!

That said, I am not sure what this procedure has to recommend it, besides that it is possible and (technically) Bayesian. The starting prior, with independence, does not really reflect our state of knowledge at any time, even at the time before we have “noticed” the implication(s). For, if we actually write down that prior, we have an entry in every cell of the truth table, and if we inspect each of those cells and think “do I really believe this?”, we cannot answer the question without asking whether we know facts such as A ⇒ B—at which point we notice the implication!

It seems more ac­cu­rate to say that, be­fore we con­sider the con­nec­tion of A to B, those cells are “not even filled in.” The in­de­pen­dence prior is not some­how log­i­cally ag­nos­tic; it as­signs a spe­cific prob­a­bil­ity to the con­di­tional, just as our pos­te­rior does, ex­cept that in the prior that prob­a­bil­ity is, wrongly, not one.

Okay, one might say, but can’t this still be a good enough place to start, al­low­ing us to con­verge even­tu­ally? I’m ac­tu­ally un­sure about this, be­cause (see be­low) the log­i­cal up­dates tend to push the prob­a­bil­ities of the “ends” of a log­i­cal chain fur­ther to­wards 0 and 1; at any finite time the dis­tri­bu­tion obeys Cromwell’s Rule, but whether it con­verges to the truth might de­pend on the way in which we take the limit over log­i­cal and em­piri­cal up­dates (sup­pos­ing we do ar­bi­trar­ily many of each type as time goes on).

I got cu­ri­ous about this and wrote some code to do these up­dates with ar­bi­trary num­bers of vari­ables and ar­bi­trary con­di­tion­als. What I found is that as we con­sider longer chains A ⇒ B ⇒ C ⇒ …, the propo­si­tions at one end get pushed to 1 or 0, and we don’t need very long chains for this to get ex­treme. With all start­ing prob­a­bil­ities set to 0.7 and three vari­ables 0 ⇒ 1 ⇒ 2, the prob­a­bil­ity of vari­able 2 is 0.95; with five vari­ables the prob­a­bil­ity of the last one is 0.99 (see the plot be­low). With ten vari­ables, the last one reaches 0.99988. We can eas­ily come up with long chains in the Cal­ifor­nia ex­am­ple or similar, and fol­low­ing this pro­ce­dure would lead us to ab­surdly ex­treme con­fi­dence in such ex­am­ples.

I’ve also given a sec­ond plot be­low, where all the start­ing prob­a­bil­ities are 0.5. This shows that the grow­ing con­fi­dence does not rely on an ini­tial hunch one way or the other; sim­ply up­dat­ing on the log­i­cal re­la­tion­ships from ini­tial neu­tral­ity (plus in­de­pen­dences) pushes us to high con­fi­dence about the ends of the chain.
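For anyone who wants to check the numbers, here is a minimal sketch of the kind of computation described above (my reconstruction, not the original code): put an independent prior on each proposition, condition on the material conditionals along the chain by brute-force enumeration, and read off the marginal of the last proposition.

```python
import itertools

def chain_update(n, prior, implications):
    """Posterior marginals for n binary propositions with independent prior
    probabilities, conditioned on material implications (i, j): 'i implies j'."""
    marginals = [0.0] * n
    total = 0.0
    # Enumerate all 2^n truth assignments; keep only those consistent with
    # every implication, weighted by the independence prior.
    for assignment in itertools.product([0, 1], repeat=n):
        if any(assignment[i] and not assignment[j] for i, j in implications):
            continue  # violates some implication; zero posterior weight
        weight = 1.0
        for value in assignment:
            weight *= prior if value else 1.0 - prior
        total += weight
        for k, value in enumerate(assignment):
            marginals[k] += weight * value
    return [m / total for m in marginals]

# Chains 0 => 1 => ... => n-1 with every starting probability at 0.7:
for n in (3, 5, 10):
    chain = [(i, i + 1) for i in range(n - 1)]
    print(n, round(chain_update(n, 0.7, chain)[-1], 5))
# prints approximately: 3 0.95345 / 5 0.99169 / 10 0.99988,
# matching the figures quoted above.
```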

• Two com­ments:

1. You seem to be sug­gest­ing that the stan­dard Bayesian frame­work han­dles log­i­cal un­cer­tainty as a spe­cial case. (Here we are not ex­actly “un­cer­tain” about sen­tences, but we have to up­date on their truth from some prior that did not ac­count for it, which amounts to the same thing.) If this were true, the re­search on han­dling log­i­cal un­cer­tainty through new crite­ria and con­struc­tions would be su­perflu­ous. I haven’t ac­tu­ally seen a pro­posal like this laid out in de­tail, but I think they’ve been pro­posed and found want­ing, so I’ll be skep­ti­cal at least un­til I’m shown the de­tails of such a pro­posal.

(In par­tic­u­lar, this would need to in­volve some no­tion of con­di­tional prob­a­bil­ities like P(A | A ⇒ B), and per­haps pri­ors like P(A ⇒ B), which are not a part of any treat­ment of Bayes I’ve seen.)

2. Even if this sort of thing does work in prin­ci­ple, it doesn’t seem to help in the prac­ti­cal case at hand. We’re now told to up­date on “notic­ing” A ⇒ B by us­ing ob­jects like P(A | A ⇒ B), but these too have to be guessed us­ing heuris­tics (we don’t have a map of them ei­ther), so it in­her­its the same prob­lem it was in­tro­duced to solve.

• You as­sume a crea­ture that can’t see all log­i­cal con­se­quences of hy­pothe­ses [...] Then you make it re­al­ize new facts about log­i­cal con­se­quences of hypotheses

This is not quite what is going on in section 7b. The agent isn’t learning any new logical information. For instance, in jadagul’s “US in 2100” example, all of the logical facts involved are things the agent already knows. “ ‘California is a US state in 2100’ implies ‘The US exists in 2100’ ” is not a new fact, it’s something we already knew before running through the exercise.

My ar­gu­ment in 7b is not re­ally about up­dat­ing—it’s about whether prob­a­bil­ities can ad­e­quately cap­ture the agent’s knowl­edge, even at a sin­gle time.

This is in a con­text (typ­i­cal of real de­ci­sions) where:

• the agent knows a huge num­ber of log­i­cal facts, be­cause it can cor­rectly in­ter­pret hy­pothe­ses writ­ten in a log­i­cally trans­par­ent way, like “A and B,” and be­cause it knows lots of things about sub­sets in the world (like US /​ Cal­ifor­nia)

• but, the agent doesn’t have the time/​mem­ory to write down a “map” of ev­ery hy­poth­e­sis con­nected by these facts (like a sigma-alge­bra). For ex­am­ple, you can read an ar­bi­trary string of hy­pothe­ses “A and B and C and …” and know that this im­plies “A”, “A and C”, etc., but you don’t have in your mind a gi­ant table con­tain­ing ev­ery such con­struc­tion.

So the agent can’t as­sign cre­dences/​prob­a­bil­ities si­mul­ta­neously to ev­ery hy­poth­e­sis on that map. In­stead, they have some sort of “cre­dence gen­er­a­tor” that can take in a hy­poth­e­sis and out­put how plau­si­ble it seems, us­ing heuris­tics. In their raw form, these out­puts may not be real num­bers (they will have an or­der, but may not have e.g. a met­ric).

If we want to use Bayes here, we need to turn these raw cre­dences into prob­a­bil­ities. But re­mem­ber, the agent knows a lot of log­i­cal facts, and via the prob­a­bil­ity ax­ioms, these all trans­late to facts re­lat­ing prob­a­bil­ities to one an­other. There may not be any map­ping from raw cre­dence-gen­er­a­tor-out­put to prob­a­bil­ities that pre­serves all of these facts, and so the agent’s prob­a­bil­ities will not be con­sis­tent.

To be more con­crete about the “cre­dence gen­er­a­tor”: I find that when I am asked to pro­duce sub­jec­tive prob­a­bil­ities, I am trans­lat­ing them from in­ter­nal rep­re­sen­ta­tions like

• Event A feels “very likely”

• Event B, which is not log­i­cally en­tailed by A or vice versa, feels “pretty likely”

• Event (A and B) feels “pretty likely”

If we de­mand that these map one-to-one to prob­a­bil­ities in any nat­u­ral way, this is in­con­sis­tent. But I don’t think it’s in­con­sis­tent in it­self; it just re­flects that my heuris­tics have limited re­s­olu­tion. There isn’t a con­junc­tion fal­lacy here be­cause I’m not treat­ing these rep­re­sen­ta­tions as prob­a­bil­ities—but if I de­cide to do so, then I will have a con­junc­tion fal­lacy! If I no­tice this hap­pen­ing, I can “plug the leak” by chang­ing the prob­a­bil­ities, but I will ex­pect to keep see­ing new leaks, since I know so many log­i­cal facts, and thus there are so many con­se­quences of the prob­a­bil­ity ax­ioms that can fail to hold. And be­cause I ex­pect this to hap­pen go­ing for­ward, I am skep­ti­cal now that my re­ported prob­a­bil­ities re­flect my ac­tual be­liefs—not even ap­prox­i­mately, since I ex­pect to keep de­riv­ing very wrong things like an event be­ing im­pos­si­ble in­stead of likely.
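To spell out why the one-to-one demand is so strained in the example above (my own arithmetic, not part of the original comment): if “pretty likely” must map to a single number, then P(A and B) = P(B), and additivity forces

```latex
P(B \wedge \neg A) \;=\; P(B) - P(A \wedge B) \;=\; 0,
```

i.e. the reported credences are coherent only if B is treated as certain to imply A, a far stronger commitment than the raw feeling of “pretty likely” was meant to carry; any further heuristic report that B-without-A seems at all plausible springs exactly the kind of leak described here.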

None of this is meant to disapprove of using probability estimates to, say, make more grounded estimates of cost/benefit in real-world decisions. I do find that useful, but I think it is useful for a non-Bayesian reason: even if you don’t demand a universal mapping from raw credences, you can get a lot of value out of saying things like “this decision isn’t worth it unless you think P(A) > 97%”, and then doing a one-time mapping of that back onto a raw credence, and this has a lot of pragmatic value even if you know the mappings will break down if you push them too hard.

• If I un­der­stand your ob­jec­tion cor­rectly, it’s one I tried to an­swer already in my post.

In short: Bayesianism is normative for problems you can actually state in its formalism. This can be used as an argument for at least trying to state problems in its formalism, and I do think this is often a good idea; many of the examples in Jaynes’ book show the value of doing this. But when the information you have actually does not fit the requirements of the formalism, you can only use it if you get more information (costly, sometimes impossible) or forget some of what you know to make the rest fit. I don’t think Bayes normatively tells you to do those kinds of things, or at least that would require a type of argument different from the usual Dutch Books etc.

Us­ing the word “brain” there was prob­a­bly a mis­take. This is only about brains in­so­far as it’s about the knowl­edge ac­tu­ally available to you in some situ­a­tion, and the same idea ap­plies to the knowl­edge available to some robot you are build­ing, or some agent in a hy­po­thet­i­cal de­ci­sion prob­lem (so long as it is a prob­lem with the same prop­erty, of not fit­ting well into the for­mal­ism with­out ex­tra work or for­get­ting).

• I don’t dis­agree with any of this. But if I un­der­stand cor­rectly, you’re only ar­gu­ing against a very strong claim—some­thing like “Bayes-re­lated re­sults can­not pos­si­bly have gen­eral rele­vance for real de­ci­sions, even via ‘in­di­rect’ paths that don’t rely on view­ing the real de­ci­sions in a Bayesian way.”

I don’t en­dorse that claim, and would find it very hard to ar­gue for. I can imag­ine vir­tu­ally any math­e­mat­i­cal re­sult play­ing some use­ful role in some hy­po­thet­i­cal frame­work for real de­ci­sions (al­though I would be more sur­prised in some cases than oth­ers), and I can’t see why Bayesian stuff should be less promis­ing in that re­gard than any ar­bi­trar­ily cho­sen piece of math. But “Bayes might be rele­vant, just like p-adic anal­y­sis might be rele­vant!” seems like damn­ing with faint praise, given the more “di­rect” am­bi­tions of Bayes as ad­vo­cated by Jaynes and oth­ers.

Is there a spe­cific “in­di­rect” path for the rele­vance of Bayes that you have in mind here?

• I dis­agree that this an­swers my crit­i­cisms. In par­tic­u­lar, my sec­tion 7 ar­gues that it’s prac­ti­cally un­fea­si­ble to even write down most prac­ti­cal be­lief /​ de­ci­sion prob­lems in the form that the Bayesian laws re­quire, so “were the laws fol­lowed?” is gen­er­ally not even a well-defined ques­tion.

To be a bit more pre­cise, the frame­work with a com­plete hy­poth­e­sis space is a bad model for the prob­lems of in­ter­est. As I de­tailed in sec­tion 7, that frame­work as­sumes that our knowl­edge of hy­pothe­ses and the log­i­cal re­la­tions be­tween hy­pothe­ses are speci­fied “at the same time,” i.e. when we know about a hy­poth­e­sis we also know all its log­i­cal re­la­tions to all other hy­pothe­ses, and when we know (im­plic­itly) about a log­i­cal re­la­tion we also have ac­cess (ex­plic­itly) to the hy­pothe­ses it re­lates. Not only is this false in many prac­ti­cal cases, I don’t even know of any for­mal­ism that would al­low us to call it “ap­prox­i­mately true,” or “true enough for the op­ti­mal­ity the­o­rems to carry over.”

(N.B. as it hap­pens, I don’t think log­i­cal in­duc­tors fix this prob­lem. But the very ex­is­tence of log­i­cal in­duc­tion as a re­search area shows that this is a prob­lem. Either we care about the con­se­quences of lack­ing log­i­cal om­ni­science, or we don’t—and ap­par­ently we do.)

It’s sort of like quot­ing an op­ti­mal­ity re­sult given ac­cess to some or­a­cle, when talk­ing about a prob­lem with­out ac­cess to that or­a­cle. If the pre­con­di­tions of a the­o­rem are not met by the defi­ni­tion of a given de­ci­sion prob­lem, “meet those pre­con­di­tions” can­not be part of a strat­egy for that prob­lem. “Solve a differ­ent prob­lem so you can use my the­o­rem” is not a solu­tion to the prob­lem as stated.

Im­por­tantly, this is not just an is­sue of “we can’t do perfect Bayes in prac­tice, but if we were able, it’d be bet­ter.” Ob­tain­ing the kind of knowl­edge rep­re­sen­ta­tion as­sumed by the Bayesian laws has com­pu­ta­tional /​ re­source costs, and in any real de­ci­sion prob­lem, we want to min­i­mize these. If we’re handed the “right” knowl­edge rep­re­sen­ta­tion by a ge­nie, fine, but if we are talk­ing about choos­ing to gen­er­ate it, that in it­self is a de­ci­sion with costs.

As a side point, I am also skep­ti­cal of some of the op­ti­mal­ity re­sults.

• I agree. When I think about the “math­e­mat­i­cian mind­set” I think largely about the over­whelming in­ter­est in the pres­ence or ab­sence, in some space of in­ter­est, of “patholog­i­cal” en­tities like the Weier­strass func­tion. The truth or false­hood of “for all /​ there ex­ists” state­ments tend to turn on these patholo­gies or their ab­sence.

How does this re­late to op­ti­miza­tion? Op­ti­miza­tion can make patholog­i­cal en­tities more rele­vant, if

(1) they hap­pen to be op­ti­mal solu­tions, or

(2) an al­gorithm that ig­nores them will be, for that rea­son, in­se­cure /​ ex­ploitable.

But this is not a gen­eral ar­gu­ment about op­ti­miza­tion, it’s a con­tin­gent claim that is only true for some prob­lems of in­ter­est, and in a way that de­pends on the de­tails of those prob­lems.

And one can make a separate argument that, when conditions like 1-2 do not hold, a focus on pathological cases is unhelpful: if a statement “fails in theory but works in practice” (say by holding except on a set of sufficiently small measure as to always be dominated by other contributions to a decision problem, or only for decisions that would be ruled out anyway for some other reason, or over the finite range relevant for some calculation but not in the long or short limit), optimization will exploit its “effective truth” whether or not you have noticed it. And statements about “effective truth” tend to be mathematically pretty uninteresting; try getting an audience of mathematicians to care about a derivation showing that rocket engineers can afford to ignore gravitational waves, for example.

• I think the ar­gu­ments here ap­ply much bet­ter to the AGI al­ign­ment case than to the case of HPMOR. The struc­ture of the post sug­gests (? not sure) that HPMOR is meant to be the “eas­ier” case, the one in which the reader will as­sent to the ar­gu­ments more read­ily, but it didn’t work that way on me.

In both cases, we have some sort of met­ric for what it would mean to suc­ceed, and (per­haps com­pet­ing) in­side- and out­side-view ar­gu­ments for how highly we should ex­pect to score on that met­ric. (More pre­cisely, what prob­a­bil­ities we should as­sign to achiev­ing differ­ent scores.) In both cases, this post tends to dis­miss facts which in­volve so­cial sta­tus as ir­rele­vant to the out­side view.

But what if our suc­cess met­ric de­pends on some facts which in­volve so­cial sta­tus? Then we definitely shouldn’t ig­nore these facts, (even) in the in­side view. And this is the situ­a­tion we are in with HPMOR, at least, if per­haps less so with AGI al­ign­ment.

There are some suc­cess met­rics for HPMOR men­tioned in this post which can be eval­u­ated largely with­out refer­ence to sta­tus stuff (like “has it con­veyed the ex­pe­rience of be­ing ra­tio­nal to many peo­ple?“). But when spe­cific suc­cesses—known to have been achieved in the ac­tual world—come up, many of them are clearly re­lated to sta­tus. If you want to know whether your fic will be­come one of the most re­viewed HP fan­fics on a fan­fic­tion site, then it mat­ters how it will be re­ceived by the sorts of peo­ple who re­view HP fan­fics on those sites—in­clud­ing their sta­tus hi­er­ar­chies. (Of course, this will be less im­por­tant if we ex­pect most of the re­view-posters to be peo­ple who don’t read HP fan­fic nor­mally and have found out about the story through an­other chan­nel, but its im­por­tance is always nonzero, and very much so for some hy­po­thet­i­cal sce­nar­ios.)

TBH, I don’t un­der­stand why so much of this post fo­cuses on pure pop­u­lar­ity met­rics for HPMOR, ones that don’t cap­ture whether it is hav­ing the in­tended effect on read­ers. (Even some­thing like “many read­ers con­sider it the best book they’ve ever read” does not tell you much with­out spec­i­fy­ing more about the read­er­ship; con­sider that if you were op­ti­miz­ing for this met­ric, you would have an in­cen­tive to se­lect for read­ers who have read as few books as pos­si­ble.)

I guess the idea may be that it is possible to surprise someone like Pat by hitting a measurable indicator of high status (because Pat thinks that’s too much of a status leap relative to the starting position), where Pat would be less surprised by HPMOR hitting idiosyncratic goals that are not common in HP fanfiction (and thus are not high status to him). But this pattern of surprise levels seems obviously correct to me! If you are trying to predict an indicator of status in a community, you should use information about the status system in that community in your inside view. (And likewise, if the indicator is unrelated to status, you may be able to ignore status information.)

In short, this post con­demns us­ing sta­tus-re­lated facts for fore­cast­ing, even when they are rele­vant (be­cause we are fore­cast­ing other sta­tus-re­lated facts). I don’t mean the next state­ment as Bul­verism, but as a hope­fully use­ful hy­poth­e­sis: it seems pos­si­ble that the con­cept of sta­tus reg­u­la­tion has en­couraged this con­fu­sion, by cre­at­ing a pat­tern to match to (“ar­gu­ment in­volv­ing sta­tus and the ex­ist­ing state of a field, to the effect that I shouldn’t ex­pect to be ca­pa­ble of some­thing”), even when some ar­gu­ments match­ing that pat­tern are good ar­gu­ments.

• I just wrote a long post on my tum­blr about this se­quence, which I am cross-post­ing here as a com­ment on the fi­nal post. (N.B. my tone is harsher and less con­ver­sa­tional than it would have been if I had thought of it as a com­ment while writ­ing.)

I fi­nally got around to read­ing these posts. I wasn’t im­pressed with them.

The ba­sic gist is some­thing like:

“There are well-es­tab­lished game-the­o­retic rea­sons why so­cial sys­tems (gov­ern­ments, academia, so­ciety as a whole, etc.) may not find, or not im­ple­ment, good ideas even when they are easy to find/​im­ple­ment and the ex­pected benefits are great. There­fore, it is some­times war­ranted to be­lieve you’ve come up with a good, work­able idea which ‘ex­perts’ or ‘so­ciety’ have not found/​im­ple­mented yet. You should think about the game-the­o­retic rea­sons why this might or might not be pos­si­ble, on a case-by-case ba­sis; gen­er­al­ized max­ims about ‘how much you should trust the ex­perts’ and the like are coun­ter­pro­duc­tive.”

I agree with this, al­though it also seems fairly ob­vi­ous to me. It’s pos­si­ble that Yud­kowsky is re­ally pin­point­ing a trend (to­ward an ex­treme “mod­est episte­mol­ogy”) that sounds ob­vi­ously wrong once it’s pinned down, but is nonethe­less per­va­sive; if so, I guess it’s good to ar­gue against it, al­though I haven’t en­coun­tered it my­self.

But the biggest reason I was not impressed is that Yudkowsky mostly ignores an issue which strikes me as crucial. He makes a case that, given some hypothetically good idea, there are reasons why experts/society might not find and implement it. But as individuals, what we see are not ideas known to be good.

What we see are ideas that look good, according to the models and arguments we have right now. There is some cost (in time, money, etc.) associated with testing each of these ideas. Even if there are many untried good ideas, it might still be the case that these are a vanishingly small fraction of the ideas that look good before they are tested. In that case, the expected value of “being an experimenter” (i.e. testing lots of good-looking ideas) could easily be negative, even though there are many truly good, untested ideas.
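As a toy calculation (my own illustrative numbers, not from the post or anything I claim about real base rates): suppose only 1 in 200 good-looking ideas is actually good, a success pays off 50 times the cost of a test, and each test costs one unit.

```python
# Expected value of testing one good-looking idea, in units of one test's cost.
p_actually_good = 1 / 200   # fraction of good-looking ideas that are really good
payoff_if_good = 50.0       # benefit of a success
cost_of_test = 1.0

ev_per_test = p_actually_good * payoff_if_good - cost_of_test
print(ev_per_test)  # -0.75: negative, even though good untested ideas exist in large absolute numbers
```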

To me, this seems like the big de­ter­min­ing fac­tor for whether in­di­vi­d­u­als can ex­pect to reg­u­larly find and ex­ploit low-hang­ing fruit.

The clos­est Yud­kowsky comes to ad­dress­ing this topic is in sec­tions 4-5 of the post “Liv­ing in an Inad­e­quate World.” There, he’s talk­ing about the idea that even if many things are sub­op­ti­mal, you should still ex­pect a low base rate of ex­ploitable sub­op­ti­mal­ities in any ar­bi­trar­ily/​ran­domly cho­sen area. He analo­gizes this to find­ing ex­ploits in com­puter code:

Com­puter se­cu­rity pro­fes­sion­als don’t at­tack sys­tems by pick­ing one par­tic­u­lar func­tion and say­ing, “Now I shall find a way to ex­ploit these ex­act 20 lines of code!” Most lines of code in a sys­tem don’t provide ex­ploits no mat­ter how hard you look at them. In a large enough sys­tem, there are rare lines of code that are ex­cep­tions to this gen­eral rule, and some­times you can be the first to find them. But if we think about a ran­dom sec­tion of code, the base rate of ex­ploita­bil­ity is ex­tremely low—ex­cept in re­ally, re­ally bad code that no­body looked at from a se­cu­rity stand­point in the first place.
Think­ing that you’ve searched a large sys­tem and found one new ex­ploit is one thing. Think­ing that you can ex­ploit ar­bi­trary lines of code is quite an­other.

This isn’t re­ally the same is­sue I’m talk­ing about – in the terms of this anal­ogy, my ques­tion is “when you think you have found an ex­ploit, but you can’t costlessly test it, how con­fi­dent should you be that there is re­ally an ex­ploit?”

But he goes on to say some­thing that seems rele­vant to my con­cern, namely that most of the time you think you have found an ex­ploit, you won’t be able to use­fully act on it:

Similarly, you do not gen­er­ate a good startup idea by tak­ing some ran­dom ac­tivity, and then talk­ing your­self into be­liev­ing you can do it bet­ter than ex­ist­ing com­pa­nies. Even where the cur­rent way of do­ing things seems bad, and even when you re­ally do know a bet­ter way, 99 times out of 100 you will not be able to make money by know­ing bet­ter. If some­body else makes money on a solu­tion to that par­tic­u­lar prob­lem, they’ll do it us­ing rare re­sources or skills that you don’t have—in­clud­ing the skill of be­ing su­per-charis­matic and get­ting tons of ven­ture cap­i­tal to do it.
To be­lieve you have a good startup idea is to say, “Un­like the typ­i­cal 99 cases, in this par­tic­u­lar anoma­lous and un­usual case, I think I can make a profit by know­ing a bet­ter way.”
The anomaly doesn’t have to be some su­per-un­usual skill pos­sessed by you alone in all the world. That would be a ques­tion that always re­turned “No,” a blind set of gog­gles. Hav­ing an un­usu­ally good idea might work well enough to be worth try­ing, if you think you can stan­dardly solve the other stan­dard startup prob­lems. I’m merely em­pha­siz­ing that to find a rare startup idea that is ex­ploitable in dol­lars, you will have to scan and keep scan­ning, not pur­sue the first “X is bro­ken and maybe I can fix it!” thought that pops into your head.
To win, choose winnable bat­tles; await the rare anoma­lous case of, “Oh wait, that could work.”

The prob­lem with this is that many peo­ple already in­clude “pick your bat­tles” as part of their pro­ce­dure for de­ter­min­ing whether an idea seems good. Peo­ple are more con­fi­dent in their new ideas in ar­eas where they have com­par­a­tive ad­van­tages, and in ar­eas where ex­ist­ing work is es­pe­cially bad, and in ar­eas where they know they can han­dle the im­ple­men­ta­tion de­tails (“the other stan­dard startup prob­lems,” in EY’s ex­am­ple).

Let’s grant that all of that is already part of the calcu­lus that re­sults in peo­ple singling out cer­tain ideas as “look­ing good” – which seems clearly true, al­though doubtlessly many peo­ple could do bet­ter in this re­spect. We still have no idea what frac­tion of good-look­ing ideas are ac­tu­ally good.

Or rather, I have some ideas on the topic, and I’m sure Yud­kowsky does too, but he does not provide any ar­gu­ments to sway any­one who is pes­simistic on this is­sue. Since op­ti­mism vs. pes­simism on this is­sue strikes me as the one big ques­tion about low-hang­ing fruit, this leaves me feel­ing that the topic of low-hang­ing fruit has not re­ally been ad­dressed.

Yud­kowsky men­tions some ex­am­ples of his own at­tempts to act upon good-seem­ing ideas. To his credit, he men­tions a failure (his ke­to­genic meal re­place­ment drink recipe) as well as a suc­cess (string­ing up 130 light bulbs around the house to treat his wife’s Sea­sonal Affec­tive Di­sor­der). Nei­ther of these were costless ex­per­i­ments. He speci­fi­cally men­tions the mon­e­tary cost of test­ing the light bulb hy­poth­e­sis:

The systematic competence of human civilization with respect to treating mood disorders wasn’t so apparent to me that I considered it a better use of resources to quietly drop the issue than to just lay down the ~$600 needed to test my suspicion.

His wife has very bad SAD, and the only other treatment that worked for her cost a lot more than this. Given that the hypothesis worked, it was clearly a great investment. But not all hypotheses work. So before I do the test, how am I to know whether it’s worth $600? What if the cost is greater than that, or the expected benefit less? What does the right decision-making process look like, quantitatively?

Yudkowsky’s answer is that you can tell when good ideas in an area are likely to have been overlooked by analyzing the “adequacy” of the social structures that generate, test, and implement ideas. But this is only one part of the puzzle. At best, it tells us P(society hasn’t done it yet | it’s good). But what we need is P(it’s good | society hasn’t done it yet). And to get from one to the other, we need the prior probability of “it’s good,” as a function of the domain, my own abilities, and so forth. How can we know this? What if there are domains where society is inadequate yet good ideas are truly rare, and domains where society is fairly adequate but good ideas are so plentiful as to dominate the calculation?
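A quick Bayes-rule sketch of why the prior dominates (illustrative numbers of my own; the 0.5 stands in for however “inadequate” the domain looks):

```python
def posterior_good(p_not_done_given_good, p_not_done_given_bad, prior_good):
    # P(it's good | society hasn't done it yet), via Bayes' rule
    numerator = p_not_done_given_good * prior_good
    denominator = numerator + p_not_done_given_bad * (1 - prior_good)
    return numerator / denominator

# Same adequacy assessment, two different priors over idea quality:
print(posterior_good(0.5, 1.0, 0.1))    # ~0.053
print(posterior_good(0.5, 1.0, 0.001))  # ~0.0005
```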

In an ear­lier con­ver­sa­tion about low-hang­ing fruit, tum­blr user @ar­gu­mate brought up the pos­si­bil­ity that low-hang­ing fruit are ba­si­cally im­pos­si­ble to find be­fore­hand, but that so­ciety finds them by fund­ing many differ­ent at­tempts and col­lect­ing on the rare suc­cesses. That is, ev­ery in­di­vi­d­ual at­tempt to pluck fruit is EV-nega­tive given risk aver­sion, but a port­fo­lio of such at­tempts (such as a ven­ture cap­i­tal­ist’s port­fo­lio) can be net-pos­i­tive given risk aver­sion, be­cause with many at­tempts the prob­a­bil­ity of one big suc­cess that pays for the rest (a “uni­corn”) goes up. It seems to me like this is plau­si­ble.

Let me end on a pos­i­tive note, though. Even if the pre­vi­ous para­graph is ac­cu­rate, it is a good thing for so­ciety if more in­di­vi­d­u­als en­gage in ex­per­i­men­ta­tion (al­though it is a net nega­tive for each of those in­di­vi­d­u­als). Be­cause of this, the in­di­vi­d­ual’s choice to ex­per­i­ment can still be jus­tified on other terms – as a sort of al­tru­is­tic ex­pen­di­ture, say, or as a way of kindling hope in the face of per­sonal mal­adies like SAD (in which case it is like a more proso­cial ver­sion of gam­bling).

Cer­tainly there is some­thing emo­tion­ally and aes­thet­i­cally ap­peal­ing about a re­sur­gence of cit­i­zen sci­ence – about or­di­nary peo­ple look­ing at the bro­ken, p-hacked, per­verse-in­cen­tived ed­ifice of Big Science and say­ing “em­piri­cism is im­por­tant, dammit, and if The Ex­perts won’t do it, we will.” (There is prece­dent for this, and not just as a rich man’s game – there is a great chap­ter in The In­tel­lec­tual Life of the Bri­tish Work­ing Classes about wide­spread cit­i­zen sci­ence efforts in the 19th C work­ing class.) I am pes­simistic about whether my ex­per­i­ments, or yours, will bear fruit of­ten enough to make the in­di­vi­d­ual cost-benefit anal­y­sis work out, but that does not mean they should not be done. In­deed, per­haps they should.

• Thanks—this is in­for­ma­tive and I think it will be use­ful for any­one try­ing to de­cide what to make of your pro­ject.

I have dis­agree­ments about the “in­di­vi­d­ual woman” ex­am­ple but I’m not sure it’s worth hash­ing it out, since it gets into some thorny stuff about per­sua­sion/​rhetoric that I’m sure we both have strong opinions on.

Regarding MIRI, I want to note that although the organization has certainly become more competently managed, the more recent OpenPhil review included some very interesting and pointed criticism of the technical work, which I’m not sure enough people saw, as it was hidden in a supplemental PDF. Clearly this is not the place to hash out those technical issues, but they are worth noting, since the reviewer objections were more “these results do not move you toward your stated goal in this paper” than “your stated goal is pointless or quixotic,” so if true they are identifying a rationality failure.

• This seems like a very good per­spec­tive to me.

It made me think about the way that clas­sic bi­ases are of­ten ex­plained by con­struct­ing money pumps. A money pump is taken to be a clear, knock-down demon­stra­tion of ir­ra­tional­ity, since “clearly” no one would want to lose ar­bi­trar­ily large amounts of money. But in fact any money pump could be ra­tio­nal if the agent just en­joyed mak­ing the choices in­volved. If I greatly en­joyed an­chor­ing on num­bers pre­sented to me, I might well pay a lot of ex­tra money to get an­chored; this would be like buy­ing a kind of en­joy­able product. Like­wise some­one might just get a kick out of mak­ing choices in in­tran­si­tive loops, or hy­per­bolic dis­count­ing, or what­ever. (In the re­verse di­rec­tion, if you didn’t know I en­joyed some con­sumer good, you might think I was get­ting “money pumped” by pay­ing for it again and again.)
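For readers who haven’t seen the construction: a minimal money-pump sketch (the standard textbook device, not anything from the comment above). An agent with cyclic preferences pays a small fee for each “upgrade” and ends up back where it started, minus the fees; that only counts as a demonstration of irrationality if we assume the agent placed no value on the trades themselves.

```python
# Cyclic preferences: A preferred to B, B preferred to C, C preferred to A.
# Map each holding to the item the agent strictly prefers to it.
preference_cycle = {"B": "A", "C": "B", "A": "C"}
fee = 1.0

holding, money = "C", 0.0
for _ in range(6):  # six trades = two full loops around the cycle
    holding = preference_cycle[holding]  # agent accepts each "upgrade"...
    money -= fee                         # ...and pays the fee every time
print(holding, money)  # back at "C", down 6.0
```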

So there is a miss­ing step here, and to sup­ply the step we need psy­chol­ogy. The rea­son these bi­ases are bi­ases and not val­ues is “those aren’t the sort of things we care about,” but to for­mal­ize that, we need an ac­count of “the sort of things we care about” which, as you say, can’t be solved for from policy data alone.