Don’t Double-Crux With Suicide Rock

Honest rational agents should never agree to disagree.

This idea is formalized in Aumann's agreement theorem and its various extensions (we can't foresee to disagree, uncommon priors require origin disputes, complexity bounds, &c.), but even without the sophisticated mathematics, a basic intuition should be clear: there's only one reality. Beliefs are for mapping reality, so if we're asking the same question and we're doing everything right, we should get the same answer. Crucially, even if we haven't seen the same evidence, the very fact that you believe something is itself evidence that I should take into account—and you should think the same way about my beliefs.

In "The Coin Guessing Game", Hal Finney gives a toy model illustrating what the process of convergence looks like in the context of a simple game about inferring the result of a coinflip. A coin is flipped, and two players get a "hint" about the result (Heads or Tails) along with an associated hint "quality" uniformly distributed between 0 and 1. Hints of quality 1 always match the actual result; hints of quality 0 are useless and might as well be another coinflip. Several "rounds" commence where players simultaneously reveal their current guess of the coinflip, incorporating both their own hint and its quality, and what they can infer about the other player's hint quality from their behavior in previous rounds. Eventually, agreement is reached. The process is somewhat alien from a human perspective (when's the last time you and an interlocutor switched sides in a debate multiple times before eventually agreeing?!), but not completely so: if someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you would infer that they had strong evidence or counterarguments of their own, even if there was some reason they couldn't tell you what they knew.
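The first round of that dynamic can be sketched in Python. This is a simplified model, not Finney's exact protocol, and the function names are mine: I'm assuming a hint of quality q matches the true flip with probability (1 + q)/2 (so quality 1 is perfect and quality 0 is another coinflip, matching the description above), and that a player's first update treats the other's announced guess as evidence of average strength (likelihood ratio 3:1, since the expected match probability is 3/4). Later rounds would refine that by reasoning about the other player's exact quality.

```python
import random

def sample_hint(result):
    """Draw a hint with uniform quality q: the hint matches the true
    result with probability (1 + q) / 2, so q = 1 is a perfect hint
    and q = 0 is indistinguishable from another coinflip."""
    q = random.random()
    matches = random.random() < (1 + q) / 2
    hint = result if matches else ("H" if result == "T" else "T")
    return hint, q

def posterior_heads(hint, q):
    """P(coin = Heads) given your own hint and its known quality."""
    return (1 + q) / 2 if hint == "H" else (1 - q) / 2

def update_on_guess(p_heads, other_guess):
    """One naive round of convergence: treat the other player's announced
    guess as evidence of average strength (likelihood ratio 3:1), before
    anything has been learned about their exact hint quality."""
    likelihood_ratio = 3.0 if other_guess == "H" else 1 / 3.0
    odds = p_heads / (1 - p_heads) * likelihood_ratio
    return odds / (1 + odds)

# A weak own hint defers to the other player's guess ...
assert update_on_guess(posterior_heads("T", 0.1), "H") > 0.5
# ... but a strong own hint survives one round of disagreement --
# which is exactly what invites the next round of mutual inference.
assert update_on_guess(posterior_heads("T", 0.9), "H") < 0.5
```

The interesting behavior lives in the repetition: a player who stays put after seeing your guess reveals something about their quality, which feeds the next round's update, which is how the multiple side-switches in Finney's game arise.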

Honest rational agents should never agree to disagree.

In "Disagree With Suicide Rock", Robin Hanson discusses a scenario where disagreement seems clearly justified: if you encounter a rock with words painted on it claiming that you, personally, should commit suicide according to your own values, you should feel comfortable disagreeing with the words on the rock without fear of being in violation of the Aumann theorem. The rock is probably just a rock. The words are information from whoever painted them, and maybe that person did somehow know something about whether future observers of the rock should commit suicide, but the rock itself doesn't implement the dynamic of responding to new evidence.

In particular, if you find yourself playing Finney's coin guessing game against a rock with the letter "H" painted on it, you should just go with your own hint: it would be incorrect to reason, "Wow, the rock is still saying Heads, even after observing my belief in several previous rounds; its hint quality must have been very high."

Honest rational agents should never agree to disagree.

Human so-called "rationalists" who are aware of this may implicitly or explicitly seek agreement with their peers. If someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you might think, "Hm, we still don't agree; I should update towards their position …"

But another possibility is that your trust has been misplaced. Humans suffering from "algorithmic bad faith" are on a continuum with Suicide Rock. What matters is the counterfactual dependence of their beliefs on states of the world, not whether they know all the right keywords ("crux" and "charitable" seem to be popular these days), nor whether they can perform the behavior of "making arguments"—and definitely not their subjective conscious verbal narratives.

And if the so-called "rationalists" around you suffer from correlated algorithmic bad faith—if you find yourself living in a world of painted rocks—then it may come to pass that protecting the sanctity of your map requires you to master the technique of lonely dissent.

• Presumably double crux with Suicide Rock would reveal that the Rock doesn't have any cruxes, and double crux with someone suffering from algorithmic bad faith would also reveal that, tho perhaps more subtly?

• You are a bit too quick to allow the reader the presumption that they have more algorithmic faith than the other folks they talk to. Yes, if you are super rational and they are not, you can ignore them. But how did you come to be confident in that description of the situation?

• Everything I'm saying is definitely symmetric across persons, even if, as an author, I prefer to phrase it in the second person. (A previous post included a clarifying parenthetical to this effect at the end, but this one did not.)

That is, if someone who trusted your rationality noticed that you seemed visibly unmoved by their strongest arguments, they might think that the lack of agreement implies that they should update towards your position, but another possibility is that their trust has been misplaced! If they find themselves living in a world of painted rocks where you are one of the rocks, then it may come to pass that protecting the sanctity of their map would require them to master the technique of lonely dissent.

You could argue that my author's artistic preference to phrase things in the second person is misleading, but I'm not sure what to do about that while still accomplishing everything else I'm trying to do with my writing: my reply to Wei Dai and a Reddit user's commentary on another previous post seem relevant.

• Being able to parse philosophical arguments is evidence of being rational. When you make philosophical arguments, you should think of yourself as only conveying content to those who are rationally parsing things, and conveying only appearance/gloss/style to those who aren't rationally parsing things.

• Uh, we are talking about holding people to MUCH higher rationality standards than the ability to parse philosophical arguments.

• I think being smart is only very weak evidence for being rational (especially globally rational, as Zack is assuming here, rather than locally rational).

I think most of the evidence that understanding philosophical arguments provides about being rational is screened off by being smart (which, again, is a very, very weak correlation already).

• Honest rational agents can still disagree if the fact that they're all honest and rational isn't common knowledge.

• In practical terms, agreeing to disagree can simply mean that, given resource constraints, it isn't worth reaching convergence on this topic given the delta in expected payoffs.

• I frequently find myself in situations where:

1) I disagree with someone

2) My opinion is based on a fairly large body of understanding accumulated over many years

3) I think I understand where the other person is going wrong

4) Trying to reach convergence would, in practice, look like a pointless argument that would only piss everyone off.

If there are real consequences at stake, I'll speak up. Often I'll have to take it offline and write a few pages, because some positions are too complex for most people to follow orally. But if the agreement isn't worth the argument, I probably won't.

• And if the problem formulation is much simpler than the solution, then there will be a recurring explanatory debt to be paid down as multitudes of idiots re-encounter the problem and ignore existing solutions.

• This is what FAQs are for. On LW, The Sequences are our FAQ.

• I think this is an important consideration for bounded rational agents, and much more so for embedded agents, which is unfortunately often ignored. The result is that you should not expect to ever meet an agent where Aumann fully applies in all cases, because neither of you has the computational resources necessary to always reach agreement.

• Important note about Aumann's agreement theorem: both agents have to have the same priors. With human beings this isn't always the case, especially when it comes to values. But even with perfect Bayesian reasoners it isn't always the case, since their model of the world is their prior. Two Bayesians with the same data can disagree if they are reasoning from different causal models.

Now, with infinite data, abandonment of poor-performing models, and an Occam prior, it is much more likely that they will agree. But that's not mathematically guaranteed, AFAIK.

It's a good heuristic in practice. But don't draw strong conclusions from it without corroborating evidence.
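The priors point can be made concrete with a small sketch (the two-hypothesis "fair vs. rigged coin" model and the function name here are mine, not from the comment): two Bayesians who condition on identical data but start from different priors end up with different posteriors, with no irrationality anywhere in the calculation.

```python
def posterior_biased(prior_biased, n_flips, n_heads, bias=0.9):
    """P(coin is rigged | data) in a toy two-hypothesis model:
    H0 = fair coin, H1 = coin with P(heads) = bias.
    The two agents differ only in prior_biased."""
    like_biased = bias ** n_heads * (1 - bias) ** (n_flips - n_heads)
    like_fair = 0.5 ** n_flips
    joint = prior_biased * like_biased
    return joint / (joint + (1 - prior_biased) * like_fair)

# Identical data -- 9 heads in 10 flips -- but different priors:
credulous = posterior_biased(0.5, 10, 9)    # starts at 50% "rigged"
skeptic = posterior_biased(0.001, 10, 9)    # starts at 0.1% "rigged"
# credulous is now nearly certain the coin is rigged; skeptic still doubts it.
```

With enough further flips both posteriors would be driven toward the same answer, which is the "infinite data" caveat above; the disagreement persists only as long as the finite evidence hasn't swamped the differing priors.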

• The usual formalization of "Occam's prior" is the Solomonoff prior, which still depends on the choice of a Universal Turing Machine, so such agents can still disagree because of different priors.

• Go one step further.

Honest rational agents should never agree to disagree.

There are no such agents. On many topics, NOBODY, including you and including me, is sufficiently honest or sufficiently rational for Aumann's theorem to apply.

• The other problem with Aumann's agreement theorem is that it's often applied too broadly. It should say, "Honest rational agents should never agree to disagree on matters of fact." What to do about those facts is definitely up for disagreement, insofar as two honest, rational agents may value wildly different things.

• An earlier draft actually specified "… on questions of fact", but I deleted that phrase because I didn't think it was making the exposition stronger. (Omit needless words!) People who understand the fact/value distinction, instrumental goals, &c. usually don't have trouble "relativizing" policy beliefs. (Even if I don't want to maximize paperclips, I can still have a lawful discussion about what the paperclip-maximizing thing to do would be.)

• I understand the point about omitting needless words, but I think the words are needed in this case. I think there's a danger here of Aumann's agreement theorem being misused to prolong disagreements when those disagreements are on matters of values and future actions rather than on the present state of the world. This is especially true in "hot" topics (like politics, religion, etc.) where matters of fact and matters of value are closely intertwined.

• A slightly different frame on this (I think less pessimistic) is something like "honesty hasn't been invented yet". Or, rather, explicit knowledge of how to implement honesty does not exist in a way that can be easily transferred. (Tacit knowledge of such may exist, but it's hard to validate and share.)

(I'm referring, I think, to the same sort of honesty Zack is getting at here, although the aspects of it that are relevant to double crux didn't come up in that previous blogpost.)

I think, obviously, that there have been massive strides (across human history, and yes, on LW in particular) in how to implement "Idealized Honesty" (for lack of a better term for now). So the problem seems pretty tractable. But it does not feel like a thing within spitting distance.

• The kind of honesty Zack is talking about is desirable, but it's unclear whether it's sufficient for Aumann's theorem to apply.

• I just want to spring off of this to point out something about Aumann's agreement theorem. I often see it used as a kind of cudgel because people miss an important aspect.

It can take us human beings time and effort to converge on a view.

Oftentimes it's just not worth it to one or more of the participants to invest that time and effort.

• there's only one reality

But there's no agreement about what constitutes evidence of something being real, so even agreement about fact is going to be extremely difficult.

• "Honest rational agents should never agree to disagree."

I never really looked into Aumann's theorem. But can one not envisage a situation where they "agree to disagree", because the alternative is to argue indefinitely?

• The title of Aumann's paper is just a pithy slogan. What the slogan means as the title of his paper is the actual mathematical result that he proves. This is that if two agents have the same priors, but have made different observations, then if they share only their posteriors, and each properly updates on the other's posterior, and repeat, then they will approach agreement without ever having to share the observations themselves. In other papers there are theorems placing practical bounds on the number of iterations required.

In actual human interaction, there is a large number of ways in which disagreements among us may fall outside the scope of this theorem. Inaccuracy of observation. All the imperfections of rationality that may lead us to process observations incorrectly. Non-common priors. Inability to articulate numerical priors. Inability to articulate our observations in numerical terms. The effort required may exceed our need for a resolution. Lack of good faith. Lack of common knowledge of our good faith.

Notice that these are all imperfections. The mathematical ideal remains. How to act in accordance with the eternal truths of mathematical theorems when we lack the means to satisfy their hypotheses is the theme of a large part of the Sequences.

• No, the alternative (and only outcome for honest rational agents) is to converge to one belief. Each takes the other's stated (and mutually known to be honest and rational) beliefs as evidence, on which they update their own.

• [1] Honest rational agents should [2] never agree to disagree.
[3] This idea is formalized in [4] Aumann's agreement theorem

The conditions [1] are sufficient for the conclusion [2] (as shown by [4]), but are not all necessary.

Honesty is not required.

If this is surprising, then it might be useful to consider that 'having common priors' is kind of like being able to read people's minds—what they are thinking will be within the space of possibilities you consider. Things such rational agents say to each other may be surprising, but never un-conceived of; never inconceivable. And with each (new) piece of information they acquire, they come closer to the truth—whether the words they hear are "true" or "false" matters not; only what evidence 'hearing those words' is. Under such circumstances lies may be useless. Not because rational agents are incapable of lying, but because they possess impossible computational abilities that ensure convergence of shared beliefs* (in their minds) after they meet—a state of affairs which does not tell you anything about their words.

Events may proceed in a fashion such that a third observer (who isn't a "rational agent"), such as you or I, might say "they agreed to disagree". Aumann's agreement theorem doesn't tell us that this will never happen, only that such an observer would be wrong about what actually happened to their (internal) beliefs, however much their professed (or performed) beliefs hold otherwise.

One consequence of this is how such a conversation might go—the rational agents might simply state the probabilities they give for a proposition, rather than discussing the evidence, because they can assess the evidence from each other's responses, because they already know all the evidence that 'might be'.

*Which need not be at or between where they started. Two Bayesians with different evidence, each of which has led them to believe something is very unlikely, may after meeting conclude that it is very likely—if that is the assessment they would give had they had both pieces of information to begin with.

• If an agent is not honest, ey can decide to say only things that provide no evidence regarding the question at hand to the other agent. In this case, convergence is not guaranteed. For example, Alice assigns probability 35% to "will it rain tomorrow" but, when asked, says the probability is 21% regardless of what the actual evidence is. Bob assigns probability 89% to "will it rain tomorrow" but, when asked, says the probability is 42% regardless of what the actual evidence is. Alice knows Bob always answers 42%. Bob knows Alice always answers 21%. If they talk to each other, their probabilities will not converge (they won't change at all).

Yes, it can luckily happen that the lies still contain enough information for them to converge, but I'm not sure why you seem to think that is an important or natural situation.

• I don't think the 'rational agents' in question are a good model for people, or that the theoretical situation is anything close to natural. Aside from the myriad ways they are different*, the result of 'rational people'** interacting seems like an empirical question. Perhaps a theory that models people better will come up with the same results—and offer suggestions for how people can improve.

The addition of the word "honest" seems like it comes from an awareness of how the model is flawed. I pointed out how this differs from the model, because the model is somewhat unintuitive and makes rather large assumptions—and it's not clear how well the result holds up as the gap between those assumptions and reality is removed.

Yes, to ask for a theory that enables constructing, or approximating, the agents described therein would be asking for a lot, but that might clearly establish how the theory relates to reality/people interacting with each other.

*like having the ability to compute uncomputable things instantly (with no mistakes),

**who are computationally bounded, etc.

• The addition of the word "honest" doesn't come from an awareness of how the model is flawed. It is one of the explicit assumptions in the model. So I'm still not sure what point you are going for here.

I think that applying Aumann's theorem to people is mostly interesting in the prescriptive rather than descriptive sense. That is, the theorem tells us that our ability to converge can serve as a test of our rationality, to the extent that we are honest and share the same prior, and all of this is common knowledge. (This last assumption might be the hardest to make sense of. Hanson tried to justify it, but IMO not quite convincingly.) Btw, you don't need to compute uncomputable things, much less instantly. Scott Aaronson derived a version of the theorem with explicit computational complexity and query complexity bounds that don't seem prohibitive.

Given all the difficulties, I am not sure how to apply it in the real world, or whether that's even possible. I do think it's interesting to think about. But, to the extent it is possible, it definitely requires honesty.