Don’t Double-Crux With Suicide Rock

Honest rational agents should never agree to disagree.

This idea is formalized in Aumann’s agreement theorem and its various extensions (we can’t foresee to disagree, uncommon priors require origin disputes, complexity bounds, &c.), but even without the sophisticated mathematics, a basic intuition should be clear: there’s only one reality. Beliefs are for mapping reality, so if we’re asking the same question and we’re doing everything right, we should get the same answer. Crucially, even if we haven’t seen the same evidence, the very fact that you believe something is itself evidence that I should take into account—and you should think the same way about my beliefs.

In “The Coin Guessing Game”, Hal Finney gives a toy model illustrating what the process of convergence looks like in the context of a simple game about inferring the result of a coinflip. A coin is flipped, and two players get a “hint” about the result (Heads or Tails) along with an associated hint “quality” uniformly distributed between 0 and 1. Hints of quality 1 always match the actual result; hints of quality 0 are useless and might as well be another coinflip. Several “rounds” commence where players simultaneously reveal their current guess of the coinflip, incorporating both their own hint and its quality, and what they can infer about the other player’s hint quality from their behavior in previous rounds. Eventually, agreement is reached. The process is somewhat alien from a human perspective (when’s the last time you and an interlocutor switched sides in a debate multiple times before eventually agreeing?!), but not completely so: if someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you would infer that they had strong evidence or counterarguments of their own, even if there was some reason they couldn’t tell you what they knew.
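
The convergence dynamic can be made concrete in code. Below is a minimal sketch (my construction, not Finney’s; the helper names `play` and `hint` are hypothetical), assuming quality is uniform on (0, 1) and that a quality-q hint matches the flip with probability (1 + q)/2, a linear interpolation between the two endpoint cases above. Under those assumptions, each player’s Bayesian guess rule works out to a threshold: announce your own hint if and only if your quality exceeds the expected quality of the other player, given the common-knowledge interval that their previous guesses have revealed.

```python
import random

def play(h1, q1, h2, q2, max_rounds=50):
    """Sketch of Finney's coin guessing game for two Bayesian players.

    h_i: player i's hint ("H" or "T"); q_i: player i's hint quality.
    Returns the list of (guess1, guess2) pairs, ending on agreement.
    """
    # Round 1: your own hint is always at least as good as chance,
    # so each player just announces it.
    history = [(h1, h2)]
    if h1 == h2:
        return history  # immediate agreement

    # Hints disagree.  Each player's quality is common knowledge to lie
    # in an interval, initially (0, 1), that shrinks as guesses are
    # observed.  With the other's quality uniform on (lo, hi), Bayes
    # says: keep your own hint iff your quality beats the midpoint.
    i1, i2 = (0.0, 1.0), (0.0, 1.0)  # common-knowledge bounds on q1, q2
    for _ in range(max_rounds):
        t1 = sum(i2) / 2  # player 1's threshold: E[q2 | interval]
        t2 = sum(i1) / 2  # player 2's threshold: E[q1 | interval]
        g1 = h1 if q1 > t1 else h2
        g2 = h2 if q2 > t2 else h1
        history.append((g1, g2))
        if g1 == g2:
            return history  # agreement reached
        # Each guess reveals which side of the (publicly computable)
        # threshold that player's quality fell on; narrow the intervals.
        i1 = (max(i1[0], t1), i1[1]) if g1 == h1 else (i1[0], min(i1[1], t1))
        i2 = (max(i2[0], t2), i2[1]) if g2 == h2 else (i2[0], min(i2[1], t2))
    return history

random.seed(0)
truth = random.choice("HT")

def hint(q):
    """A quality-q hint: matches `truth` with probability (1 + q) / 2."""
    return truth if random.random() < (1 + q) / 2 else ("T" if truth == "H" else "H")

q1, q2 = random.random(), random.random()
print("truth:", truth, "rounds:", play(hint(q1), q1, hint(q2), q2))
```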

Honest rational agents should never agree to disagree.

In “Disagree With Suicide Rock”, Robin Hanson discusses a scenario where disagreement seems clearly justified: if you encounter a rock with words painted on it claiming that you, personally, should commit suicide according to your own values, you should feel comfortable disagreeing with the words on the rock without fear of being in violation of the Aumann theorem. The rock is probably just a rock. The words are information from whoever painted them, and maybe that person did somehow know something about whether future observers of the rock should commit suicide, but the rock itself doesn’t implement the dynamic of responding to new evidence.

In particular, if you find yourself playing Finney’s coin guessing game against a rock with the letter “H” painted on it, you should just go with your own hint: it would be incorrect to reason, “Wow, the rock is still saying Heads, even after observing my belief in several previous rounds; its hint quality must have been very high.”
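
To make the failure mode vivid, here is a deliberately naive continuation of the sketch above (the specific quantities are assumptions for illustration): an agreement-seeker who models the painted rock as an honest Bayesian player reads its unwavering “H” as ever-stronger evidence of a high-quality hint, and eventually capitulates, no matter how good their own hint is.

```python
# A naive agreement-seeker plays the coin game against a rock painted "H".
# Treating the rock as an honest Bayesian player, each repeated "H" in the
# face of your repeated "T" pushes the inferred lower bound on the rock's
# hint quality upward (the midpoint rule from the sketch above), until
# your own quality falls below the threshold and you "agree" with a rock.
q_yours, your_hint = 0.9, "T"  # a very good hint, assumed for the demo
lo = 0.0                       # naive lower bound on the rock's quality
for round_number in range(1, 10):
    threshold = (lo + 1.0) / 2
    guess = your_hint if q_yours > threshold else "H"
    print(f"round {round_number}: rock quality > {lo:.3f}, you guess {guess}")
    if guess == "H":
        break                  # capitulation: agreement with a painted rock
    lo = threshold             # rock "stuck with its hint": naive update
```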

Honest rational agents should never agree to disagree.

Human so-called “rationalists” who are aware of this may implicitly or explicitly seek agreement with their peers. If someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you might think, “Hm, we still don’t agree; I should update towards their position …”

But another possibility is that your trust has been misplaced. Humans suffering from “algorithmic bad faith” are on a continuum with Suicide Rock. What matters is the counterfactual dependence of their beliefs on states of the world, not whether they know all the right keywords (“crux” and “charitable” seem to be popular these days), nor whether they can perform the behavior of “making arguments”—and definitely not their subjective conscious verbal narratives.

And if the so-called “rationalists” around you suffer from correlated algorithmic bad faith—if you find yourself living in a world of painted rocks—then it may come to pass that protecting the sanctity of your map requires you to master the technique of lonely dissent.