Don’t Double-Crux With Suicide Rock

Honest rational agents should never agree to disagree.

This idea is formalized in Aumann’s agreement theorem and its various extensions (we can’t foresee to disagree, uncommon priors require origin disputes, complexity bounds, &c.), but even without the sophisticated mathematics, a basic intuition should be clear: there’s only one reality. Beliefs are for mapping reality, so if we’re asking the same question and we’re doing everything right, we should get the same answer. Crucially, even if we haven’t seen the same evidence, the very fact that you believe something is itself evidence that I should take into account—and you should think the same way about my beliefs.

In “The Coin Guessing Game”, Hal Finney gives a toy model illustrating what the process of convergence looks like in the context of a simple game about inferring the result of a coinflip. A coin is flipped, and each of two players gets their own “hint” about the result (Heads or Tails) along with an associated hint “quality” uniformly distributed between 0 and 1. Hints of quality 1 always match the actual result; hints of quality 0 are useless and might as well be another coinflip. Several “rounds” then commence in which the players simultaneously reveal their current guess of the coinflip, incorporating their own hint, its quality, and whatever they can infer about the other player’s hint quality from that player’s behavior in previous rounds. Eventually, agreement is reached. The process is somewhat alien from a human perspective (when’s the last time you and an interlocutor switched sides in a debate multiple times before eventually agreeing?!), but not completely so: if someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you would infer that they had strong evidence or counterarguments of their own, even if there was some reason they couldn’t tell you what they knew.
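To make the rounds concrete, here is a minimal sketch of one way the game could play out. It is not Finney's own code, and it rests on assumptions not stated above: that a hint of quality q matches the coin with probability (1 + q) / 2, and that each player backs their own hint exactly when their quality exceeds the midpoint of the range of qualities still consistent with everyone's past guesses. The function name `play` and its parameters are made up for illustration.

```python
def play(hint_a, q_a, hint_b, q_b, max_rounds=20):
    """Return the (guess_a, guess_b) pairs announced in each round.

    Assumed model (not from the essay): a hint of quality q is correct with
    probability (1 + q) / 2, and both qualities are uniform on [0, 1].
    """
    other = {"H": "T", "T": "H"}
    guesses = [(hint_a, hint_b)]  # round 1: each player announces their own hint
    lo, hi = 0.0, 1.0             # publicly known range containing both qualities
    while guesses[-1][0] != guesses[-1][1] and len(guesses) < max_rounds:
        mid = (lo + hi) / 2
        # Back your own hint only if your quality beats the midpoint;
        # otherwise defer to the other player's hint.
        guess_a = hint_a if q_a > mid else other[hint_a]
        guess_b = hint_b if q_b > mid else other[hint_b]
        guesses.append((guess_a, guess_b))
        if guess_a != guess_b:
            # Still disagreeing means both stuck or both switched, so both
            # qualities lie on the same side of the midpoint. Narrow the range
            # using only what was publicly announced.
            if guess_a == hint_a:
                lo = mid
            else:
                hi = mid
    return guesses


# Both players stick once, then the weaker hint yields.
print(play("H", 0.9, "T", 0.6))
# Both players switch sides, then settle on the stronger hint.
print(play("H", 0.3, "T", 0.1))
```

In the second call both qualities are below one half, so the players trade positions before converging on the hint with the higher quality, which is the side-switching behavior described above.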

Honest rational agents should never agree to disagree.

In “Disagree With Suicide Rock”, Robin Hanson discusses a scenario where disagreement seems clearly justified: if you encounter a rock with words painted on it claiming that you, personally, should commit suicide according to your own values, you should feel comfortable disagreeing with the words on the rock without fear of violating Aumann’s theorem. The rock is probably just a rock. The words are information from whoever painted them, and maybe that person did somehow know something about whether future observers of the rock should commit suicide, but the rock itself doesn’t implement the dynamic of responding to new evidence.

In particular, if you find yourself playing Finney’s coin guessing game against a rock with the letter “H” painted on it, you should just go with your own hint: it would be incorrect to reason, “Wow, the rock is still saying Heads, even after observing my belief in several previous rounds; its hint quality must have been very high.”
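The mistake can be sketched in a few lines. This is an illustrative toy, not anything from Finney or Hanson: the function `naive_vs_rock` is hypothetical, and it assumes a player who treats the rock as a rational opponent and backs their own hint only while their quality exceeds the midpoint of the range they think the rock's quality lies in.

```python
def naive_vs_rock(hint, quality, rock_says="H", max_rounds=20):
    """Return the guesses of a player who wrongly models the rock as rational.

    Assumption for illustration: each round the rock "holds firm", and the
    player reads that as evidence the rock's hint quality is above the
    current midpoint.
    """
    other = {"H": "T", "T": "H"}
    guesses = [hint]   # round 1: the player announces their own hint
    lo, hi = 0.0, 1.0  # range the player believes contains the rock's quality
    while guesses[-1] != rock_says and len(guesses) < max_rounds:
        mid = (lo + hi) / 2
        guesses.append(hint if quality > mid else other[hint])
        # The rock never changes its paint, so the player keeps raising the
        # lower bound on the rock's imagined quality.
        lo = mid
    return guesses


# A player holding a very good Tails hint still ends up echoing the rock.
print(naive_vs_rock("T", 0.9))
```

However good the player's hint is (short of quality exactly 1), the imagined lower bound eventually passes it and the player capitulates. Going with your own hint avoids this, because the rock's constancy carries no information about the coin.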

Honest rational agents should never agree to disagree.

Human so-called “rationalists” who are aware of this may implicitly or explicitly seek agreement with their peers. If someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you might think, “Hm, we still don’t agree; I should update towards their position …”

But another possibility is that your trust has been misplaced. Humans suffering from “algorithmic bad faith” are on a continuum with Suicide Rock. What matters is the counterfactual dependence of their beliefs on states of the world, not whether they know all the right keywords (“crux” and “charitable” seem to be popular these days), nor whether they can perform the behavior of “making arguments”—and definitely not their subjective conscious verbal narratives.

And if the so-called “rationalists” around you suffer from correlated algorithmic bad faith—if you find yourself living in a world of painted rocks—then it may come to pass that protecting the sanctity of your map requires you to master the technique of lonely dissent.