“Rationalist Discourse” Is Like “Physicist Motors”

Imagine being a student of physics, and coming across a blog post proposing a list of guidelines for “physicist motors”—motor designs informed by the knowledge of physicists, unlike ordinary motors.

Even if most of the things on the list seemed like sensible advice to keep in mind when designing a motor, the framing would seem very odd. The laws of physics describe how energy can be converted into work. To the extent that any motor accomplishes anything, it happens within the laws of physics. There are theoretical ideals describing how motors need to work in principle, like the Carnot engine, but you can’t actually build an ideal Carnot engine; real-world electric motors or diesel motors or jet engines all have their own idiosyncratic lore depending on the application and the materials at hand; an engineer who worked on one might not be the best person to work on another. You might appeal to principles of physics to explain why some particular motor is inefficient or poorly designed, but you would not speak of physicist motors as if that were a distinct category of thing. If someone did, you might quietly begin to doubt how much they really knew about physics.

As a student of rationality, I feel the same way about guidelines for “rationalist discourse.” The laws of probability and decision theory describe how information can be converted into optimization power. To the extent that any discourse accomplishes anything, it happens within the laws of rationality.

Rob Bensinger proposes “Elements of Rationalist Discourse” as a companion to Duncan Sabien’s earlier “Basics of Rationalist Discourse”. Most of the things on both lists are, indeed, sensible advice that one might do well to keep in mind when arguing with people, but as Bensinger notes, “Probably this new version also won’t match ‘the basics’ as other people perceive them.”

But there’s a reason for that: a list of guidelines has the wrong type signature for being “the basics”. The actual basics are the principles of rationality one would appeal to in order to explain which guidelines are a good idea: principles like how evidence is the systematic correlation between possible states of your observations and possible states of reality, how you need evidence to locate the correct hypothesis in the space of possibilities, how the quality of your conclusion can only be improved by arguments that have the power to change that conclusion.
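To make the last of those principles concrete, here is a minimal Python sketch of a single Bayesian update. The function name `posterior` and the example probabilities are illustrative assumptions of mine, not anything taken from the posts under discussion. The point it shows is narrow: an observation that is equally probable whether or not the hypothesis is true cannot move the conclusion.

```python
def posterior(prior: float, p_obs_if_true: float, p_obs_if_false: float) -> float:
    """Return P(hypothesis | observation) by Bayes' theorem."""
    joint_true = prior * p_obs_if_true
    joint_false = (1.0 - prior) * p_obs_if_false
    return joint_true / (joint_true + joint_false)

# An observation that is correlated with the truth moves the belief.
informative = posterior(prior=0.5, p_obs_if_true=0.8, p_obs_if_false=0.2)

# An "argument" that would show up equally often in either world does not.
uninformative = posterior(prior=0.5, p_obs_if_true=0.7, p_obs_if_false=0.7)

print(informative, uninformative)
```

With these made-up numbers, the first update goes from 0.5 to about 0.8, and the second stays at 0.5 no matter how forcefully the argument is delivered.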

Contemplating these basics, it should be clear that there’s just not going to be anything like a unique style of “rationalist discourse”, any more than there is a unique “physicist motor.” There are theoretical ideals describing how discourse needs to work in principle, like Bayesian reasoners with common priors exchanging probability estimates, but you can’t actually build an ideal Bayesian reasoner. Rather, different discourse algorithms (the collective analogue of “cognitive algorithm”) leverage the laws of rationality to convert information into optimization in somewhat different ways, depending on the application and the population of interlocutors at hand, much as electric motors and jet engines both leverage the laws of physics to convert energy into work without being identical to each other, and with each requiring their own engineering sub-specialty to design.

Or to use another classic metaphor, there’s also just not going to be a unique martial art. Boxing and karate and ju-jitsu all have their own idiosyncratic lore adapted to different combat circumstances, and a master of one would easily defeat a novice of the other. One might appeal to the laws of physics and the properties of the human body to explain why some particular martial arts school was not teaching their students to fight effectively. But if some particular karate master were to brand their own lessons as the “basics” or “elements” of “martialist fighting”, you might quietly begin to doubt how much actual fighting they had done: either all fighting is “martialist” fighting, or “martialist” fighting isn’t actually necessary for beating someone up.

One historically important form of discourse algorithm is debate, and its close variant the adversarial court system. It works by separating interlocutors into two groups: one that searches for arguments in favor of a belief, and another that searches for arguments against the belief. Then anyone listening to the debate can consider all the arguments to help them decide whether or not to adopt the belief. (In the court variant of debate, a designated “judge” or “jury” announces a “verdict” for or against the belief, which is added to the court’s shared map, where it can be referred to in subsequent debates, or “cases.”)

The enduring success and legacy of the debate algorithm can be attributed to how it circumvents a critical design flaw in individual human reasoning, the tendency to “rationalize”—to preferentially search for new arguments for an already-determined conclusion.

(At least, “design flaw” is one way of looking at it—a more complete discussion would consider how individual human reasoning capabilities co-evolved with the debate algorithm—and, as I’ll briefly discuss later, this “bug” for the purposes of reasoning is actually a “feature” for the purposes of deception.)

As a consequence of rationalization, once a conclusion has been reached, even prematurely, further invocations of the biased argument-search process are likely to further entrench the conclusion, even when strong counterarguments exist (in regions of argument-space neglected by the biased search). The debate algorithm solves this sticky-conclusion bug by distributing a search for arguments and counterarguments among multiple humans, ironing out falsehoods by pitting two biased search processes against each other. (For readers more familiar with artificial than human intelligence, generative adversarial networks work on a similar principle.)
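A toy simulation can illustrate the mechanism. This is only a sketch under strong assumptions that I am introducing for illustration: each argument is reduced to a single number (its log-likelihood ratio for the belief), arguments are treated as independent so that their log-likelihood ratios simply add, and the listener naively accepts whatever arguments are presented. The names `biased_search` and `belief_from` and the pool of arguments are hypothetical.

```python
import math
import random

def biased_search(arguments, side):
    """Return only the arguments that favor `side` (+1 for the belief, -1 against)."""
    return [a for a in arguments if a * side > 0]

def belief_from(arguments, prior=0.5):
    """Add log-likelihood ratios to the prior log-odds and return a probability."""
    log_odds = math.log(prior / (1 - prior)) + sum(arguments)
    return 1 / (1 + math.exp(-log_odds))

random.seed(0)
# Hypothetical pool: each number stands for one argument's log-likelihood ratio.
pool = [random.gauss(-0.2, 1.0) for _ in range(20)]

# One person rationalizing for the belief only ever surfaces supporting arguments.
lone_rationalizer = belief_from(biased_search(pool, +1))

# Two opposed biased searches, pooled by an audience, surface everything.
debate_audience = belief_from(biased_search(pool, +1) + biased_search(pool, -1))

full_information = belief_from(pool)
print(lone_rationalizer, debate_audience, full_information)
```

In this toy model the audience of the debate ends up with the same belief as someone who saw the whole pool, while the lone rationalizer's belief can only be pushed upward. Real arguments are neither independent nor so easily scored, so this shows the shape of the idea rather than evidence for it.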

For all its successes, the debate algorithm also suffers from many glaring flaws. For one example, the benefits of improved conclusions mostly accrue to third parties who haven’t already entrenched on a conclusion; debate participants themselves are rarely seen changing their minds. For another, just the choice of what position to debate has a distortionary effect even on the audience; if it takes more bits to locate a hypothesis for consideration than to convincingly confirm or refute it, then most of the relevant cognition has already happened by the time people are arguing for or against it. Debate is also inefficient: for example, if the “defense” in the court variant happens to find evidence or arguments that would benefit the “prosecution”, the defense has no incentive to report it to the court, and there’s no guarantee that the prosecution will independently find it themselves.
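The point about bits can be made with back-of-the-envelope arithmetic. The numbers below (a space of about a million candidate hypotheses, a 99% confidence target) are purely illustrative assumptions, chosen to show how the two quantities compare.

```python
import math

# Hypothetical numbers, chosen only for illustration.
candidate_hypotheses = 2 ** 20   # about a million possibilities
target_confidence = 0.99

# Evidence needed to raise one hypothesis from a uniform prior
# over the whole space to roughly even odds.
bits_to_locate = math.log2(candidate_hypotheses)

# Evidence needed to move an already-located hypothesis
# from even odds (1:1) to 99:1.
bits_to_confirm = math.log2(target_confidence / (1 - target_confidence))

print(bits_to_locate, bits_to_confirm)
```

Under these assumptions, locating the hypothesis takes 20 bits and confirming it takes fewer than 7, so whoever chose the debate topic has already done most of the work.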

Really, the whole idea is so galaxy-brained that it’s amazing it works at all. There’s only one reality, so correct information-processing should result in everyone agreeing on the best, most-informed belief-state. This is formalized in Aumann’s famous agreement theorem, but even without studying the proofs, the result is obvious. A generalization to a more realistic setting without instantaneous communication gives the result that disagreements should be unpredictable: after Bob the Bayesian tells Carol the Coherent Reasoner his belief, Bob’s expectation of the difference between his belief and Carol’s new belief should be zero. (That is, Carol might still disagree, but Bob shouldn’t be able to predict whether it’s in the same direction as before, or whether Carol now holds a more extreme position on what adherents to the debate algorithm would call “Bob’s side.”)
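Aumann's theorem itself is more than a code snippet can carry, but a simpler, related fact can be checked directly: a Bayesian's expected posterior equals their prior, so they cannot predict the direction of their own next update. This is offered as an illustration of the flavor of "unpredictable" at work here, not as a demonstration of the result about Bob and Carol. The function name `expected_posterior` and the probabilities are illustrative assumptions.

```python
def expected_posterior(prior: float, p_obs_if_true: float, p_obs_if_false: float) -> float:
    """Average the posterior over both possible outcomes, weighted by their probability."""
    # Probability of seeing the observation at all.
    p_obs = prior * p_obs_if_true + (1 - prior) * p_obs_if_false
    # Posterior if the observation is seen, and if it is not seen.
    post_if_obs = prior * p_obs_if_true / p_obs
    post_if_not = prior * (1 - p_obs_if_true) / (1 - p_obs)
    return p_obs * post_if_obs + (1 - p_obs) * post_if_not

prior = 0.3
result = expected_posterior(prior, p_obs_if_true=0.9, p_obs_if_false=0.2)
print(prior, result)
```

Either outcome moves the belief (up if the observation is seen, down if it is not), but the probability-weighted average of the two posteriors comes back to the prior, up to floating-point rounding.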

That being the normative math, why does the human world’s enduringly dominant discourse algorithm take for granted the ubiquity of, not just disagreements, but predictable disagreements? Isn’t that crazy?

Yes. It is crazy. One might hope to do better by developing some sort of training or discipline that would allow discussions between practitioners of such “rational arts” to depart from the harnessed insanity of the debate algorithm with its stubbornly stable “sides”, and instead mirror the side-less Bayesian ideal, the free flow of all available evidence channeling interlocutors to an unknown destination.

Back in the late ’aughts, an attempt to articulate what such a discipline might look like was published on a blog called Overcoming Bias. (You probably haven’t heard of it.) It’s been well over a decade since then. How is that going?

Eliezer Yudkowsky laments:

In the end, a lot of what people got out of all that writing I did, was not the deep object-level principles I was trying to point to—they did not really get Bayesianism as thermodynamics, say, they did not become able to see Bayesian structures any time somebody sees a thing and changes their belief. What they got instead was something much more meta and general, a vague spirit of how to reason and argue, because that was what they’d spent a lot of time being exposed to over and over and over again in lots of blog posts.

“A vague spirit of how to reason and argue” seems like an apt description of what “Basics of Rationalist Discourse” and “Elements of Rationalist Discourse” are attempting to codify—but with no explicit instruction on which guidelines arise from deep object-level principles of normative reasoning, and which from mere taste, politeness, or adaptation to local circumstances, it’s unclear whether students of 2020s-era “rationalism” are poised to significantly outperform the traditional debate algorithm—and it seems alarmingly possible to do worse, if the collaborative aspects of modern “rationalist” discourse allow participants to introduce errors that a designated adversary under the debate algorithm would have been incentivized to correct, and most “rationalist” practitioners don’t have a deep theoretical understanding of why debate works as well as it does.

Looking at Bensinger’s “Elements”, there’s a clear-enough connection between the first eight points (plus three sub-points) and the laws of normative reasoning. Truth-Seeking, Non-Deception, and Reality-Minding, trivial. Non-Violence, because violence doesn’t distinguish between truth and falsehood. Localizability, in that I can affirm the validity of an argument that A would imply B, while simultaneously denying A. Alternative-Minding, because decisionmaking under uncertainty requires living in many possible worlds. And so on. (Lawful justifications for the elements of Reducibility and Purpose-Minding left as an exercise to the reader.)

But then we get this:

  9. Goodwill. Reward others’ good epistemic conduct (e.g., updating) more than most people naturally do. Err on the side of carrots over sticks, forgiveness over punishment, and civility over incivility, unless someone has explicitly set aside a weirder or more rough-and-tumble space.

I can believe that these are good ideas for having a pleasant conversation. But separately from whether “Err on the side of forgiveness over punishment” is a good idea, it’s hard to see how it belongs on the same list as things like “Try not to ‘win’ arguments using [...] tools that work similarly well whether you’re right or wrong” and “[A]sk yourself what Bayesian evidence you have that you’re not in those alternative worlds”.

The difference is this. If your discourse algorithm lets people “win” arguments with tools that work equally well whether they’re right or wrong, then your discourse gets the wrong answer (unless, by coincidence, the people who are best at winning are also the best at getting the right answer). If the interlocutors in your discourse don’t ask themselves what Bayesian evidence they have that they’re not in alternative worlds, then your discourse gets the wrong answer (if you happen to live in an alternative world).

If your discourse algorithm errs on the side of sticks over carrots (perhaps, emphasizing punishing others’ bad epistemic conduct more than most people naturally do), then … what? How, specifically, are rough-and-tumble spaces less “rational”, more prone to getting the wrong answer, such that a list of “Elements of Rationalist Discourse” has the authority to designate them as non-default?

I’m not saying that goodwill is bad, particularly. I totally believe that goodwill is a necessary part of many discourse algorithms that produce maps that reflect the territory, much like how kicking is a necessary part of many martial arts (but not boxing). It just seems like a bizarre thing to put in a list of guidelines for “rationalist discourse”.

It’s as if guidelines for designing “physicist motors” had a point saying, “Use more pistons than most engineers naturally do.” It’s not that pistons are bad, particularly. Lots of engine designs use pistons! It’s just, the pistons are there specifically to convert force from expanding gas into rotational motion. I’m pretty pessimistic about the value of attempts to teach junior engineers to mimic the surface features of successful engines without teaching them how engines work, even if the former seems easier.

The example given for “[r]eward[ing] others’ good epistemic conduct” is “updating”. If your list of “Elements of Rationalist Discourse” is just trying to apply a toolbox of directional nudges to improve the median political discussion on social media (where everyone is yelling and no one is thinking), then sure, directionally nudging people to directionally nudge people to look like they’re updating probably is a directional improvement. It still seems awfully unambitious, compared to trying to teach the criteria by which we can tell it’s an improvement. In some contexts (in-person interactions with someone I like or respect), I think I have the opposite problem, of being disposed to agree with the person I’m currently talking to, in a way that shortcuts the slow work of grappling with their arguments and doesn’t stick after I’m not talking to them anymore; I look as if I’m “updating”, but I haven’t actually learned. Someone who thought “rationalist discourse” entailed “[r]eward[ing] others’ good epistemic conduct (e.g., updating) more than most people naturally do” and sought to act on me accordingly would be making that problem worse.

A footnote on the “Goodwill” element elaborates:

Note that this doesn’t require assuming everyone you talk to is honest or has good intentions.

It does have some overlap with the rule of thumb “as a very strong but defeasible default, carry on object-level discourse as if you were role-playing being on the same side as the people who disagree with you”.

But this seems to contradict the element of Non-Deception. If you’re not actually on the same side as the people who disagree with you, why would you (as a very strong but defeasible default) role-play otherwise?

Other intellectual communities have a name for the behavior of role-playing being on the same side as people you disagree with: they call it “concern trolling”, and they think it’s a bad thing. Why is that? Are they just less rational than “us”, the “rationalists”?

Here’s what I think is going on. There’s another aspect to the historical dominance of the debate algorithm. The tendency to rationalize new arguments for a fixed conclusion is only a bug if one’s goal is to improve the conclusion. If the fixed conclusion was adopted for other reasons—notably, because one would benefit from other people believing it—then generating new arguments might help persuade those others. If persuading others is the real goal, then rationalization is not irrational; it’s just dishonest. (And if one’s concept of “honesty” is limited to not consciously making false statements, it might not even be dishonest.) Society benefits from using the debate algorithm to improve shared maps, but most individual debaters are mostly focused on getting their preferred beliefs onto the shared map.

That’s why people don’t like concern trolls. If my faction is trying to get Society to adopt beliefs that benefit our faction onto the shared map, someone who comes to us role-playing being on our side, but who is actually trying to stop us from adding our beliefs to the shared map just because they think our beliefs don’t reflect the territory, isn’t a friend; they’re a double agent, an enemy pretending to be a friend, which is worse than the honest enemy we expect to face before the judge in the debate hall.

This vision of factions warring to make Society’s shared map benefit themselves is pretty bleak. It’s tempting to think the whole mess could be fixed by starting a new faction—the “rationalists”—that is solely dedicated to making Society’s shared map reflect the territory: a culture of clear thinking, clear communication, and collaborative truth-seeking.

I don’t think it’s that simple. You do have interests, and if you can fool yourself into thinking that you don’t, your competitors are unlikely to fall for it. Even if your claim to only want Society’s shared map to reflect the territory were true—which it isn’t—anyone could just say that.

I don’t immediately have solutions on hand. Just an intuition that, if there is any way of fixing this mess, it’s going to involve clarifying conflicts rather than obfuscating them—looking for Pareto improvements, rather than pretending that everyone has the same utility function. That if something called “rationalism” is to have any value whatsoever, it’s as the field of study that can do things like explain why it makes sense that people don’t like concern trolling. Not as its own faction with its own weird internal social norms that call for concern trolling as a very strong but defeasible default.

But don’t take my word for it.