On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche

Rob Bensinger argues that “ITT-passing and civility are good; ‘charity’ is bad; steelmanning is niche”.

The ITT—Ideological Turing Test—is an exercise in which one attempts to present one’s interlocutor’s views as persuasively as the interlocutor themselves can, coined by Bryan Caplan in analogy to the Turing Test for distinguishing between humans and intelligent machines. (An AI that can pass as human must presumably possess human-like understanding; an opponent of an idea who can pass as an advocate for it must presumably possess an advocate’s understanding.) “Steelmanning” refers to the practice of addressing a stronger version of an interlocutor’s argument, coined in disanalogy to “strawmanning”, the crime of addressing a weaker version of an interlocutor’s argument in the hopes of fooling an audience (or oneself) into believing that the original argument has been rebutted.

Bensinger describes steelmanning as “a useful niche skill”, but thinks it isn’t “a standard thing you bring out in most arguments.” Instead, he writes, discussions should be structured around object-level learning, trying to pass each other’s Ideological Turing Test, or trying to resolve cruxes.

I think Bensinger has it backwards: the Ideological Turing Test is a useful niche skill, but it doesn’t belong on a list of things to organize a discussion around, whereas something like steelmanning naturally falls out of object-level learning. Let me explain.

The ITT is a test of your ability to model someone else’s models of some real-world phenomena of interest. But usually, I’m much more interested in modeling the real-world phenomena of interest directly, rather than modeling someone else’s models of them.

I couldn’t pass an ITT for advocates of Islam or extrasensory perception. On the one hand, this does represent a distinct deficit in my ability to model what the advocates of these ideas are thinking, a tragic gap in my comprehension of reality, which I would hope to remedy in the Glorious Transhumanist Future if that were a real thing. On the other hand, facing the constraints of our world, my inability to pass an ITT for Islam or ESP seems … basically fine? I already have strong reasons to doubt the existence of ontologically fundamental mental entities. I accept my ignorance of the reasons someone might postulate otherwise, not out of contempt, but because I just don’t have the time.

Or think of it this way: as a selfish seeker of truth speaking to another selfish seeker of truth, when would I want to try to pass my interlocutor’s ITT, or want my interlocutor to try to pass my ITT?

In the “outbound” direction, I’m not particularly selfishly interested in passing my interlocutor’s ITT because, again, I usually don’t care much about other people’s beliefs, as contrasted to the reality that those beliefs are reputedly supposed to track. I listen to my interlocutor hoping to learn from them, but if some part of what they say seems hopelessly wrong, it doesn’t seem profitable to pretend that it isn’t until I can reproduce the hopeless wrongness in my own words.

Crucially, the same is true in the “inbound” direction. I don’t expect people to be able to pass my ITT before criticizing my ideas. That would make it harder for people to inform me about flaws in my ideas!

But if I’m not particularly interested in passing my interlocutor’s ITT or in my interlocutor passing mine, and my interlocutor presumably (by symmetry) feels the same way, why would we bother?

All this having been said, I absolutely agree that, all else being equal, the ability to pass ITTs is desirable. It’s useful as a check that you and your interlocutor are successfully communicating, rather than talking past each other. If I couldn’t do better on an ITT for Islam or ESP after debating a proponent, that would be alarming—it’s just that I’d want to try the old-fashioned debate algorithm first, and improve my ITT score as a side-effect, rather than trying to optimize my ITT score directly.

There are occasions when I’m inclined to ask an interlocutor to pass my ITT—specifically when I suspect them of not being honest about their motives, of being selfish about something other than the pursuit of truth (like winning acclaim for “their own” current theories). If someone seems persistently motivated to strawman you, asking them to just repeat back what you said in their own words is a useful device to get the discussion back on track. (Or to end it, if they clearly don’t even want to try.)

In contrast to the ITT, steelmanning is something a selfish seeker of truth is inclined to do naturally, as a consequence of the obvious selfish practice of improving arguments wherever they happen to be found. In the outbound direction, if someone makes a flawed criticism of my ideas, of course I want to fix the flaws and address the improved argument. If the original criticism is faulty, but the repaired criticism exposes a key weakness in my existing ideas, then I learn something, which is great. If I were to just rebut the original criticism without trying to repair it, then I wouldn’t learn anything, which would be terrible.

Likewise, in the inbound direction, if my interlocutor notices a flaw in my criticism of their ideas and fixes the flaw before addressing the repaired criticism, that’s great. Why would I object?

The motivation here may be clearer if we consider the process of constructing computer programs rather than constructing arguments. When a colleague or language model assistant suggests an improvement to my code, I often accept the suggestion with my own (“steelmanned”?) changes rather than verbatim. This is so commonplace among programmers that it doesn’t even have a special name.
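For concreteness, here is a minimal hypothetical sketch of that workflow (the function and the reviewer’s suggestion are invented for illustration): a suggested fix is right about the problem but flawed in its remedy, and the version that actually gets committed repairs the suggestion instead of either taking it verbatim or rejecting it.

```python
# Hypothetical example, invented for illustration: my original function.
def mean(xs):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)

# Suppose a reviewer suggests guarding against empty input like so:
#
#     if len(xs) == 0:
#         return 0
#
# The criticism is directionally right (mean([]) raises ZeroDivisionError),
# but the proposed fix silently conflates "no data" with "an average of
# zero". Knowing things the reviewer doesn't (my call sites should fail
# loudly on missing data), I commit a repaired version of the suggestion
# rather than the suggestion verbatim:
def mean_repaired(xs):
    """Return the arithmetic mean, failing loudly on empty input."""
    if not xs:
        raise ValueError("mean() requires at least one value")
    return sum(xs) / len(xs)

print(mean_repaired([1.0, 2.0, 3.0]))  # 2.0
```

Neither rebutting the suggestion (shipping the crash) nor accepting it verbatim (shipping the silent zero) would have produced the best code; repairing it did.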

Bensinger quotes Eliezer Yudkowsky writing, “If you want to try to make a genuine effort to think up better arguments yourself because they might exist, don’t drag the other person into it,” but this bizarrely seems to discount the possibility of iterating on criticisms as they are posed. Despite making a genuine effort to think up better code that might exist, I often fail. If other people can see flaws in my code (because they know things I don’t) and have their own suggestions, and I can see flaws in their suggestions (because I also know things they don’t, things that didn’t make it into my first draft) and have my own counter-suggestions, that seems like an ideal working relationship, not a malign imposition.

All this having been said, I agree that there’s a serious potential failure mode where someone who thinks of themselves as steelmanning is actually constructing worse arguments than those that they purport to be improving. In this case, indeed, prompting such a delusional interlocutor to try the ITT first is a crucial remedy.

But crucial remedies are still niche in the sense that they shouldn’t be “a standard thing you bring out in most arguments”—or if they are, it’s a sign that you need to find better interlocutors. Having to explicitly drag out the ITT is a sign of sickness, not a sign of health. It shouldn’t be normal to have to resort to roleplaying exercises to achieve the benefits that could as well be had from basic reading comprehension and a selfish interest in accurate shared maps.

Steven Kaas wrote in 2008:

If you’re interested in being on the right side of disputes, you will refute your opponents’ arguments. But if you’re interested in producing truth, you will fix your opponents’ arguments for them.

To win, you must fight not only the creature you encounter; you must fight the most horrible thing that can be constructed from its corpse.

The ITT is a useful tool for being on the right side of disputes: in order to knowably refute your opponents’ arguments, you should be able to demonstrate that you know what those arguments are. I am nevertheless left with a sense that more is possible.