On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Rob Bensinger argues that “ITT-passing and civility are good; ‘charity’ is bad; steelmanning is niche”.
The ITT—Ideological Turing Test—is an exercise in which one attempts to present one’s interlocutor’s views as persuasively as the interlocutor themselves can, coined by Bryan Caplan in analogy to the Turing Test for distinguishing between humans and intelligent machines. (An AI that can pass as human must presumably possess human-like understanding; an opponent of an idea who can pass as an advocate for it must presumably possess an advocate’s understanding.) “Steelmanning” refers to the practice of addressing a stronger version of an interlocutor’s argument, coined in disanalogy to “strawmanning”, the crime of addressing a weaker version of an interlocutor’s argument in the hopes of fooling an audience (or oneself) into thinking that the original argument has been rebutted.
Bensinger describes steelmanning as “a useful niche skill”, but thinks it isn’t “a standard thing you bring out in most arguments.” Instead, he writes, discussions should be structured around object-level learning, trying to pass each other’s Ideological Turing Test, or trying to resolve cruxes.
I think Bensinger has it backwards: the Ideological Turing Test is a useful niche skill, but it doesn’t belong on a list of things to organize a discussion around, whereas something like steelmanning naturally falls out of object-level learning. Let me explain.
The ITT is a test of your ability to model someone else’s models of some real-world phenomena of interest. But usually, I’m much more interested in modeling the real-world phenomena of interest directly, rather than modeling someone else’s models of it.
I couldn’t pass an ITT for advocates of Islam or extrasensory perception. On the one hand, this does represent a distinct deficit in my ability to model what the advocates of these ideas are thinking, a tragic gap in my comprehension of reality, which I would hope to remedy in the Glorious Transhumanist Future if that were a real thing. On the other hand, facing the constraints of our world, my inability to pass an ITT for Islam or ESP seems … basically fine? I already have strong reasons to doubt the existence of ontologically fundamental mental entities. I accept my ignorance of the reasons someone might postulate otherwise, not out of contempt, but because I just don’t have the time.
Or think of it this way: as a selfish seeker of truth speaking to another selfish seeker of truth, when would I want to try to pass my interlocutor’s ITT, or want my interlocutor to try to pass my ITT?
In the “outbound” direction, I’m not particularly selfishly interested in passing my interlocutor’s ITT because, again, I usually don’t care much about other people’s beliefs, as contrasted to the reality that those beliefs are reputedly supposed to track. I listen to my interlocutor hoping to learn from them, but if some part of what they say seems hopelessly wrong, it doesn’t seem profitable to pretend that it isn’t until I can reproduce the hopeless wrongness in my own words.
Crucially, the same is true in the “inbound” direction. I don’t expect people to be able to pass my ITT before criticizing my ideas. That would make it harder for people to inform me about flaws in my ideas!
But if I’m not particularly interested in passing my interlocutor’s ITT or in my interlocutor passing mine, and my interlocutor presumably (by symmetry) feels the same way, why would we bother?
All this having been said, I absolutely agree that, all else being equal, the ability to pass ITTs is desirable. It’s useful as a check that you and your interlocutor are successfully communicating, rather than talking past each other. If I couldn’t do better on an ITT for Islam or ESP after debating a proponent, that would be alarming—it’s just that I’d want to try the old-fashioned debate algorithm first, and improve my ITT score as a side-effect, rather than trying to optimize my ITT score directly.
There are occasions when I’m inclined to ask an interlocutor to pass my ITT—specifically when I suspect them of not being honest about their motives, of being selfish about something other than the pursuit of truth (like winning acclaim for “their own” current theories). If someone seems persistently motivated to strawman you, asking them to just repeat back what you said in their own words is a useful device to get the discussion back on track. (Or to end it, if they clearly don’t even want to try.)
In contrast to the ITT, steelmanning is something a selfish seeker of truth is inclined to do naturally, as a consequence of the obvious selfish practice of improving arguments wherever they happen to be found. In the outbound direction, if someone makes a flawed criticism of my ideas, of course I want to fix the flaws and address the improved argument. If the original criticism is faulty, but the repaired criticism exposes a key weakness in my existing ideas, then I learn something, which is great. If I were to just rebut the original criticism without trying to repair it, then I wouldn’t learn anything, which would be terrible.
Likewise, in the inbound direction, if my interlocutor notices a flaw in my criticism of their ideas and fixes the flaw before addressing the repaired criticism, that’s great. Why would I object?
The motivation here may be clearer if we consider the process of constructing computer programs rather than constructing arguments. When a colleague or language model assistant suggests an improvement to my code, I often accept the suggestion with my own (“steelmanned”?) changes rather than verbatim. This is so commonplace among programmers that it doesn’t even have a special name.
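To make the analogy concrete, here is a minimal sketch of what accepting a suggestion with changes can look like; the function names and the review scenario are invented for illustration, not drawn from any particular exchange.

```python
# Hypothetical example: the names and scenario are invented for illustration.

def mean(xs):
    # Original draft: raises ZeroDivisionError on an empty list.
    return sum(xs) / len(xs)

def mean_as_suggested(xs):
    # A reviewer's suggestion: guard against empty input by returning 0.
    if len(xs) == 0:
        return 0
    return sum(xs) / len(xs)

def mean_as_merged(xs):
    # What actually gets merged: keep the reviewer's guard, but repair it.
    # Silently returning 0 would hide bugs downstream, so raise a clear error.
    if not xs:
        raise ValueError("mean() of an empty sequence")
    return sum(xs) / len(xs)
```

The merged version accepts the substance of the suggestion (handle the empty case) while fixing its flaw (a silent sentinel value), which is the code-review analogue of repairing a criticism before addressing it.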
Bensinger quotes Eliezer Yudkowsky writing, “If you want to try to make a genuine effort to think up better arguments yourself because they might exist, don’t drag the other person into it,” but this bizarrely seems to discount the possibility of iterating on criticisms as they are posed. Despite making a genuine effort to think up better code that might exist, I often fail. If other people can see flaws in my code (because they know things I don’t) and have their own suggestions, and I can see flaws in their suggestions (because I also know things they don’t which didn’t make it into my first draft) and have my own counter-suggestions, that seems like an ideal working relationship, not a malign imposition.
All this having been said, I agree that there’s a serious potential failure mode where someone who thinks of themselves as steelmanning is actually constructing worse arguments than those that they purport to be improving. In this case, indeed, prompting such a delusional interlocutor to try the ITT first is a crucial remedy.
But crucial remedies are still niche in the sense that they shouldn’t be “a standard thing you bring out in most arguments”—or if they are, it’s a sign that you need to find better interlocutors. Having to explicitly drag out the ITT is a sign of sickness, not a sign of health. It shouldn’t be normal to have to resort to roleplaying exercises to achieve the benefits that could as well be had from basic reading comprehension and a selfish interest in accurate shared maps.
Steven Kaas wrote in 2008:
If you’re interested in being on the right side of disputes, you will refute your opponents’ arguments. But if you’re interested in producing truth, you will fix your opponents’ arguments for them.
To win, you must fight not only the creature you encounter; you must fight the most horrible thing that can be constructed from its corpse.
The ITT is a useful tool for being on the right side of disputes: in order to knowably refute your opponents’ arguments, you should be able to demonstrate that you know what those arguments are. I am nevertheless left with a sense that more is possible.
I’ve come to increasingly think that being able to steelman positions, especially positions you don’t hold, is an extremely important skill for effective truth-finding, especially in the modern era, and that steelmanning is mostly normal practice for finding the truth, rather than an exceptional trait.
Not doing this is a lot of the reason why political discussions tend to end up so badly.
This is why I give this post a +4.
That said, there are 2 important caveats that limit the applicability of this principle.
The first caveat is that in a lot of discussions, the ITT is better when you need to transform a debate between positions into a student-teacher lesson.
The second caveat is that emotions are a huge rate-limiter on rationality, meaning that, again, the ITT matters more than steelmanning in this use case.
My prediction for why LW has been less focused on core rationality content is, in broad strokes, that AI has grown more in importance, and more generally that one of the lessons rationalists have learned is that object-level practice in a skill (usually) has much smaller diminishing returns than meta-level thinking (which is yet another example of continual learning mattering a lot for human success).
(Self-review.) I think this post was underappreciated. At the time, I didn’t want to emphasize the social–historical angle because it seemed like too much of a distraction from the substantive object-level point, but I think this post is pointing at a critical failure in how the so-called “rationalist” movement has developed over time.
At the end of the post, I quote Steven Kaas writing in 2008: “if you’re interested in producing truth, you will fix your opponents’ arguments for them.” I see this kind of insight as at the core of what made the Sequences so valuable: a clear articulation of how a monomaniacal focus on the truth implies counterintuitive social behavior. Normatively, it shouldn’t be unusual for people to volunteer novel arguments that support their interlocutor’s belief—that’s just something you’d do naturally in the course of trying to figure out the right answer—but it is unusual, because most disagreements are actually disguised conflicts.
And yet less than a decade later (as documented by Rob Bensinger in the post that this post responds to), we see Eliezer Yudkowsky proclaiming that “Eliezer and Holden are both on record as saying that ‘steelmanning’ people is bad and you should stop doing it”—a complete inversion of Kaas’s advice! (Kaas didn’t use the specific jargon term “steelmanning”, but that’s obviously inessential.)
For clarity, I want to recap that one more time in fewer words, to distill the essence of the inversion—
In 2008, the community wisdom was that fixing your interlocutor’s arguments for them (what was not yet called “steelmanning”) was a good thing. The warrant cited for this advice was that it’s something you do “if you’re interested in producing truth”.
In 2017, the community wisdom was that fixing your interlocutor’s arguments for them (by then known as “steelmanning”) was “bad and you should stop doing it” (!!). The warrant cited for this advice was that “Eliezer and Holden” (who?) “are both on record as saying” it.
Why? What changed? How could something that was considered obviously good in 2008 be considered bad in 2017? Did no one else notice? Are we not supposed to notice? I have my own tentative theories, but I’m interested in what Raymond Arnold and Ruby Bloom think (relevant to the topic of “[keeping] alive the OG vision of improving human rationality”).
fwiw I think there is a good thing about steelmanning and a different good thing about ITT passing. (Which seems plausibly consistent with Rob’s title “ITT-passing and civility are good; ‘charity’ is bad; steelmanning is niche”, and also your post title here. I haven’t reread either yet but am responding since I was tagged.)
ITT passing is good for making sure you are having a conversation that changes people’s minds, and not getting confused/misled about what other people believe.
Steelmanning is good for identifying the strongest forms of arguments in a vacuum, which is useful for exploring the argument space but also prone to spending time on something that nobody believes or cares about, which is sometimes worth it and sometimes not. (it also often is part of a process that misleads people about what a person or group believes)
Which of those is more important most of the time? I dunno; the answer, AFAICT, is “each consideration is important enough that you should pay attention to both of them periodically.” And attempts to pin this down further feel more like some kind of culture war that isn’t primarily about the object-level fact of how often they are useful.
(apologies if I have missed a major point here, replying quickly at a busy time)
+1. I find both this and the post it is responding to somewhat confusing. I’ll jot down my perspective and what’s confusing.
My current take is that ITT-passing is most natural when you are trying to coordinate with someone or persuade someone in particular. When negotiating with political blocs, it is helpful to know what they want and how they are thinking about a problem in order to convince them of a particular outcome you care about; and when you wish to persuade a particular person, it helps to understand their perspective, so that you can walk from that perspective to yours.
I think that steelmanning is best for positions rather than for people. Finding the strongest argument for X and for not-X is a pretty fundamental part of figuring out whether X is true.
It seems to me that Bensinger et al. (i.e., Bensinger along with Yudkowsky and Karnofsky) are pushing back against dynamics where people pretend to be having a dialogue with them but keep talking past them for whatever reason; they are performing dialogue, not actually doing it. They claim that one of the common things people do instead is describe the version of their position that seems strongest to them instead of engaging with their actual position.
I am willing to believe them that this is a common negative experience of theirs, but it is not great to then say that “finding the strongest argument for a position I don’t hold” is “bad”.
Anyway, the reason I’m confused is that I don’t know which of the two is more ‘normal’ or ‘niche’. They both seem entirely natural in different contexts. Sometimes it’s worth trying to understand people’s perspectives more, sometimes it’s worth just focusing on intellectual reasoning wherever it takes you, and what other people think is simply not relevant.[1] Which of these situations is more common? That’s not currently clear to me.
Probably Zack makes a stronger case than I do for steelmanning being more ‘normal’, but after having just now quickly re-read the post I could not pass his ITT well enough to state it; I thought it probably worth jotting down my perspective anyway.
Aside: I have a hazy sense that Zack (in this post) seems to undervalue learning to pass the ITT of minds very different from one’s own, of finding very different perspectives on the world and coming to understand them. I have found this a fruitful way to see parts of the world I have not seen before (as a pointer, I think it valuable to develop many different shoulder advisors), even while it is the case that most of the time it is best to just talk about how the world works rather than people’s perspectives on it.