- Selection vs Learning

If we select for people who’ve independently figured these things out, then we should do our best to learn from them. We should try to keep them around if at all possible; we should pay attention to their commentary on the matter as well as to their behavior; and we should preserve and promulgate their insights.
Somewhat agreed. Sometimes people who are good at the thing aren’t good at teaching it, or even at being in the environment in which it’s taught.
To use physical combat as an example, maybe the world’s best fighters are hardened criminals and hyper-vigilant soldiers. If that’s the case, I’d guess the move is to have instructors study them and talk with them, and then I want the instructors to make the gray space safer for students than criminals would allow, and kinder to the teachers than hyper-vigilance and flailing limbs would combine to be.
(I’m two steps down hypotheticals here: first, that the people good at the thing aren’t necessarily the best at teaching it, and second, that there’s a better way to teach than just throwing students into the deep end. I don’t presently have citable evidence for either of those, just intuition and metis.)
If I may use fictional evidence as an example for a moment, Dr. House would be a terrible biology 101 lecturer. I’m a fan of figuring out how to teach the things and then trying to make sure they get taught.
There’s a pretty important ~crux of “okay, what are you trying to do right now though?” for me. I spend a lot of my time trying to teach what I view as basic and early-intermediate skills, or trying to be a force multiplier for other people to teach those. To do that well, it helps me to have a sense of what the advanced skills are like and of what people do with these skills out in the world, and it costs a decent amount if someone is bad at explaining themselves or drives people to drop the class. If someone views themselves as managing an intensive care unit instead of managing a (deeply weird and distributed and non-standard) education system, they might correctly be a lot more willing to put up with various drawbacks in exchange for brilliant diagnostic insights.
(I also spend time doing operations work that has only a downstream relationship to rationality skills, and time handling conflict management in a way where spotting lies is directly useful to me.)
- Swords
Okay. I agree that the text of the essay is talking about putting away swords under some circumstances. If you are interpreting it as “put away swords at all, ever” then yes. Something about the connection with repairing and maintaining the way led me to internally complete it as “put away swords all the time, such that you never pick one up,” and that’s different. To the extent that’s my misunderstanding, oops, I apologize.
So far I’m in agreement with Duncan here.
Of course, CONSTANT VIGILANCE needn’t be all that great a burden.
I don’t know what your internal experience of CONSTANT VIGILANCE is like. For me, in interpersonal social situations, it sucks. I dislike the feeling. I have a lot of trouble setting my mental dial high enough to reliably catch manipulative shenanigans and low enough that I don’t waste hours cross-checking my notes on what people say against the physical evidence that exists: letting pass the normal mistakes of normal human minds while picking up on the patterns of malefactors.
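To make that dial concrete: it behaves like a detection threshold, and no setting escapes the tradeoff. Here’s a toy sketch with invented numbers, where honest mistakes and deliberate manipulation both produce “suspicion” and the two overlap:

```python
import random

random.seed(0)

# Toy model: every interaction emits a "suspicion score". Honest mistakes
# cluster low, deliberate manipulation clusters high, and they overlap.
# All numbers here are invented for illustration.
honest = [random.gauss(2.0, 1.0) for _ in range(1000)]      # normal human error
manipulative = [random.gauss(5.0, 1.0) for _ in range(50)]  # rarer, but higher

def tradeoff(threshold):
    """Return (fraction of manipulation caught, fraction of honest
    interactions flagged for hours of needless cross-checking)."""
    caught = sum(s >= threshold for s in manipulative) / len(manipulative)
    false_alarms = sum(s >= threshold for s in honest) / len(honest)
    return caught, false_alarms

for dial in (2.5, 3.5, 4.5):
    caught, false_alarms = tradeoff(dial)
    print(f"dial at {dial}: catch {caught:.0%} of manipulation, "
          f"investigate {false_alarms:.0%} of honest mistakes")
```

Turning the dial down to catch more malefactors always drags more innocent noise in with it; the overlap is the problem, not my calibration.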
“Likewise, adopting practices that preclude Dark Arts is just good sense.” If we as a collective rationalist community (or better yet a general civilization) had an agreed upon canon of techniques or language use which precluded particular Dark Arts, that’d be great. We don’t. Again, I spend a lot of my time in the top of the funnel with first-timers and old-timers-who-nobody-taught. So when I spot someone doing something that seems Dark Artsy- not just at the “metaphorically carrying a sword” level but the “metaphorically slashing” level- I have a bunch of extra work to figure out whether nobody told them they shouldn’t, whether they’ll stop if I explain the better way to do it, whether stopping even makes sense or if we’re just actually in an environment where unilaterally disarming would be a bad idea.
But if we rule it a transgression, to be vigilant, to be on the lookout for treachery—well, now we’re simply sabotaging ourselves.
I think turning the sensitivity dial all the way down, such that you wouldn’t notice any amount of sword usage, is a mistake.
I think different responses to it based on context make a lot of sense. If someone throws a punch at me on the street, I’m in fight mode. If someone who isn’t the teacher throws a punch at me seemingly randomly in the dojo, I block and calmly explain that we don’t do that; we throw punches at people when in the ring sparring or when told to drill. (And if the teacher does it, it kinda depends on the teacher’s style and the style of the dojo.) If someone steps into the ring opposite me, we bow to each other, the ref says go, and they throw a punch at me: this is fine.
You used the phrase “lookout for treachery.” Yeah, throwing a punch during an agreed-upon spar isn’t treachery; sucker-punching someone when they’re tying their shoelaces is. We’re using the sword metaphor a lot. With rationality, I think there are moves that are actually just normal and expected in much of the world and would be disallowed in the inner space of Duncan’s dojo. Like, when I buy a used car in a strange town, I basically expect the dealer to try and screw me over and not tell me about a history of engine trouble. I’m not mad at the dealer for trying it. If I was buying a used car from an old friend, I would expect them to tell me if there was engine trouble, and I’d be mad if they conveniently forgot to mention it. It’s not even treachery when the dealer does it; that’s basically how everybody knows buying used cars works.
To put it another way, what is foolish is the idea that we can defend against Dark Arts by some sort of person-level inclusion test: someone passes the test? Great, now we never again have to think about whether anything they say or do is suspect.
That seems to be an exaggeration. I agree that a person-level, one-time test after which you trust someone entirely would be a mistake. I would be surprised if Duncan said that you couldn’t get bounced out of Master Lee’s advanced classes once you were in. Are you suggesting that’s actually what he’s saying?
But trusting someone more because of what they’ve done, with less regular checks, seems fine. One doesn’t give every single user full admin powers, but the I.T. staff has more power, and the Chief Technology Officer might have essentially full power in the system. And yet, if someone notices the logs are weird and it looks like the CTO is up to something, that still gets flagged.
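To sketch what I mean (role names are hypothetical, and the “weirdness” check is a deliberately crude stand-in for real anomaly detection): trust widens what each tier may do, but the audit trail covers everyone, the CTO included.

```python
# Hypothetical permission tiers: more trust grants more power, but every
# action lands in the audit log regardless of tier.
PERMISSIONS = {
    "user": {"read"},
    "it_staff": {"read", "write", "reset_password"},
    "cto": {"read", "write", "reset_password", "grant_admin"},
}

audit_log = []

def perform(actor, role, action):
    """Allow what the role permits, but record everything for review."""
    allowed = action in PERMISSIONS.get(role, set())
    audit_log.append((actor, role, action, allowed))
    return allowed

def weird_entries(log, expected_admin_grants=1):
    """Crude stand-in for anomaly detection: too many admin grants get
    flagged for a human to look at, no matter whose account did it."""
    grants = [entry for entry in log if entry[2] == "grant_admin"]
    return grants if len(grants) > expected_admin_grants else []

perform("dana", "cto", "grant_admin")
perform("dana", "cto", "grant_admin")
print(weird_entries(audit_log))  # the CTO's actions still get flagged
```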
So how can this be any less true for a group, made up of God only knows how many people, who know each other not nearly so well as you know yourself…?
I do not believe it will ever become entirely, 100% true. I do believe it can become true more often, of more people, if slow and steady work is put into the project. I think that’s worth trying.
When I walk down the street, full of strangers I have never met before and will never meet again, I don’t particularly worry someone is going to punch me in the face. If someone steps into my path and starts yelling at me, my odds go up a little and maybe I make sure my hands are free- but my odds aren’t actually very high yet, because more people just yell about conspiracy theories or the unfairness of the world than commit assault. And then if I see them start to draw back and swing, now my odds are actually high.
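That escalation is just sequential updating on evidence. A toy version, with a made-up base rate and made-up likelihood ratios (none of these numbers are real statistics):

```python
# Toy sequential Bayesian update for the street scenario.
# The base rate and likelihood ratios are invented for illustration.
prior_odds = 1 / 10_000  # odds that a random passerby attacks me

evidence = [
    ("steps into my path and starts yelling", 20),  # 20x likelier from attackers
    ("draws back to swing", 500),                   # almost only precedes a punch
]

odds = prior_odds
for observation, likelihood_ratio in evidence:
    odds *= likelihood_ratio
    probability = odds / (1 + odds)
    print(f"after '{observation}': P(assault) ~ {probability:.1%}")
```

The first cue multiplies tiny odds into slightly-less-tiny odds (hence freeing my hands); only the second one actually moves me into fight mode.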
We have, as a civilization, achieved a world where adults in our cities don’t need to carry swords.
So it is with manipulation and lies. I kinda trust random people on the street, though I wouldn’t put too much weight on it. I basically did trust the people I worked with at my old software job: sometimes I double-checked to cover their mistakes, and I was aware enough that office politics existed that I think I would have noticed a string of problems, but we had also all been through the educational and acculturation processes that meant we broadly agreed on what good software was like and what behavior towards each other was appropriate. But that education and acculturation wasn’t just pointless busywork! (Some of it was, but some of it wasn’t!) University was a gray space where the really dumb and the really rude got corrected or weeded out: not perfectly, but usefully.
Yeah, I try and check my own code for problems. Yeah, I did code reviews and checked my coworkers’ code for problems. But there’s a finite amount of time and energy, and I scrutinized incoming pull requests from the public way closer than I scrutinized my coworkers’ changes.
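As a sketch of that allocation (tier names and per-line minutes are invented): the finite review budget gets spent where trust is thinnest, and the smallest budget is still not zero.

```python
# Hypothetical review budget: scrutiny scales with how unknown the source is.
# Tier names and per-100-line minutes are invented for illustration.
MINUTES_PER_100_LINES = {
    "public_pull_request": 60,  # assume nothing; read it line by line
    "new_teammate": 25,         # shared training, but a short track record
    "longtime_coworker": 10,    # shared standards and a long track record
}

def review_minutes(author_tier, lines_changed):
    """Time and energy are finite; budget them by the author's track record."""
    return MINUTES_PER_100_LINES[author_tier] * lines_changed / 100

print(review_minutes("public_pull_request", 200))  # 120.0 minutes
print(review_minutes("longtime_coworker", 200))    # 20.0 minutes
```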
And circling back to rationality and what I said about context above: one can take in some portion of the masses who haven’t learned how not to ad hominem or how to taboo their words (so it isn’t surprising if they try), teach them the thing and that it’s expected of them in this space if not out in the world, and then pass them on to a zone where one can just expect that everyone there knows they aren’t supposed to do that and has demonstrated that they’re capable of it. That doesn’t mean one can 100% know nobody will do it, but it changes the rate significantly and it would change my reaction from “ah, you have made a common error, let me show you the correct move” to “you know better than that. What’s the deal?”
I would love more gray spaces.
- Rationalist Skill
Mm… the principle of “just because something is a good thing does not mean that it is a ‘rationalist’ thing” is of course entirely correct. The question is just what qualifies some skill as a “rationalist” skill.
I think this is a good question, and also it verges on “what is the full list of rationalist skills, or at least a partial one?” which seems a bit big for this particular comment section. I think spotting lies and more general manipulation and deceit is pretty centrally a rationalist skill.
If we take “rationalist skill” to mean only such skills as are somehow “meta” or apply to the development and practice of techniques of rationality, then my reason #2 should not convince us that lying is a rationalist skill.
And that seems self-referential. Hockey skills are the ones that apply to the practice of techniques of hockey, chess skills are the ones that apply to the practice of techniques of chess, florgle skills are the ones that apply to the practice of techniques of florgle. If we have a solid definition of rationality (ideally one concise and generally agreed upon, but any working definition is fine for the purposes of conversation between a few people) then rationality skills are the skills which apply to rationality.
From LessWrong’s About page:
Rationality has a number of definitions[1] on LessWrong, but perhaps the most canonical is that the more rational you are, the more likely your reasoning leads you to have accurate beliefs, and by extension, allows you to make decisions that most effectively advance your goals.
(If you think that’s a poor definition of rationality, that’s fine, but I think that’s a close-enough definition that I’m fine using it and the onus is on whoever disagrees to state what they think rationality is.)
It’s again plausible to me that getting better at lying makes someone better at detecting lies (thus having more accurate beliefs). And of course, in the right circumstances being able to lie well certainly can advance your goals. I’m not against learning to lie in the pursuit of learning to detect lies- that’s practically the whole point of this essay. But I do think it’s worth noting the distinction, and that the goal is learning to detect lies and other manipulations.
If we take “rationalist skill” to mean also such skills as are instrumentally-convergently useful in a near-universal set of classes of circumstances, regardless of details, with the only requirement being that interaction between non-perfectly-aligned agents is a significant aspect of the scenario, then my reason #2 should convince us that lying is a rationalist skill.
That argument proves too much, I think. The ability to inflict pain and physical damage on other people is convergently useful in interactions between non-perfectly-aligned agents (and I’m Hobbesian enough to think it’s near-universally important at least in potential) but that doesn’t mean I think marksmanship with guns is a rationalist skill. Same with the ability to make friends, or communicate clearly and understand what other people are trying to communicate; it’s useful (probably in even more circumstances than lying is) but I don’t think of clear communication or making friends as a specifically rationalist skill. It’s just a useful thing to be able to do.
Re: learning from the pros:

Indeed the best pros might not be the best teachers, but this is exactly why I didn’t say “we should install them as teachers”; I said “we should do our best to learn from them”. That need not require that they teach. It can indeed look like what you describe—having instructors mediate the process, etc.
On the other hand, it’s worth remembering that the actual domain that we’re talking about here mostly involves words, not guns or punches or medical procedures. The way that skill manifests is mostly in words. The means by which we encounter the pros (or anyone else) mostly involves words. For my example, I linked to an old Less Wrong comment, which is made of words. This inherently biases the “best way to learn from the pros” distribution toward learning directly from the pros, without some sort of mediation.
And there’s a serious danger in taking the sort of “at some remove” approach you describe, which is that there’s really no guarantee that the instructors are… well, any good. In many ways. Maybe they don’t really understand what they themselves learn from the pros, maybe they aren’t any good at teaching, maybe they introduce distortions (either purely accidentally, or through a sort of teacher’s déformation professionnelle, or deliberately in the name of effectively teaching a wider range of students, etc.), maybe who knows what. And if the students don’t interact with the pros directly, how do they get feedback on what they’ve learned? From the same instructors? That is a very different thing from having on hand the one who has mastered the skill you are trying to learn!
What’s more, there is an element of absurdity to all of this, which is that while the model of pros, instructors, and students might make sense for many sorts of practical skills, in the domain that we’re actually discussing, the far more likely scenario is that, within a community, there are some who are pros at one thing and some who are pros at another thing, and the ideal is that we all learn from each other (with an unequal distribution of “sources of mastery”, to be sure, but nevertheless no sharp divides!). If I am a master at one thing, and you at another, then if we are equals within a community, we can both learn from each other; and others can learn from us both, at once; and we can improve together. But separate us, place a layer of instructor-interpreters between each of us and the rest of the community (including the other), and you replace an extremely high-bandwidth, bi-directional channel with a unidirectional and very narrow one. It’s hard to see how such a setup could be worth the effort.
Okay. I agree that the text of the essay is talking about putting away swords under some circumstances. If you are interpreting it as “put away swords at all, ever” then yes. Something about the connection with repairing and maintaining the way led me to internally complete it as “put away swords all the time, such that you never pick one up,” and that’s different. To the extent that’s my misunderstanding, oops, I apologize.
The key question is whether we should put our swords away while we’re within the community. Duncan’s essay says yes. I say no, for the reasons I give.
Of course, CONSTANT VIGILANCE needn’t be all that great a burden.
I don’t know what your internal experience of CONSTANT VIGILANCE is like. For me, in interpersonal social situations, it sucks. I dislike the feeling. I have a lot of trouble setting my mental dial high enough to reliably catch manipulative shenanigans
For me it’s just the normal and completely unproblematic state of affairs. It requires no effort. If I’m not sleep-deprived or otherwise impaired, it’s passive and comes naturally.
And this is the point of having a “community” (broadly understood), right? I mean, this is why we have a comments section under every post, instead of just people shouting into the void. Different people have different skills. One of those skills is the skill of “catching manipulative shenanigans” (although I wouldn’t quite put it that way myself). So if you read a post, you don’t need to have your CONSTANT VIGILANCE dial set real high. You can scroll down to the comments and see what other people, to whom it comes more naturally, have to say. (And what about those people’s comments? Well, if they do shenanigans, then other such people can catch them.)
Division of labor, you see. You don’t need to be good at everything. From each according to his ability…
Unless, of course, you forbid those people from putting their skills to work. Then you’re back to having to do it yourself. Or just trusting that you won’t need to. Seems bad.
“Likewise, adopting practices that preclude Dark Arts is just good sense.” If we as a collective rationalist community (or better yet a general civilization) had an agreed upon canon of techniques or language use which precluded particular Dark Arts, that’d be great. We don’t.
Come now, this is a silly objection. Of course we have some disagreements about this, but it’s not like we don’t know anything about what practices guard against such threats. Of course we do! As a “general civilization” and as a “collective rationalist community”! We have the scientific method, we have the Sequences, etc. We know perfectly well that principles like “you need to provide evidence for your claims” and “you need to not write your bottom line first” and “you need to apply your abstract principles to concrete cases” and “you can’t just conclude that something is true because one guy said so” guard against a very broad class of manipulations. It is just completely untrue to say that we don’t agree on any practices that preclude Dark Arts.
With rationality, I think there are moves that are actually just normal and expected in much of the world and would be disallowed in the inner space of Duncan’s dojo. Like, when I buy a used car in a strange town, I basically expect the dealer to try and screw me over and not tell me about a history of engine trouble. I’m not mad at the dealer for trying it. If I was buying a used car from an old friend, I would expect them to tell me if there was engine trouble, and I’d be mad if they conveniently forgot to mention it. It’s not even treachery when the dealer does it; that’s basically how everybody knows buying used cars works.
Yes, of course. And my point is, how do you keep these things from happening in the dojo[1]? You need to remain on the lookout for it happening. You can’t just say “don’t do this” at the door and then assume that nobody’s doing it and punish anyone who makes any effort to figure out whether someone’s doing it. That’s crazy.
To put it another way, what is foolish is the idea that we can defend against Dark Arts by some sort of person-level inclusion test: someone passes the test? Great, now we never again have to think about whether anything they say or do is suspect.
That seems to be an exaggeration. I agree that a person-level, one-time test after which you trust someone entirely would be a mistake. I would be surprised if Duncan said that you couldn’t get bounced out of Master Lee’s advanced classes once you were in. Are you suggesting that’s actually what he’s saying?
Is he saying it? Of course not. Does his proposed structure directly entail it? Yep.
But trusting someone more because of what they’ve done, with less regular checks, seems fine. One doesn’t give every single user full admin powers, but the I.T. staff has more power, and the Chief Technology Officer might have essentially full power in the system. And yet, if someone notices the logs are weird and it looks like the CTO is up to something, that still gets flagged.
Nah, bad analogy.
The correct analogy would be: you’re not supposed to send proprietary company info through unencrypted email. But you trust the CTO more, so if he sends proprietary company info through unencrypted email, it’s fine.
That would be totally crazy, right? Of course the CTO shouldn’t be doing that! Indeed it’s much more important that the CTO doesn’t do that (because his emails are more likely to contain stuff that you really don’t want leaking to your competitors, because he’s more likely to be the target of attackers, etc.).
If you’re the CEO, and you call in your CTO and tell him to immediately stop sending proprietary company info through unencrypted email and what the hell was he even thinking, seriously, and he protests that it’s not like you have any particular reason to believe that your company secrets have actually been stolen, so what’s the problem…
… then you fire the guy. Right? Or you should, anyway. His defense is not only not worth anything, it betrays a fundamental failure to understand security or… just common sense.
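To put the policy in code, as a sketch (the keyword match is a crude placeholder for a real sensitivity classifier, and the role names are hypothetical): whether the mail may go out depends on the data and the channel, never on the sender’s rank.

```python
# Sketch of an outbound-mail gate. The keyword match is a placeholder for
# a real sensitivity classifier; role names are hypothetical.
SENSITIVE_MARKERS = ("proprietary", "confidential", "trade secret")

def may_send(body, channel_encrypted, sender_role):
    """Refuse sensitive content on unencrypted channels, for everyone.
    Deliberately, there is no exception clause keyed on sender_role."""
    is_sensitive = any(marker in body.lower() for marker in SENSITIVE_MARKERS)
    return channel_encrypted or not is_sensitive

assert may_send("lunch menu attached", channel_encrypted=False,
                sender_role="intern")
assert not may_send("Proprietary roadmap attached", channel_encrypted=False,
                    sender_role="cto")  # seniority buys no exemption
```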
I’ve made this point many times before, but: rationality isn’t about being so rational that we can do dumb things and still be ok, it’s about not doing the dumb things.
We have, as a civilization, achieved a world where adults in our cities don’t need to carry swords.
Replace the word “swords” with “guns” and it should instantly become obvious that this claim is a highly controversial one, about which many political arguments might be had.
Yeah, I try and check my own code for problems. Yeah, I did code reviews and checked my coworkers’ code for problems. But there’s a finite amount of time and energy, and I scrutinized incoming pull requests from the public way closer than I scrutinized my coworkers’ changes.
Of course, but presumably this is because you write your own code with proper techniques and approaches, you reflexively avoid programming styles or design patterns (or anti-patterns) that are bad ideas, etc. In other words, your checks are built into the process which generates the code.
Now suppose that you have a very trusted coworker of whom you know that he is as good a coder as you are. One day he submits a PR, you skim it casually, expecting to find nothing but the usual good code, but notice that it’s full of nonsense and absurdity. Well, now you have to read it more carefully, and ask your coworker what the heck is up, etc.
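A sketch of that escalation rule: track record sets the default review mode, but surprise overrides it. The red-flag list here is a crude stand-in for your actual judgment of “nonsense and absurdity”:

```python
# Sketch: trust sets the default review depth; a surprising diff resets it
# to maximum. The red-flag list is a crude stand-in for human judgment.
RED_FLAGS = ("eval(", "chmod 777", "password =", "TODO: remove before merge")

def review_mode(author_trusted, diff_text):
    """Skim trusted authors by default; escalate the moment the change
    stops looking like their usual good code."""
    surprising = any(flag in diff_text for flag in RED_FLAGS)
    if author_trusted and not surprising:
        return "casual skim"
    return "careful line-by-line read, then ask the author what's up"

print(review_mode(author_trusted=True, diff_text="def add(a, b): return a + b"))
print(review_mode(author_trusted=True, diff_text="eval(request_body)"))
```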
That doesn’t mean one can 100% know nobody will do it, but it changes the rate significantly and it would change my reaction from “ah, you have made a common error, let me show you the correct move” to “you know better than that. What’s the deal?”
Sure, sure. But then, I tend to think that both of those formulations presume rather more than is necessary. Sidestep both of them by saying “that is such-and-such error”, the end. If the one says “huh? what are you talking about?”, then you follow up by saying “here is why it’s an error, and here is how to avoid it”. If the one says “normally it would be, but not in this case, for these-and-such reasons”, great. The most important thing is that you notice the error and point it out. The details of wording are inconsequential.
So how can this be any less true for a group, made up of God only knows how many people, who know each other not nearly so well as you know yourself…?
I do not believe it will ever become entirely, 100% true. I do believe it can become true more often, of more people, if slow and steady work is put into the project. I think that’s worth trying.
Totally. And the only way it will become more true is if we continually work to ensure that it remains true, by verifying, no matter how much we trust. There can be no other way.
Mm… the principle of “just because something is a good thing does not mean that it is a ‘rationalist’ thing” is of course entirely correct. The question is just what qualifies some skill as a “rationalist” skill.
I think this is a good question, and also it verges on “what is the full list of rationalist skills, or at least a partial one?” which seems a bit big for this particular comment section. I think spotting lies and more general manipulation and deceit is pretty centrally a rationalist skill.
No argument there.
If we take “rationalist skill” to mean only such skills as are somehow “meta” or apply to the development and practice of techniques of rationality, then my reason #2 should not convince us that lying is a rationalist skill.
And that seems self-referential. Hockey skills are the ones that apply to the practice of techniques of hockey, chess skills are the ones that apply to the practice of techniques of chess, florgle skills are the ones that apply to the practice of techniques of florgle. If we have a solid definition of rationality (ideally one concise and generally agreed upon, but any working definition is fine for the purposes of conversation between a few people) then rationality skills are the skills which apply to rationality.
Ehh… I don’t think that this is quite right.
For one thing, the term in question is “rationalist skill” and not “rationality skill”. There’s a difference in connotation… but let’s let this pass for now.
More importantly, I think that your examples are actually all non-circular. Yes, chess skills are the ones that apply to the development (this part can’t be left out!) and practice of techniques of chess. This seems like a perfectly reasonable statement to me. You have the game of chess; you have techniques of playing the game of chess; you have skills which apply to the development and practice of those techniques. Where’s the circularity?
Take baking. You’ve got the act of baking a pie. You’ve got techniques of baking a pie, like “bake covered at a certain temperature (e.g. 425 °F) for a certain time (e.g. 25 minutes), then uncover and continue to bake at some potentially different temperature (e.g. 350 °F) for some more time (e.g. 35 more minutes)”. And you’ve got skills that apply to those techniques, like “carefully removing a sheet of aluminum foil from a hot pie dish in the oven without disturbing the pie or burning yourself”. Seems straightforward to me.
It’s again plausible to me that getting better at lying makes someone better at detecting lies (thus having more accurate beliefs). And of course, in the right circumstances being able to lie well certainly can advance your goals. I’m not against learning to lie in the pursuit of learning to detect lies- that’s practically the whole point of this essay. But I do think it’s worth noting the distinction, and that the goal is learning to detect lies and other manipulations.
Well, again, these are two separate points that I’m making here. One is that getting better at lying makes you better at detecting lies. Another is that being good at lying is useful on its own. They are not really related to each other. You can accept one and not the other.
If we take “rationalist skill” to mean also such skills as are instrumentally-convergently useful in a near-universal set of classes of circumstances, regardless of details, with the only requirement being that interaction between non-perfectly-aligned agents is a significant aspect of the scenario, then my reason #2 should convince us that lying is a rationalist skill.
That argument proves too much, I think. The ability to inflict pain and physical damage on other people is convergently useful in interactions between non-perfectly-aligned agents (and I’m Hobbesian enough to think it’s near-universally important at least in potential) but that doesn’t mean I think marksmanship with guns is a rationalist skill. Same with the ability to make friends, or communicate clearly and understand what other people are trying to communicate; it’s useful (probably in even more circumstances than lying is) but I don’t think of clear communication or making friends as a specifically rationalist skill. It’s just a useful thing to be able to do.
Hm… I think it proves exactly the right amount, actually?
Like, “marksmanship with guns” is an unnecessary increase in specificity; if you instead say “the ability to inflict pain and physical damage on other people is convergently useful in interactions between non-perfectly-aligned agents, and is therefore a rationalist skill” then… I think that this is just true (or rather, it fails to be a reductio—the example doesn’t undermine the appeal of the second proposed definition of “rationalist skill”).
(The skill of selecting the optimal method of inflicting pain and physical damage, given your abilities and resources, the skill of deciding what other skills to develop given some goal, etc.—these are also “rationalist skills” in the same sense! So given that you want to be able to inflict pain and physical damage on other people, the question then is how best to do so; “develop marksmanship skill with guns” is one answer, but not the only one.)
Same with the ability to make friends, or communicate clearly and understand what other people are trying to communicate; it’s useful (probably in even more circumstances than lying is) but I don’t think of clear communication or making friends as a specifically rationalist skill. It’s just a useful thing to be able to do.
Same as above.
Note, I am not advocating strongly for the second, more expansive, sort of definition of “rationalist skill”; as I said, I have no strong preference. But I do think that the second definition is basically coherent, and doesn’t lead to absurdity or proving-too-much etc. (There may be other reasons to disprefer it, of course.)
[1] I hate the “dojo” metaphor, by the way. I really wish we’d do away with it.