The hostile telepaths problem
Epistemic status: model-building based on observation, with a few successful unusual predictions. Anecdotal evidence has so far been consistent with the model. This puts it at risk of seeming more compelling than the evidence justifies just yet. Caveat emptor.
Imagine you’re a very young child. Around, say, three years old.
You’ve just done something that really upsets your mother. Maybe you were playing and knocked her glasses off the table and they broke.
Of course you find her reaction uncomfortable. Maybe scary. You’re too young to have detailed metacognitive thoughts, but if you could reflect on why you’re scared, you wouldn’t be confused: you’re scared of how she’ll react.
She tells you to say you’re sorry.
You utter the magic words, hoping that will placate her.
And she narrows her eyes in suspicion.
“You sure don’t look sorry. Say it and mean it.”
Now you have a serious problem. You don’t have an internal “actually mean it” button. And yet here’s Mom peering into your soul and demanding that you both have that button and press it. Trying to appease her didn’t work. She needs you to be different — and she’s checking.
What can you do now?
This is a template for what I’ve come to call “the hostile telepaths problem”. I think it’s a common feature of social problems. The hostile telepaths problem is when you’re dealing with a being (a) who can kind of read your internal experiences and (b) whom you don’t trust not to make your situation worse based on what they find in you.
There are lots of solutions to the hostile telepaths problem. I don’t claim to know all of them. But recognizing some common ones has helped clarify a lot of my thinking — particularly around self-deception and akrasia.
And getting very clear on the nature of the problem makes identifying real solutions way easier. This fact produces some previously-surprising-to-me predictions, especially for trauma processing and for making emotionally difficult decisions.
I’ll try to spell out what I mean with some theory and a few examples.
Newcomblike self-deception
There’s one really tricky solution to the hostile telepaths problem. It deserves some special front-loaded attention before I name some other solutions.
Here I’ll try to spell out its logic with a modification of Newcomb’s Problem.
Suppose that instead of Omega, you’re dealing with Omega-V. Omega-V is an extremely good but imperfect predictor of someone’s box choice.
Through some clever sleuthing, you deduce why. Omega-V is scanning only folks’ visual thinking for a little while prior to their box choice. It’s extremely good at tracking the difference between the mental visualizations of one-boxers vs. two-boxers… provided they don’t know Omega-V is scanning specifically their visual thinking.
So now you have an edge. Maybe you can trick Omega-V. Maybe you can visually think like a one-boxer and use other modes of reasoning (like inner talk) to plan on actually two-boxing. You just have to be very careful to have any “leaks” of reasoning into visualization look totally normal for a one-boxer.
If you decide it’s worth the risk to try for this deception, the visual reasoning part of your mind has to be kind of siloed from the rest. It’s like you have a subagent that truly honestly believes you’re a one-boxer. It’s the non-visual reasoning parts of you that are aware of the real plan.
Another way of saying this is: in normal Newcomb’s Problem, you can’t tell whether you’re in the real world being faced with Omega’s offer or if you’re instead Omega’s simulation of real-you and you’re revealing what real-you will do. The situation is similar with Omega-V, but in this case it’s specifically your visual reasoner (viewed as kind of a subagent) that has to be built in such a way that it can’t tell whether it’s a simulation. Many other parts of you can tell perfectly well what’s going on![1] But for the deception to work, the way in which you plan on two-boxing cannot be something your inner visual reasoner can even deduce.
Now, let’s take this one step further.
Let’s suppose we’re instead dealing with Omega-C. Also an imperfect predictor. But this time you learn that Omega-C scans specifically all of — and only — your conscious thinking.
At this point, the right strategy is to consciously decide you’re going to one-box.
But some people still find themselves two-boxing and winning. Even while consciously intending to one-box. Sometimes they get confused and say the wrong thing or push the wrong button. Some of them trip and accidentally grab both boxes. Some laugh and grab both boxes and find them both full.
Assuming that at least some of those outcomes are strategic… how?
Hint: now you (by which I mean conscious parts of you) are maybe the siloed part. Basically an illusion meant to trick Omega-C.
The thing is, you cannot tell. Because if you could, then the deception — if it were there — wouldn’t work.
Which is to say: on the inside, you cannot tell the difference between running this Omega-C deception strategy and honestly truly just being a one-boxer.
And furthermore, it’s actually to your advantage not to check. Even if you were to find some way of being able to tell the difference!
But because of that, if you’re aware of this deception strategy, it might work best if you honestly (from your perspective) look for some way to tell if you’re running it. Actively trying to “come clean”. That’s a pretty powerful signal to Omega-C that you really are a one-boxer.
So if you can make it so that it seems to you like you really are looking, including accounting for awareness of possible non-conscious “shadow” reasoning, and you find none, but you still two-box in the end…
…well, you win!
…while the conscious parts of you are confused and surprised. And possibly quite dismayed.
Sketch of a real-world version
I call this pattern “Newcomblike self-deception” as a nod to Nate Soares’ point that Newcomblike problems are the norm.
The deal is that people are kind of telepathic, in the sense of using things like vocal tone and body language and microexpressions and so on to intuitively deduce what might otherwise be hidden in others. They’re scanning for things like intent and strategy and emotional tone. Often this telepathy is in service of syncing up (“Hey, you seem upset, is something wrong?”), but it doubles as threat detection.
This telepathy is imperfect. Which means that sometimes Newcomblike self-deception is in fact a viable strategy.
I’d like to name one way I think this type of self-deception can actually happen in a person. It might be the main way it happens, or it might be rare. I honestly don’t know. But it’s one I’ve in fact seen in myself[2] and I think I’ve observed in some others.
By some mysterious method, it’s possible to contract your awareness — by which I mean, the space of things you’re actually aware of can be smaller than the space of sensory inputs (including mental experiences like thoughts and memories). Lots of people experience this when watching TV (losing awareness of the room), or when deep in flow work (not noticing hunger for hours while programming).
If you construct a sort of fake self in your mind, and then contract your awareness around that fake self, it can seem to you on the inside like you really are the way depicted in the fake self. Like it’s not fake, it really is who you are.
If you also build up explanations to your fictitious self about why things outside that fiction either are consistent with it or don’t matter, then you both (a) can honestly display to hostile telepaths that you (here meaning fake you) are being fully sincere in not hiding anything and (b) possibly give the telepaths ways of discounting the unavoidable signals that you (here meaning you holistically) are hiding something.
For instance, as a child whose mother says to you “Say you’re sorry and mean it”, you might be able to strategically misinterpret your fear of Mom’s Wrath as “being really sorry”. As long as you’re not aware that that’s what you’re doing, it might work very well! She might read your distress as you really meaning it. (“I’m sorry I’m sorry I won’t do it again please Mom I’m sorry…!”) And you can keep yourself from being aware of this whole strategy by keeping your awareness contracted on the fictitious version of yourself that’s “bad” and “very sorry”, and keeping your understanding of the real problem outside of your awareness.
Possible examples in real life
Here are some examples I think I’ve actually seen — in culture, in others, and in myself:
I think the thing with kids that I sketched above really does happen. More generally, I think similar applications of Newcomblike self-deception are the root cause of (a certain very common kind of) shame: it’s a strategic mislabeling of one’s pain as being about one’s “flaws”.
Relatedly, lots of folk mislabel their experience as “I hate math.” Most people I’ve talked to who say this actually hate the coercion and gaslighting used almost universally in math classes. The real problems most folk are focused on in math class are social, like “Appease the teacher” and “Get Mom & Dad off my back.” But teachers and parents might insist to a student that “you need to try harder” with the math itself while seeming to sort of telepathically scan them for whether they are in fact trying. I think this can sometimes lead students to strategically mislabel their distress about the situation to themselves.
Gurus getting involved in sex scandals. I’m sure that at least some of them have been very sincere about what amounts to real Jungian shadow work. But somehow all that sincerity mysteriously ends up hiding and serving (instead of revealing and dealing with) an underlying drive to just get laid.
Likewise people “accidentally” cheating. Sometimes folk really are just surprising-to-them vulnerable in some situation and don’t have the right kind of discipline when they turn out to need it. But the fact that that ever happens can act as a cover. It’s especially obvious in cases of repeated “accidental” cheating.
I’ve seen four friends, as mothers, stay with and defend abusive partners (boyfriends or husbands) for years. Each would often insist that he’s just stressed, or that it’s a frequent misunderstanding but they love each other, etc. In three cases it became possible for her to consider that he might be abusive after a change in her work gave her enough money to support herself and her child without him if need be. In the fourth case, the mother got a lot of social support, such as a place to live and people she trusted to take care of her and her child, and then she had room to consider her partner’s actions as abusive.
If I’m upset with a friend and I’m worried that they can’t handle what I’m upset with them about, sometimes I can’t think straight about what my problem with them is while I’m talking to them. My mind gets foggy, my concepts seem mushy even to me, the words I remember from journaling about it before now form what feels like a gibberish argument, etc. Often this fog suddenly clears up if I get a vivid sense from my friend that our friendship will be fine after we talk. It also gets clearer if the issue is so big that I realize I’m fine with them not being in my life after we talk.
Badly wanting someone to like you can make them like you less. So how do you get them to like you? Not by being aware that you’re asking that question! But maybe if you do things for them without knowing that’s why you’re doing them…? (“Oh, I forgot Bob likes sushi! I just got some because I felt like it, honest!”) And maybe if you add an extra dose of self-loathing (“God, I’m being a creep, aren’t I? I always do that!”) you can pass others’ scrutiny here by eliciting care & concern when you might otherwise get caught.
I’m not trying to be exhaustive here. There are tons more examples.
Other solutions to the problem
We can’t actually penetrate our own Newcomblike self-deception without having another viable (to us) strategy for dealing with hostile telepath problems.
However, if we do have another strategy in a given instance, then in that instance it can be safe to look. The self-deception can lift.
Gaining independence
One alternative strategy type is coming to trust that you’re able to handle the consequences of being accurately seen.
Consider the moms in the abusive partners example above: each one could acknowledge her self-deception once it was safe for her abusive partner to know too. She got enough independence (financial or social) to protect herself and her child, making the telepathic scan no longer a dire threat.
I think a lot of “trauma processing” amounts to this “gaining independence” strategy. But it’s more like, noticing you already have independence. I bet a lot of foundational self-deception habits come from being a child faced with telepaths (adults) who have a lot of power over them. A kid who deals with Mother’s “Say sorry and mean it” demand with self-deception might then grow up to become really apologetic and “have low self-esteem”. But it’s just an old strategy for dealing with Mother that hasn’t made contact with the fact that Mother isn’t that powerful over them anymore. It’s now actually just fine for her to know they’re not “really sorry”. If this raw physical truth comes into contact with the impulse to “be sorry”, the mental firewall might simply collapse, and the mislabeling will stop.
So in many cases, “trauma processing” can basically mean noticing you’re not a child anymore. You have independence. So you don’t have to appease the hostile telepaths just because they’re adults. They can just know your internal state, and you (trust that you) can handle the consequences of them knowing.
Building emotional resilience is like this, I think. If you (trust that you) can handle the emotional and somatic sensations of others being upset with you, then you don’t have to hide the parts of you that might make them upset. They can just be upset. While you might not like it, you know you’ll be fine.
(Not to say anything about what’s ultimately good to do here. Caring about others’ reactions totally makes sense for other reasons, like the health of the community we’re in. Here I’m focusing specifically on what can solve the hostile telepaths problem without self-deception.)
Occlumency
Another solution type is occlumency. Which is to say, if you trust you can keep your real goals and/or strategies hidden from a hostile telepath even if you consciously know what your goals/strategies are, then it’s safe to consciously know them.
(This is something like switching from Omega-C to Omega-V.)
A classic example is in WWII when Nazis come knocking and ask if you’re harboring any Jews. The analog of one-boxing here is just not harboring Jews. Newcomblike self-deception doesn’t seem plausible to me here. You very much don’t have the independence needed to handle the consequences of being caught “two-boxing”. So if you’re helping refugees, you probably have to lie convincingly. And if self-deception were a plausible strategy here, you wouldn’t need it to the extent that you trust your ability to hide the truth from the Nazis even if you know the truth.
I think many psychopaths[3] use occlumency quite a lot. I’ve met some who know full well that they’re trying to manipulate others and are presenting a façade to do so. It works for them in part because they don’t send implicit distress signals around thinking they’re bad for being manipulative: they’re not nervous, so they don’t need to explain their nervousness away.
There’s a moral tangle here. Honesty is important for connection, integrity, and communal health. But you might not trust that it’s safe to reveal the truth to a hostile telepath.[4] In this case, the moral injunction not to lie makes occlumency harder (because of fear of being caught, plus doubt about whether you should be using occlumency at all). This situation can leave self-deception as your only viable solution — which, incidentally, means you’re still not being honest!
I think this means that if you care both about (a) wholesomeness and (b) ending self-deception, it’s helpful to give yourself full permission to lie as a temporary measure as needed, creating space for yourself so you can (say) coherently build independence such that it’s safe for you to eventually be fully honest.
Solution space is maybe vast
I’ve named three solutions to the hostile telepaths problem:
Newcomblike self-deception
Gaining independence
Occlumency
These aren’t the only ones. A pretty simple one is running away from the telepaths and avoiding them. Another is investigating whether the telepaths are in fact hostile and discovering they’re not (if that’s true). Yet another is to jam telepathic scans with emotional charge that backs privacy norms. (“It’s none of your business whether I ‘really am’ sorry!”)
The important part isn’t that we have a full taxonomy. That might be helpful, I don’t know. The important part, as far as I’m concerned, is that by being very clear about what problem we’re solving, we can tell when something is — and is not — a solution.
Ending the need for self-deception
By this model, to end (Newcomblike) self-deception, we have to remove the need for it. This means solving each instance of the hostile telepath problem some other way.
This is kind of tricky in practice. When you use self-deception to deal with a hostile telepath, you can’t know that that’s what you’re doing. You[5] can’t even know which hostile telepath problem you’re solving! So how do you come up with another solution?
I don’t have a provably general answer, but I have a pretty general approach that makes sense to me and has clearly worked several times. I’ll share that approach here.
Welcome self-deception
First is welcoming that I’ll self-deceive.
But this isn’t “Well, I’m going to do it anyway, so I might as well be okay with it.” That’s nonsense: you probably can’t just “be okay” with it. And trying probably makes the problem worse![6]
I mean something more wholehearted. If I self-deceive, it’s because it’s the best solution I have to some hostile telepath problem. If I don’t have a better solution, then I want to keep deceiving myself. I don’t just tolerate it. I actively want it there. I’ll fight to keep it there!
This is somewhat akin to dealing with Omega-C by saying:
Look, I know it’s possible I’m running a deception strategy. I could spend a bunch of energy trying to suss it out as a costly signal that it’s not there. But at a policy level I’m just not going to do that. Not because I have evidence that I’m not following up on, but because I don’t want to add stress to myself in the world where I really am self-deceiving. Since I’m doing this regardless of whether the deception strategy is running, it’s not information about whether I’m secretly trying to two-box.
This relieves pressure. If I have some sense that I’m self-deceiving, and my attitude is to back the deception instead of trying to penetrate it, then the hidden part of me running the deception doesn’t have to engage in an internal arms race with me. We become same-sided.
Look away when directed to
Once I really back my own self-deception, it becomes easier to notice signs I’m doing it.
This works way better if I trust my occlumency skills here. If I don’t feel like I have to reveal the self-deceptions I notice to others, and I trust that I can and will hide them from others if need be, then I’m still safe from hostile telepaths.
Seeing where I self-deceive doesn’t mean I see what the deception is. In practice it’s more indirect than that. What I mean are things like:
Revealed preferences. (Akin to noticing I two-boxed “by accident”.)
My mind suddenly going foggy.
Forgetting what I was thinking about.
Mental chatter getting loud.
Suddenly being very uninterested in what I’m focused on.
Getting abruptly absorbed in something unrelated.
My attention scattering.
Losing awareness of my body, or parts of my body, or my body drives (like hunger).
Body activation signs: holding my breath, tensing my shoulders, quickened speech, etc.
Energy crash or getting really sleepy. (Like a freeze response.)
A sudden addictive impulse.
I feel shame, inadequacy, or otherwise think I’m broken or flawed or bad in some way.
Etc.
I don’t mean this as an exhaustive list. Nor do I mean it as things to look out for. Nor do I mean that these always imply that self-deception is going on.
What I mean is, there are things a person does to maintain self-deception. If you basically promise the strategic not-conscious-to-you part that you really will respect the strategy, then it doesn’t have to keep you so firmly out of the loop. Then you can potentially start picking up on some signposts like these ones.
Part of the deal is, when you notice such a possible signpost, you look away. You notice it and you drop the inquiry. Because until you have a non-self-deceptive strategy for whatever the real problem is, you don’t want to break the one strategy you have.
For instance, sometimes I’ll think about responding to an email… and I start getting sleepy. If I push, I start wanting to watch YouTube. These are signs that something in me doesn’t trust it’s safe for me to look there. Maybe it involves a decision that requires me to ask myself an unsafe question. I don’t know — and I don’t try to figure it out. At least not right away. Instead I back off and direct my attention elsewhere. Maybe I go cook something, or take a walk. I consciously distract myself from the tension point.
In my experience, this alone can often eliminate most of the stress involved in self-deception. It becomes fine. Annoying, glitchy, but no longer fraught with anxiety and self-doubt.
Hypothesize without checking
After a while I kind of get a “negative space” sense of what the self-deception is about. I continue not to look, out of something like respect. But I still have a hint.
Like if there’s an email I keep freezing around. I can tell there’s something there. I might even have some intuitive guesses about what it is!
…but I do not check. I don’t introspect on whether my guesses feel right.
Instead, I hypothesize. What hostile telepath problem might someone in my shoes be trying to solve such that this behavior arises?
For instance, let’s suppose the email’s sender is asking me to run an event this weekend. I might hypothesize like this, intentionally referring to myself in third person:
Maybe Valentine doesn’t actually want to do it, but he’s scared that letting them know will make them think he’s actually uninterested in them in general, which might lead them to close off opportunities he wants with them in the future.
Importantly, I am not introspectively checking. I’m not asking if I think the above really is what’s going on with me. I’m just noticing that, viewing myself in third person, this model does seem to fit the evidence.
I’m also not trying to construct a plan to verify what’s going on! Here Nature wants her secrets kept. I do not try to peek under her skirt.
Instead, I notice what Valentine (i.e., me in third person) in this hypothetical could maybe do instead of Newcomblike self-deception. What would be a viable alternative strategy for him?
Maybe Valentine could meditate on their possible disapproval, and come up with a plan for what happens next in which he’s okay. (Gaining independence.)
At this point I could just implement this possible solution. I don’t have to check if it’s relevant to my situation: there’s not much cost in leaving myself a line of retreat this way.
If it turns out there’s been Newcomblike self-deception going on, and if this hypothetical solution really did resolve the core problem that the self-deception was solving, then the self-deception should basically just lift.[7]
And if I still have an ugh field around the email, then I haven’t addressed the real problem yet. Which is fine. Not ideal, but I’m still going to back any self-deception that might be there while I don’t have a better option!
I can repeat this process. Hypothesize without checking, implement solutions that would work in the hypothetical, and find out what happens.
…at least unless and until I start getting frozen about this process. That might mean I’m getting too close to understanding the strategy before it’s safe to do so.
Then I back off.
Does this solve self-deception?
I don’t know.
I didn’t originally set out to make sense of self-deception. I was just trying to understand why people sometimes view themselves as flawed and in need of fixing.
It just turned out that that question was tied to a lot of others. Self-deception being one of them. A lot of them unified by considering the problem of hostile telepaths.
It seems worth noting that a bunch of the method I describe here (particularly the “hypothesize without checking” part) is derived from the model. It amounted to a prediction that I tested and discovered worked as the model anticipated.
Likewise, occlumency being helpful. There might be other explanations for why getting better at privacy makes more thoughts thinkable. But I derived it from this one. And, again, it (anecdotally) seems to have worked as predicted.
These approaches work remarkably well on shame too, by the way. I might write a separate post on shame. Its logic is a bit different, but with a few adjustments I’ve found that shame dissolves extremely well in contact with these ideas.
With all that said, I don’t think I’m in a position to say that I’ve solved self-deception. I don’t know how I could know that. I’m not even convinced I’ve solved Newcomblike self-deception! My method seems plausibly general, but I don’t have even the sketch of a necessity argument yet.
So, more work needed.
Summary
It seems to me that self-deception is solving a real problem. If we don’t solve that real problem differently in a given instance, then in that instance we can’t stop self-deceiving.
It seems to me that the real problem is (at least sometimes) hostile telepaths.
When I view hostile telepaths as the real problem I’m trying to solve, the perspective suggests what alternative solutions might look like, and it lets me check whether a given approach even can work as a solution.
And it seems to me that when I implement those alternative solutions, the result is sometimes that self-deception visibly falls away, non-mysteriously. It becomes obvious to me what was going on, and why.
I don’t know if this model captures all cases of what we might want to call “self-deception”. Maybe it does. But my impression is that it at least captures some cases that matter, and quite a lot of them.
[1] Note that having non-visual ways of thinking isn’t enough to know you’re not a simulation. What tells you you’re not an Omega-V simulation is that you can reason in ways that (a) cannot be derived from your visual thinking and (b) change what you in fact do.
[2] Of course, this is something I became aware of after unraveling the structure in a few cases. It’s not something that reveals itself while the structure works.
[3] By “psychopath” I don’t mean something derogatory. I don’t mean “bad guys”. I mean something more like “people who are naturally unconstrained by social pressures and have no qualms breaking even profound taboos if they think it’ll benefit them”.
[4] To be clear, “hostile telepath” is a role, not an identity. Someone is a hostile telepath to you when they’re scanning your mind and you don’t trust they won’t create problems for you based on what they find. Someone being a hostile telepath is less like them being a criminal and more like them being your lover or your foe. I say this because it’s not a solution to identify “the hostile telepaths” in a community and reform or expel them; that approach is gibberish made of confused reification.
[5] If I were carefully describing this from the outside, I’d say that your false self can’t know. “Self-deception” is really false self deception (as a strategy for deceiving hostile telepaths). The thing is, on the inside it doesn’t feel like “your false self”. That’s the whole point! I’m describing this model in a way that’s hopefully legible to the internal experience of actually running the strategy. Otherwise any instructions might make theoretical sense but won’t be actionable. Sadly, this way of talking results in some ambiguities, precisely because the whole point of the strategy is to make something difficult to see clearly. Hopefully you can correct for this confusion as needed, sort of shifting to third-person and renaming things when the theory isn’t clear.
[6] Why? Well, you need to “be okay” with it. But you’re not. So what do you do with the fact that you’re not okay with it? Loosely speaking, you’ve just turned your own conscious mind into an internal hostile telepath!
[7] In practice I find that not only does this work quite often, but now it sometimes works as soon as I think of the alternative solution. I don’t always need to implement it first. It feels to me like this result comes from having built internal trust that I really can and will respect my need for some strategy.
I’m pretty happy with this piece. I often look back at things I wrote a year or more prior and find I’d like to approach the whole thing very differently. But I don’t feel that way at all here. I often want to point people to it as-is, usually without any caveats.
(The part I’d most rewrite is the section on how to use the framework to sometimes loosen Newcomblike self-deception. I’ve guided more people through that process since writing this article, and I’ve learned a few things about what’s helpful for folk to hear. But even that part doesn’t need substantial rewriting. I just suspect, without having given it a ton of thought, that it could be made slightly more effective for readers.)
The basic theory still seems quite solid to me. That there’s such a thing as a hostile telepath problem, and minds seek solutions to it when they encounter one, and that at least one of those solutions is a kind of self-deception, and therefore dissolving that kind of self-deception often (always?) requires having a different solution to the hostile telepath problem at hand.
I use this framework all the time in my personal debugging. It’s also a basic component in my analyses of strange mental behavior.[1] I hear people referencing the idea fairly often now in ways that make good sense to me. So I think it’s a pretty solid contribution.
I’ve also heard that this article inspired some folk to start running experiments and playing with novel psychotechnology schemes. I haven’t heard much about the results of such experiments. But it’s pretty exciting and delightful to know that the idea is of a sort that has spurred that sort of tinkering.
Today I made a few small edits to the post. I think they’re quite minor. They’re based on some comments from shortly after the post first went live that I found myself just agreeing with:
I agree with Ben Pace’s point. I think “gaining independence” is just a clearer way of pointing at that solution type than is “having power”. So I changed the name of that solution type throughout the piece.
Ninety-Three corrected me when I said in a footnote that psychopathy was a cluster B personality disorder. I’ve removed reference to “cluster B” or “personality disorder” from that footnote, instead replacing it with the clarification I gave in reply to the correcting comment.
I’m pretty confident that those edits don’t change anything meaningful about the post.
Overall: I’m very glad to have written this, and to have published it. It feels a bit odd to give my thoughts about whether it should be in the “Best of 2024”, for social reasons. But if I kind of ignore that and pretend this piece were written by someone else (to the extent that I can honestly do something like that), my sense is that it’s a very solid contender, and the main question I’d raise is how it compares to some of the other candidate posts that it would supplant.
Or rather, while that’s true, there’s a generalization I use all the time of which the hostile telepath framework is one instance. Hostile telepathy is one kind of social problem that can make a certain type of irrationality sensible to engage in if you don’t have a better solution. There are other social problems that can make some irrationality strategic too. See for instance “Social Control Disorders”. The generalization I use is something like: “If there were a social benefit to this pattern, what might it be? How might I goal-factor here once I see the possible social benefit?”
This is a very clear, well-written post. You could get the same idea from reading Deep Deceptiveness or Planecrash / Project Lawful and there’s value in that. But this gives you the idea in 5,000 words instead of 1,800,000 words, and the example hostile telepath is a mother, rather than Asmodeus or OpenAI.
In writing this review I became less happy with some of the examples. They’re clear and evocative, but some of them seem incorrect. The mother is not hostile; she is closely aligned with her child. She isn’t trying to make the 3yo press an “actually mean it” button; she’s instead pressing her own thumbs-down button on the apology and hoping that the 3yo’s brain will update in the desired way. The 3yo probably gains a tiny bit of empathy and a tiny bit of tone-control. They don’t get self-deception; that’s too complex. If the 3yo regrets breaking the glasses because it causes mom’s wrath, that is “really sorry”, not strategic misinterpretation.
The math class example also reads false to me. I have a kid who loves math and hates math class, and this does not seem like a difficult distinction to make. As a kid I remember loving to read and hating assigned reading for school. Okay, people who “hate math” are in fact ambivalent about the abstract concept of mathematics itself, which they never encounter outside of math class, which they hate. I don’t think we need to invoke self-deception here. Yes, school can suck the joy out of a topic, but that is explained by operant conditioning.
However, the other examples read true. And even if you disagree with some of the examples, I think they’re still so clear and relatable that they give a really good handle on the topic. So I now use the label of “Hostile Telepath Problem” when I think about this problem, and I thank this article for it. The AI implications follow naturally.
I just replied to another reviewer about this point. In short: I agree, I think it’s worth noticing, and I also think the point is irrelevant. The question isn’t whether the mother truly is hostile vs. aligned with the child. The question is whether the child experiences threat from an apparent telepath.
This point is related to footnote 4. I think it’s unhelpful to ask whether the mother “actually is” a hostile telepath. Hostile telepathy is about a perception someone has of another. If you perceive someone (or something) as a hostile telepath, you need some solution to that problem. One possible solution is to discover that they are not, in fact, hostile. But if you don’t converge on that solution, you’ll need some other one.
As I mentioned to the other reviewer, it stands out to me that two people both zoomed in on the same objection. I’m not sure what’s going on there. Let me know if I’ve missed your point?
Of course! My impression is that many (most?) math students don’t get sucked into the Newcomblike self-deception pattern I was naming. But some do! You’re pointing out an example of it not happening. If I were claiming that this happens for all math students, your point would totally refute mine. And to the degree you thought I was making that claim, or it came across ambiguous about whether I was, I’m glad you brought it up! But my point wasn’t that all math students encounter this. It’s that some do. And I don’t think it’s super rare.
Yep. Key to why I worked on it to begin with. I’m glad you caught that and named it!
There’s a Dog Man comic where a villain (as it happens) tells Dog Man off. Dog Man goes off and looks sad. The villain then shouts at him: “sadder!”. Dog Man “presses his ‘actually mean it’ button” and looks sadder. The joke works for me because it’s true to this relatable “say it and mean it” dynamic.
I think my reading came from this paragraph, which is expressed as facts about the hypothetical, rather than the 3yo’s experience/model of the hypothetical:
I wouldn’t edit the paragraph, because it’s clear as it is and more words would dilute. To me, it’s a “fridge logic” hypothetical which becomes less compelling on later reflection, but it still has a good impact on first reading.
With the math class example I don’t follow your reasoning. This seems common to me:
problem: the teacher wants me to try hard, but I want to slack off
idea: I will pretend to try hard, but really slack off
problem: the teacher is a telepath and can tell that I’m slacking off
idea: I will deceive myself into thinking that I am trying hard, but really slack off
This also seems common to me:
problem: I hate math class, but the teacher is personally offended by that
idea: I will deceive myself into thinking that I hate math, which will explain away my hatred of math class
But, I don’t see how they connect up in a natural way, to get all the way from the initial problem to the solution of self-deceiving to hate math. Or if they do it feels like so many layers of self-deception that it would be rare. Perhaps this is two Hostile Telepath Problems in one bullet point.
Reviewing for 2024, but I don’t recall reading it previously. I think I bounced off the name. I’ve long found the Trivers theory of deception (you deceive yourself in order to deceive others) compelling and concerning. I’m a big fan of Elephant in the Brain. Whereas I recall that book as a great collection of evidence that self-deception is at play, I think this post might be the clearest and most detailed description of the phenomenon I know of, and it contains a meaningful discussion of what to do about it.
I don’t think that discussion is sufficient, but it’s something.
Plausibly linked to why I bounced off the title, I think the diagnosis of the cause is wrong and, like much self- and other-deception, self-flattering. Others insisting that we have the correct mental state is one reason to pretend to have certain innards, but there’s no shortage of motivation to deceive others, in a hostile way, about self-serving things like “I’m doing this for your benefit”, “I’m doing this because I’m altruistic”, and so on.
My guess is that self-deception is incentivized so long as other-deception is incentivized, and the ugly reality is that it typically is. Everyone has the incentive to be publicly against deception while trying to covertly defect.
Regarding the mother whose glasses broke as hostile telepath: I think a more charitable interpretation is that she’s an ineffective socializer. The goal is to have the child affirm that they understand they caused damage, assign negative valence to having done so, and commit not to do it again. That is a good thing to raise your children to do. There’s a sense in which the child is interfering with a beneficial process when they pretend (and also a sense in which they’re protecting themselves).
I strong-upvoted and will probably give it +9 in the review, even though I think the highest level of framing is wrong here, since the core is so good.
I think you’re probably right.
Just to be clear, I wasn’t trying to lay out a theory of all self-deception. I was trying to lay out a theory of one cause of at least one kind of self-deception (namely Newcomblike self-deception). I noticed that the problem seemed like it should be real, and it has multiple solutions, Newcomblike self-deception being one of them. That setup had some interesting logic that panned out with some casual experimentation.
I kind of wonder what portion of self-deception is entirely about dealing with hostile telepaths. Is it 100%? Probably not. My intuitive impression is that it’s well over 50% though. But even if it’s just 10% I don’t think that affects the logic of the post whatsoever. It’s a suggestion that thus-and-such type of self-deception arises from solving a particular problem that has other possible solutions, not a theory that all self-deception comes from this mechanism.
Someone else brought this up too. I think you’re right. And perhaps I wrote that part in a misleading way!
That said, I don’t think your very accurate point affects the example whatsoever.
The question isn’t about what’s really going on. The question is, from the perspective of the child, is he dealing with a hostile telepath? Which is to say, is he dealing with someone (a) who seems to be able to read his internal states and (b) whom he doesn’t trust won’t make his life worse based on what she finds? If the answer is “yes”, he’s faced with a hostile telepath problem, for which he needs some kind of solution.
It really doesn’t matter whatsoever how badly that represents the mother’s subjective state or motivations. The child doesn’t have access to that. The child just knows that Mother is mad at him, is demanding that he “be sorry”, and is checking. It’s possible the mom isn’t even mad! Maybe she’s a perfect saint bringing pure love and care and understanding while gently guiding the child to the best of her ability. But if the child perceives the mother as a hostile telepath, then he needs a solution. Newcomblike self-deception is one such solution.
It seems relevant that another reviewer gave the same pushback though. I wonder if I’ve been unclear, or if I’m missing something. Let me know if it seems to you like I’ve missed your point.
I continue to really like this post. I hear people referencing the concept of “hostile telepaths” in conversation sometimes, and they’ve done it enough that I forgot it came from this post! It’s a useful handle for a concept that can be especially difficult for the type of person likely to read LessWrong to deal with, because they themselves lack strong, detailed models of how others think. So while they can feel the existence of hostile telepaths, they lack a theory of what’s going on (or at least did until this post explained it).
Similar in usefulness to Aella’s writing about frame control.
I really like this post, it outlines the fact that we all self-deceive, and uses an excellent example from literature (often rationalist literature) to encourage us to consider this fact. It has made me kinder to myself when I find a self-deception, and the examples you gave have helped me gently tease apart why I might be performing occlumency.
One of my immediate initial responses to this idea was “doesn’t this just discourage you from finding out areas of inefficiency? sounds like a bad idea to me!” but you tied in your reasoning to power and ability to act safely, and—to me—that perfectly packaged the whole concept; it is a great insight to realise that the places I reflexively flinch or turn away from are the same that I am working on unconsciously, and that this “turning away” is a self-preservation mechanism that I would do best to respect and gently hypothesise (without checking) around.
On that note, this post makes real and explicit the usually very hidden machinations of the unconscious, which has been studied at length, and that is part of the reason why I think you “hit the nail on the head”, so to speak.
Thank you for this post. I would indeed love to see a further post on shame; I can see how it ties in, but I’d love to hear your insight.
Nominated. The hostile telepath problem immediately entered my library of standard hypotheses to test when debugging my own behavior and helping others do so, and it sparked many lively conversations in my rationalist circles.
I’m glad I reread it today.