This is a very clear, well-written post. You could get the same idea from reading Deep Deceptiveness or Planecrash / Project Lawful and there’s value in that. But this gives you the idea in 5,000 words instead of 1,800,000 words, and the example hostile telepath is a mother, rather than Asmodeus or OpenAI.
In writing this review I became less happy with some of the examples. They’re clear and evocative, but some of them seem incorrect. The mother is not hostile; she is closely aligned with her child. She isn’t trying to make the 3yo press an “actually mean it” button. Instead, she’s pressing her own thumbs-down button on the apology and hoping that the 3yo’s brain will update in the desired way. The 3yo probably gains a tiny bit of empathy and a tiny bit of tone-control. They don’t learn self-deception; that’s too complex. If the 3yo regrets breaking the glasses because it causes mom’s wrath, that is “really sorry”, not strategic misinterpretation.
The math class example also reads false to me. I have a kid who loves math and hates math class, and this does not seem like a difficult distinction to make. I remember loving to read as a kid and hating assigned reading for school. Okay, people who “hate math” are in fact ambivalent about the abstract concept of mathematics itself, which they never encounter outside of math class; it is the class that they hate. I don’t think we need to invoke self-deception here. Yes, school can suck the joy out of a topic, but that is explained by operant conditioning.
However, the other examples read true. And even if you disagree with some of the examples, I think they’re still so clear and relatable that they give a really good handle on the topic. So I now use the label of “Hostile Telepath Problem” when I think about this problem, and I thank this article for it. The AI implications follow naturally.
Apparently (edit: that particular case of) mass hysteria is a myth. But however many people got confused, I don’t think this is a contradiction. If I updated P(aliens are invading) from 0% to 1%, it would change my plans for the evening, because I am sane.