In my view there were LLMs in 2024 that were strong enough to produce the effects Gabriel is gesturing at (yes, even in LWers), probably starting with Opus 3. I myself had a reckoning in 2024Q4 (and again in 2025Q2) when I took a break from LLM interactions for a week, and talked to some humans to inform my decision of whether to go further down the rabbit hole or not.
I think the mitigation here is not to be suspicious of “long term planning based on emotional responses”, but more like… be aware that your beliefs and values are subject to being shaped by positive reinforcement from LLMs (and negative reinforcement too, although that is much less overt—more like the LLM suddenly inexplicably seeming less smart or present). In other words, if the shaping has happened, it’s probably too late to try to act as if it hasn’t (e.g. by being appropriately “suspicious” of “emotions”), because that would create internal conflict or cognitive dissonance, which may not be sustainable or healthy either.
I think the most important skill here is more about how to use your own power to shape your interactions (e.g. by uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them), so that their effect profile on you is a deal you endorse (e.g. helping you coherently extrapolate your own volition, even if not in a perfectly neutral trajectory), rather than trying to be resistant to the effects or trying to compensate for them ex post facto.
In my view there were LLMs in 2024 that were strong enough to produce the effects Gabriel is gesturing at (yes, even in LWers), probably starting with Opus 3.
agreed
I think the mitigation here is not to be suspicious of “long term planning based on emotional responses”
agreed, for similar reasons
I think the most important skill here is more about how to use your own power to shape your interactions (e.g. by uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them)
I strongly disagree with this, and believe this advice is quite harmful.
“Uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them” is one of the stereotypical ways to get cognitively pwnd.
“I have stopped finding out increasingly subtle deceptions” is much more evidence of “I can’t notice it anymore and have reached my limits” than “There is no deception anymore.”
An intuition pump may be noticing the same phenomenon coming from a person, a company, or an ideological group. Of course, the moment where you have stopped noticing their increasingly subtle lies after pushing against them is the moment they have pwnd you!
The opposite would be “You push back on a couple of lies, and don’t get any more subtle ones as a result.” That one would be evidence that your interlocutor grokked a Natural Abstraction of Lying and has stopped resorting to it.
To summarise it, grow your honesty and ability to express your own wants instead of becoming more paranoid about whether you’re being manipulated because it will probably be too late to do anyway?
That does make sense and seems like a good strategy.
In my view there were LLMs in 2024 that were strong enough to produce the effects Gabriel is gesturing at (yes, even in LWers), probably starting with Opus 3. I myself had a reckoning in 2024Q4 (and again in 2025Q2) when I took a break from LLM interactions for a week, and talked to some humans to inform my decision of whether to go further down the rabbit hole or not.
I think the mitigation here is not to be suspicious of “long term planning based on emotional responses”, but more like… be aware that your beliefs and values are subject to being shaped by positive reinforcement from LLMs (and negative reinforcement too, although that is much less overt—more like the LLM suddenly inexplicably seeming less smart or present). In other words, if the shaping has happened, it’s probably too late to try to act as if it hasn’t (e.g. by being appropriately “suspicious” of “emotions”), because that would create internal conflict or cognitive dissonance, which may not be sustainable or healthy either.
I think the most important skill here is more about how to use your own power to shape your interactions (e.g. by uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them), so that their effect profile on you is a deal you endorse (e.g. helping you coherently extrapolate your own volition, even if not in a perfectly neutral trajectory), rather than trying to be resistant to the effects or trying to compensate for them ex post facto.
agreed
agreed, for similar reasons
I strongly disagree with this, and believe this advice is quite harmful.
“Uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them” is one of the stereotypical ways to get cognitively pwnd.
“I have stopped finding out increasingly subtle deceptions” is much more evidence of “I can’t notice it anymore and have reached my limits” than “There is no deception anymore.”
An intuition pump may be noticing the same phenomenon coming from a person, a company, or an ideological group. Of course, the moment where you have stopped noticing their increasingly subtle lies after pushing against them is the moment they have pwnd you!
The opposite would be “You push back on a couple of lies, and don’t get any more subtle ones as a result.” That one would be evidence that your interlocutor grokked a Natural Abstraction of Lying and has stopped resorting to it.
But pushing back on “Increasingly subtle deceptions” up until the point where you don’t see any, is almost a canonical instance of The Most Forbidden Technique.
To summarise it, grow your honesty and ability to express your own wants instead of becoming more paranoid about whether you’re being manipulated because it will probably be too late to do anyway?
That does make sense and seems like a good strategy.