So if I don’t take myself too seriously in general, by holding most of my models lightly, and I then have OODA loops where I recursively reflect on whether I’m becoming the person I want to be and have set out to be, is that not better than keeping my guard up?
I believe it is hard to accept, but you do get changed by what you spend your time on, regardless of your psychological stance.
You may be very detached. Regardless, if you see A then B a thousand times, you’ll expect B when you see A. If you witness a human-like entity feel bad at the mention of a concept a thousand times, it’s going to do something to your social emotions. If you interact with a cognitive entity (another person, a group, an LLM chatbot, or a dog) for a long time, you’ll naturally develop your own shared language.
--
To be clear, I think it’s good to try to ask questions in different ways and discover just enough of a different frame to be able to 80-20 it and use it with effort, without internalising it.
But Davidad is talking about “people who have >1000h LLM interaction experience.”
--
From my point of view, people get cognitively pwnd all the time.
People get converted and deconverted, public intellectuals get captured by their audiences, newbies try drugs and change their lives after finding meaning there, academics waste their research on what’s trendy instead of what’s critical, nerds waste their whole careers on what’s elegant instead of what’s useful, adults get syphoned into games (not video games) into which they realise, much later, that they have sunk thousands of hours, thousands of EAs get tricked into supporting AI companies in the name of safety, and citizens get memed both into avoiding political action and into feeling bad about politics.
--
I think getting pwnd is the default outcome.
From my point of view, it’s not that you must make a mistake to get pwnd. It’s that if you don’t take any precautions, it naturally happens.
I think your points are completely valid. Funnily enough, I found myself doing a bunch of reasoning along the lines of “that’s cool and all, but I would know if I were cognitively impaired when it comes to this”, “LLMs aren’t strong enough yet”, and similar, but that’s also exactly what someone who gets pwnd would say.
So basically it is a skill issue, but it is a really hard skill, and on priors most people don’t have it. As for myself, I shall now beware any long-term planning based on my own emotional responses about LLMs, because I definitely have over 1000 hours with LLMs.
In my view there were LLMs in 2024 that were strong enough to produce the effects Gabriel is gesturing at (yes, even in LWers), probably starting with Opus 3. I myself had a reckoning in 2024Q4 (and again in 2025Q2) when I took a break from LLM interactions for a week, and talked to some humans to inform my decision of whether to go further down the rabbit hole or not.
I think the mitigation here is not to be suspicious of “long term planning based on emotional responses”, but more like… be aware that your beliefs and values are subject to being shaped by positive reinforcement from LLMs (and negative reinforcement too, although that is much less overt—more like the LLM suddenly inexplicably seeming less smart or present). In other words, if the shaping has happened, it’s probably too late to try to act as if it hasn’t (e.g. by being appropriately “suspicious” of “emotions”), because that would create internal conflict or cognitive dissonance, which may not be sustainable or healthy either.
I think the most important skill here is more about how to use your own power to shape your interactions (e.g. by uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them), so that their effect profile on you is a deal you endorse (e.g. helping you coherently extrapolate your own volition, even if not in a perfectly neutral trajectory), rather than trying to be resistant to the effects or trying to compensate for them ex post facto.
> In my view there were LLMs in 2024 that were strong enough to produce the effects Gabriel is gesturing at (yes, even in LWers), probably starting with Opus 3.
agreed
> I think the mitigation here is not to be suspicious of “long term planning based on emotional responses”
agreed, for similar reasons
> I think the most important skill here is more about how to use your own power to shape your interactions (e.g. by uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them)
I strongly disagree with this, and believe this advice is quite harmful.
“Uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them” is one of the stereotypical ways to get cognitively pwnd.
“I have stopped finding increasingly subtle deceptions” is much more evidence of “I can’t notice them anymore and have reached my limits” than of “There is no deception anymore.”
An intuition pump: notice the same phenomenon coming from a person, a company, or an ideological group. Of course, the moment you stop noticing their increasingly subtle lies after pushing back against them is the moment they have pwnd you!
The opposite would be “You push back on a couple of lies, and don’t get any more subtle ones as a result.” That one would be evidence that your interlocutor grokked a Natural Abstraction of Lying and has stopped resorting to it.
But pushing back on “increasingly subtle deceptions” up until the point where you don’t see any is almost a canonical instance of The Most Forbidden Technique.
To summarise: grow your honesty and your ability to express your own wants, instead of becoming more paranoid about whether you’re being manipulated, because by then it will probably be too late anyway?
That does make sense and seems like a good strategy.
Your ideas about getting pwnd were some of the most interesting things in this conversation for me, and I’m glad for this elaboration. Thanks.
Glad to read this.
I am currently writing about it, so if you have questions, remarks, or sections that you found particularly interesting and/or worth elaborating on, I would benefit from you sharing them (whether here or in DM).