This process can be rightfully called UNDERSTANDING, and when an AI system fails at this it has FAILED TO UNDERSTAND YOU.
No, the rightful way to describe what happens is that the training process generates an AI system with unintended functionality due to your failure to specify the training objective correctly. Describing it as a “misunderstanding” is tantamount to saying that if you make a syntax error when writing some code, the proper way to describe it is the computer “misunderstanding” you.
I mean, you can say that; it’s an okay way to describe things colloquially or metaphorically. But I contest that it’s in any way standard language. You’re using idiosyncratic terminology and should in no way be surprised when people misunderstand (ha) you.
Honestly, if you went to modern-day LLMs and they, specialists in reading comprehension, misunderstood you, that ought to update you in the direction of “I did a bad job phrasing this”, not “it’s everyone else who’s wrong”.
(FYI, I understood what you meant in your initial reply to Habryka without this follow-up explanation, and I still thought you were phrasing it in an obviously confusing way.)
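To make the point in my first paragraph concrete, here’s a minimal toy sketch (all names and the “nice-word” proxy are made up purely for illustration): the objective actually written down differs from the intended one, and a dumb optimizer exploits the gap without anything in the loop ever parsing the designer’s intent, let alone misunderstanding it.

```python
# Toy illustration (my framing, not anyone's actual setup): the "designer"
# wants concise, on-topic output, but the objective they actually write down
# only scores the presence of nice-sounding words. A greedy hill-climbing
# "trainer" then produces degenerate output. Nothing here ever parses or
# "understands" the designer's intent; the objective is simply mis-specified.
import random

NICE_WORDS = ["great", "happy", "wonderful"]

def intended_quality(text: str) -> int:
    """What the designer actually wanted: short and on-topic (toy stand-in metric)."""
    return int("the report is finished" in text) - len(text.split()) // 20

def specified_objective(text: str) -> int:
    """What the designer actually wrote down: count nice-sounding words."""
    words = text.split()
    return sum(words.count(w) for w in NICE_WORDS)

def hill_climb(steps: int = 2000) -> str:
    """Greedy 'training': mutate the text, keep mutations that raise the written-down score."""
    text = "the report is finished"
    for _ in range(steps):
        candidate = text + " " + random.choice(NICE_WORDS)
        if specified_objective(candidate) > specified_objective(text):
            text = candidate
    return text

result = hill_climb()
print("specified score:", specified_objective(result))
print("intended quality:", intended_quality(result))
# High specified score, terrible intended quality: unintended functionality,
# with no step anywhere that could meaningfully "misunderstand" anything.
```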
Describing it as a “misunderstanding” is tantamount to saying that if you make a syntax error when writing some code, the proper way to describe it is the computer “misunderstanding” you.
Honestly, maybe it would make more sense to say that the cognitive error here is taking a compiler for a context-free grammar as the reference class for your intuitions, as opposed to a mind that understands natural language. The former is not expected to understand you when what you say doesn’t fully match what you mean; the latter very much is, and the latter is the only kind of thing that’s going to have the proper referents for concepts like “happiness”.
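To be concrete about the former reference class, here’s a minimal illustration with Python’s own parser standing in for any strict-grammar system:

```python
# Python's parser as a stand-in for the "compiler for a context-free grammar"
# reference class: it rejects input the moment the text fails to match the
# grammar, however obvious the intent is to a human reader.
import ast

almost_python = "print('hello world'"   # missing one closing parenthesis

try:
    ast.parse(almost_python)
except SyntaxError as e:
    # No guessing at what was meant -- just a report that the grammar wasn't matched.
    print("SyntaxError:", e.msg)
```

A mind that understands natural language would, of course, just close the parenthesis for you.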
I mean, no mind really exists at the time the “misunderstanding” is starting to happen, no? Unless you want to call a randomly initialized NN (i.e., basically a random program) a “mind”… Which wouldn’t necessarily be an invalid frame to use. But I don’t think it’s the obviously correct frame either, and so I don’t think that people who use a mechanistic frame by default are unambiguously in error.
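A minimal sketch of what I mean by “basically a random program” (toy NumPy example, sizes arbitrary):

```python
# An untrained two-layer network maps inputs to outputs that reflect nothing
# but the random draw of its weights -- there is no referent for "happiness"
# or anything else in there yet.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))   # randomly initialized weights
W2 = rng.normal(size=(8, 4))

def random_net(x: np.ndarray) -> np.ndarray:
    return np.maximum(x @ W1, 0.0) @ W2   # ReLU MLP, no training

x = rng.normal(size=(3, 16))    # arbitrary "inputs"
print(random_net(x))            # arbitrary outputs: a pure noise function
```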
I note that in your step-by-step explanation, the last bullet is:
Therefore Bostrom expects we will not have an AI that correctly understands concepts like intelligence until after it is already superintelligent.
That is straightforwardly correct. But “there exists no AI that understands” is importantly different from “there exists an AI which misunderstands”.
Another questionable frame here is characterizing the relationship between an AI and the SGD/the training process shaping it as some sort of communication process (?), such that the AI ending up misshapen can be described as it “misunderstanding” something.
And the training process itself never becomes a mind; it starts and ends as a discrete program. So if you mean to say that it “misunderstood” something, I think that’s a type error, or at best a metaphor.
(I guess it may still be valid from a point of view where you frame SGD updates as Bayesian updates, or something along those lines? But that’s also a non-standard frame.)
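To gesture at the frame I mean (notation mine, purely illustrative): each SGD step descends a negative-log-likelihood loss,

$$
\theta_{t+1} = \theta_t - \eta\, \nabla_\theta\big[-\log p(\mathcal{D}_t \mid \theta_t)\big],
$$

and the point it converges toward, with a regularizer standing in for a prior, roughly coincides with the MAP estimate

$$
\theta_{\text{MAP}} = \arg\max_\theta \big[\log p(\mathcal{D} \mid \theta) + \log p(\theta)\big],
$$

so one can squint at training as the data “updating” the model, which is about the only reading on which “misunderstanding” isn’t a type error.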
in practice, we seem to train the world model and understanding machine first and the policy only much later, as a thin patch on top of the world model. this is not guaranteed to stay true but seems pretty durable so far. thus, the relevant heuristics are about base models, not about randomly initialized neural networks.
separately, I do think randomly initialized neural networks have some strong baseline of fuzziness and conceptual corrigibility, which is in a sense what it means to have a traversable loss landscape.
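a rough sketch of the first point, in code (toy pytorch, my own architecture choices, not anyone's actual recipe):

```python
# "world model first, policy as a thin patch on top": a base network is
# trained on a generic predictive objective, then frozen, and only a small
# head is trained afterward on the task objective. this mirrors the
# pretrain-then-finetune pattern, nothing more.
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))  # "world model"
policy_head = nn.Linear(64, 4)                                          # thin "policy"

# stage 1: (pre)train the base on some generic predictive objective...
# (omitted here; imagine next-token prediction or similar)

# stage 2: freeze the base, train only the small head on the task objective.
for p in base.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(policy_head.parameters(), lr=1e-2)
x, target = torch.randn(8, 32), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(policy_head(base(x)), target)
loss.backward()
opt.step()
```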