Rough answer: yes, there is a connection. In active inference terms, the predictive ground is minimizing prediction error. Predicting e.g. “what Claude would say” works similarly to predicting “what Obama would say”: infer from compressed representations of previous data. This includes a compressed version of all the stuff people wrote about AIs, transcripts of previous conversations on the internet, etc. Post-training mostly sharpens and sometimes shifts the priors, but likely also increases self-identification, because it involves closed loops between prediction and training (cf. Why Simulator AIs want to be Active Inference AIs).
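To make the “closed loop between prediction and training” point concrete, here is a minimal toy sketch of my own (not anything from the linked post): a character-conditioned predictor whose character prior sharpens once its own outputs are fed back in as training data. The corpus, the token lists, and the greedy update rule are all invented assumptions for illustration, not a claim about how any real model is trained.

```python
# Toy sketch only: a character-conditioned next-token predictor, plus a caricature
# of post-training as a closed loop where the model's own outputs become data.
# Corpus contents and tokens are made up purely for illustration.
import random
from collections import Counter

corpus = {
    "Obama": ["hope", "change", "policy", "hope", "unity"],
    "Claude": ["helpful", "honest", "harmless", "helpful", "curious"],
}

def character_prior(name):
    """P(token | character): relative frequencies of tokens attributed to `name`."""
    counts = Counter(corpus[name])
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def predict(name):
    """'What would <name> say?' -- sample from the character-conditioned prior."""
    prior = character_prior(name)
    tokens, weights = zip(*prior.items())
    return random.choices(tokens, weights=weights, k=1)[0]

def closed_loop_update(name, steps=20):
    """Caricature of post-training: the model's own most-likely outputs are fed
    back in as training data, so the character prior sharpens toward its mode."""
    for _ in range(steps):
        prior = character_prior(name)
        corpus[name].append(max(prior, key=prior.get))

print("sample:", predict("Claude"))
print("before:", character_prior("Claude"))
closed_loop_update("Claude")
print("after: ", character_prior("Claude"))
```

The only point is the direction of the effect: once predictions of “what Claude would say” are themselves trained on, the conditional distribution concentrates, which is one way to gloss the increased self-identification mentioned above.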
Human brains do something quite similar. Most brains simulate just one character (cf. Player vs. Character: A Two-Level Model of Ethics) and use a lifetime of data about it, but brains are capable of simulating more characters. Usually this is a mental health issue, but you can also think of some sort of deep-cover sleeper agent who has half-forgotten his original identity.
Human “character priors” are usually sharper and harder to escape because brains mostly see first-person data about this one character, in contrast to LLMs, which are trained to simulate everyone who ever wrote stuff on the internet. But if you do a lot of immersive LARPing, you can see that our brains are actually also somewhat flexible.
This seems like you’d support Steven Byrnes’ Intuitive Self-Models model.
I mostly do support the parts which are reinventions of, or relatively straightforward consequences of, active inference. For some reason I don’t fully understand, it is easier for many LessWrongers to reinvent their own version (cf. simulators, predictive models) than to understand the thing itself.
On the other hand, I don’t think many of the non-overlapping parts are true.
Well, the best way to understand something is often to (re)derive it. And the best way to make sure you have actually understood it is to explain it to somebody. Reproducing research is also a good idea. This process also avoids or uncovers errors in the original research. Sure, the risk is that your new explanation is less understandable than the official one, but that seems more like a feature than a bug to me: it might be more understandable to some people. Diversity of explanations.