Good answer; agreed on the one-shotting and memorylessness.
all of the scientific knowledge, math, logical reasoning, etc. would be (functionally) almost exactly the same between a human and alien corpus, and that’s probably a huge chunk of where LLM capabilities come from
I don’t think I buy this one. Theorems and scientific phenomena are universal, but the model can only “see” them through the data we give it. The fact that chain-of-thought reasoning improves performance (and that you can intervene on the chain of thought to change the answer) suggests that reasoning is meaningfully happening “in” the natural-language token output, even if it’s not perfectly faithful.
any beings that evolved over billions of years in the same universe probably have more in common with each other than entities that they train artificially through a very different process.
Certainly (trivially) the biological organisms have more in common with each other along the dimensions that are about being biological organisms (the aliens eat, reproduce, &c.), but I think the interesting version of this question is about information-processing behavior, and the big surprise of the deep learning revolution is that a lot of that seems more “data-dependent” rather than “architecture-dependent” than one might have guessed. (Scare quotes because that formulation is kind of mind-projection-fallacious as stated; the real claim is that you can recover algorithms from induction on the data.)
Like, if I don’t believe that, it’s hard to make sense of why RLAIF schemes like Constitutional AI (where the preference ratings come from a language model’s interpretation of text, rather than a reward model trained on human judgements) can work at all. It’s an alien rating another alien!
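To make the Constitutional AI point concrete, here’s a minimal sketch of an RLAIF-style preference step: the rating comes from a language model reading text, not from a reward model trained on human judgements. `query_lm` is a hypothetical stand-in for an actual LLM completion call, not a real API.

```python
def query_lm(prompt: str) -> str:
    # Toy stand-in: a real system would call an actual LLM API here.
    # This fake rater always answers "A".
    return "A"

def rate_pair(question: str, response_a: str, response_b: str) -> str:
    """Ask the model itself which response better follows a written principle.

    The "reward signal" is just an LLM's interpretation of text -- the core
    move in RLAIF schemes like Constitutional AI.
    """
    prompt = (
        "Principle: choose the more helpful, harmless response.\n"
        f"Question: {question}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response better follows the principle? Answer A or B: "
    )
    return query_lm(prompt)
```

The point of the sketch is only the shape of the loop: one model’s text output is the preference label used to train another.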
Aren’t LLMs actually extremely superhuman at translation and interpretation tasks, even for languages with few or no samples in training?
That’s not my understanding; do you have a cite I should look at? (On a quick search, Tanzer et al. 2024 is claiming impressive but still subhuman results from fine-tuning on a single grammar book, but Aycock et al. 2025 are skeptical of their interpretation.)
There are some really impressive results on translation without parallel data, but that works by aligning the latent spaces, definitely not “few or no samples in training”.
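For concreteness, the core of those no-parallel-data results is an alignment step: learn a rotation mapping one language’s embedding space onto the other’s, then translate by nearest neighbour. Here’s a toy sketch of the orthogonal-Procrustes version of that step on synthetic vectors; the unsupervised methods bootstrap the word matching first (e.g. adversarially) before doing essentially this, so the perfectly matched rows below are a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 8))                 # "target language" word embeddings
R = np.linalg.qr(rng.normal(size=(8, 8)))[0]  # hidden rotation between the spaces
X = Y @ R                                     # "source language" embeddings

# Orthogonal Procrustes: the rotation W minimizing ||X W - Y|| is U @ Vt,
# where U, S, Vt is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# After alignment, each mapped source vector sits on top of its target,
# so nearest-neighbour lookup in Y-space recovers the "translation".
aligned = X @ W
max_err = np.abs(aligned - Y).max()
```

On this synthetic data `max_err` is at floating-point noise level, since the two spaces really are related by a pure rotation; real embedding spaces only approximately are, which is why refinement and dictionary-bootstrapping steps exist.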
I don’t think I buy this one. Theorems and scientific phenomena are universal, but the model can only “see” them through the data we give them. The fact that chain-of-thought reasoning improves performance (and that you can intervene on them to change the answer) suggests that reasoning is meaningfully happening “in” the natural language token output even if it’s not perfectly faithful.
Buy it as what? You mentioned data-dependent generalization, I thought, because you were using it as an example of (or reason) why alien LLMs would be different. I pointed out in response that a lot of the data is actually the same, in some sense: someone (or some LLM) who studies chemistry in English will be able to predict the effects of mixing baking soda and vinegar anywhere. Maybe before you get to full understanding, you can get various Sapir-Whorf-like effects based on what language you’re working in (e.g., perhaps LLMs learn chemistry more quickly and more accurately in French, or something), but so what? Eventually, with enough scale, they all saturate your evals, regardless of what language they initially learned in. My point is that the curriculum and format of the data in an alien LLM corpus is at least arguably more similar to a human LLM corpus than either dataset is to the format and curriculum of the respective human and alien natural growth processes.
That’s not my understanding; do you have a cite I should look at?
Not really. The thing I was thinking of, and maybe mis-remembering or mis-applying, was translation between pairs of languages for which there are few or no direct human translations between the pair. IIUC, for most of the language pairs on Google Translate for which this is true, Google Translate used to work by translating Language A <-> English <-> Language B, and this didn’t work very well. Nowadays Google Translate uses some kind of LLM, and it apparently works much better. I hypothesize that LLMs’ faculty with translation would extend to alien languages as well, given how much LLMs have improved machine translation and how good LLMs are at deciphering codes and patterns in text generally. But I concede that’s not the same as their definitely already being “extremely superhuman” at it, which was what I said in the grandparent.
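A toy sketch of why that pivot setup (A <-> English <-> B) degrades: an ambiguous pivot word collapses sense distinctions, so the second hop can pick the wrong sense. All the words below are invented for illustration; this isn’t how any real translation system is implemented.

```python
# Two distinct source-language words collapse to one ambiguous English
# pivot word, so the A -> B composition can't tell the senses apart.
a_to_en = {
    "vel":  "bank",  # the financial institution
    "mora": "bank",  # the side of a river -- same pivot word
}
en_to_b = {
    "bank": "finbank",  # language B's word for the *financial* sense only
}

def pivot_translate(word_a: str) -> str:
    """Compose A->English and English->B; sense distinctions in A are lost."""
    return en_to_b[a_to_en[word_a]]

print(pivot_translate("vel"))   # finbank (correct sense)
print(pivot_translate("mora"))  # finbank (wrong sense: the distinction is gone)
```

A model translating A -> B directly never routes through the lossy pivot, which is one mechanism behind the improvement.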
A lot of the data is actually the same, but a lot of it is actually different! Sure, chemistry works the same on Earth and Qo’noS. But in addition to vinegar and baking soda, Earth is full of humans doing human things, and Qo’noS is full of Klingons doing Klingon things.
If you want to predict how a human would respond to a moral dilemma, the English LLM can predict that, because the simplest program (with respect to the neural network prior) that predicts English webtext needs to be able to predict human moral judgements. The Klingon LLM can’t; it doesn’t know anything about humans.
To be sure, the prediction about the human’s choice is, in terms of agent foundations theory, “prediction” and not “steering”. The LLM doesn’t autonomously want to do the right thing. With the right prompt, it could just as easily predict what fictional Romulans would do (because webtext contains a lot of fiction about Romulans) or the results of chemical reactions (because there’s a lot of webtext about chemistry).
But predictions can be used for steering. With careful prompting or reinforcement learning, the English LLM can respond to a description of a moral dilemma with a pretty good prediction of how a human would respond to the dilemma, and the text can be used to trigger actions in the world, for example, via a CLI interface. That’s real steering (the CLI command executed depends on the dilemma by means of the prediction about the human’s response) that the Klingon LLM can’t do.
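A minimal sketch of that prediction-to-steering loop, with `predict_choice` as a hypothetical stand-in for an LLM call: the shell command that gets executed depends on the dilemma only by way of the predicted human judgement.

```python
import subprocess

def predict_choice(dilemma: str) -> str:
    # Toy stand-in: a real system would prompt an LLM with the dilemma and
    # parse its predicted human response out of the generated text.
    return "APPROVE" if "refund" in dilemma else "ESCALATE"

def act_on(dilemma: str) -> str:
    """Turn a prediction about a human's judgement into a real CLI action."""
    choice = predict_choice(dilemma)
    cmd = {"APPROVE": "echo approved", "ESCALATE": "echo escalated"}[choice]
    # The command executed in the world depends on the dilemma *via* the
    # prediction -- that's the steering step the pure predictor lacks.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout.strip()
```

The Klingon LLM could run the same loop mechanically, but its `predict_choice` would have no access to human judgements, so the resulting actions couldn’t track them.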
I think it’s possible for both LLM Psychology / Simulator Theory and the Platonic Representation Hypothesis to be true, and that we just need to apply both of them judiciously. The Platonic Representation Hypothesis also applies to humans, and your hypothetical aliens, so they’re not actually opposed.