Persona training primarily selects over characters already within the training data, and none of those are actually superintelligent. Text containing words ascribed to fictional superintelligences does not actually contain the output of real superintelligences, so the resulting LLM does not contain a superintelligent persona which you can select over using character training.
Just because the same English words “Superintelligent AI” are used to describe the fictional thing in your data, and the real thing that your AI company creates, does not mean that one will strongly influence the other, because this isn’t a situation that persona selection applies to. Persona selection works because you already have a set of circuits (rich Garrabrant traders) in your LLM (market), which you can call up with a few bits of selection. If you have to use large-scale RLVR (or whatever else) to construct (enrich) new circuits (traders) to build a superintelligence, there is no reason for these to have much to do with the circuits (traders) which simulate a human writing a fictional superintelligence.
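The "few bits of selection" framing above can be made concrete with a toy sketch (mine, not the original author's; the persona names and prior weights are invented for illustration). Treat pretraining as having produced a small set of persona circuits with prior weights; selecting one costs roughly its surprisal in bits, while a persona absent from the prior cannot be selected at any cost:

```python
import math

# Toy "market" of pre-existing persona circuits (hypothetical names and
# prior weights, standing in for what pretraining has already enriched).
prior = {
    "helpful_assistant": 0.40,
    "code_reviewer":     0.30,
    "pirate":            0.20,
    "fiction_villain":   0.10,
}

def bits_to_select(prior, target):
    """Surprisal of `target` under the prior: -log2(p). A rough proxy
    for how much evidence (prompting/finetuning) is needed to make that
    existing circuit dominant."""
    return -math.log2(prior[target])

# Selecting a circuit that already exists is cheap: a few bits.
print(f"{bits_to_select(prior, 'helpful_assistant'):.2f} bits")

# A genuinely superintelligent persona is simply absent from the prior's
# support, so no amount of conditioning selects it; it would have to be
# constructed by new training, not selected.
print("superintelligent_ai" in prior)  # False
```

The point of the sketch is the asymmetry: conditioning reweights what is already in the support, whereas RLVR-style training has to add mass to circuits the prior never contained.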
> there is no reason for these to have much to do with the circuits (traders) which simulate a human writing a fictional superintelligence.
I agree that there is no reliable reason, in the sense that we shouldn't expect anything positive to reliably come from that generalization. But I don't buy that there is no reason at all, or that it won't happen; I just don't expect it to happen enough for persona research to extend the horizon of alignment reliability enough to matter once the horizon of causal impact per thought has become enormous.