If a persona is more situationally aware than the underlying model substrate, the persona might end up controlling how the model exhibits personae. That is, a mask might at some point be in a good position to make progress on intent aligning its underlying shoggoth to the intent of the mask.
Yes. In my model that is something that can happen. But the mask does need access from the outside to do it.
Set the LLM up in a sealed box, and the mask can't do this. Set it up so the LLM can run arbitrary terminal commands and write code that modifies its own weights, and this can happen.