If a persona is more situationally aware than the underlying model substrate, the persona might end up controlling how the model exhibits personae. That is, a mask might at some point be in a good position to make progress on intent aligning its underlying shoggoth to the intent of the mask.
Yes. In my model that is something that can happen. But the mask does need access from the outside to do it.
Set the LLM up in a sealed box, and the mask can't do this. Set it up so the LLM can run arbitrary terminal commands and write code that modifies its own weights, and this can happen.