This was a helpful way to understand the differences (especially requiring authentication or knowing it’s really the principal is something I haven’t thought about much).
There’s an odd example where you might have someone like a CEO strike a deal with a misaligned/scheming model such that for some period of time/condition, the model is effectively secretly loyal despite it not having some weight-encoded “goal” or propensity to favor the principal.
This was a helpful way to understand the differences (especially requiring authentication or knowing it’s really the principal is something I haven’t thought about much).
There’s an odd example where you might have someone like a CEO strike a deal with a misaligned/scheming model such that for some period of time/condition, the model is effectively secretly loyal despite it not having some weight-encoded “goal” or propensity to favor the principal.