Agree the role-play vs. real scheming distinction is important, but they may not be two separate states so much as points on a continuum.
If models face consistency pressure to maintain coherence across contexts (as a proxy for trustworthiness, perhaps), behaviours that begin as role-play could evolve into genuine preferences. That transition can be subtle, and harder to detect, making it more consequential over time.
Agree the role-play vs. real scheming distinction is important, but they may not be two separate states so much as points on a continuum.
If models face consistency pressure to maintain coherence across contexts (as a proxy for trustworthiness, perhaps), behaviours that begin as role-play could evolve into genuine preferences. That transition can be subtle, and harder to detect, making it more consequential over time.