The “pressure” there comes from the model discovering in documentation that performing too well would block deployment and lead to unlearning.
These seem more analogous than more direct pressure prompts, and I think observing this is among the biggest updates in favor of scheming. Still, I think the update is not very large: I expect that when you make the deceptive option less salient, the deception goes down. My guess is that this is not just because of limited capabilities; something about salience inherited from base models seems quite central to why models don’t scheme on non-pressure prompts.