Another possibility is that verbalized eval awareness goes up as an evaluation gets more realistic, because the model has to reason explicitly about whether it’s in an eval, whereas in a sufficiently toy environment it just knows without needing to reason about it.
You might (rightly) quibble that the green line could be above the red line at various points, if the model reasons that it could be in an eval and then decides that it isn’t.