I don’t understand how this is an example of misalignment—are you suggesting that the model tried to be sycophantic only in deployment?