We did have results on decreasing sycophancy, as you say, but both methods drive it to zero in generalization, so that setting can't separate them. We’d need to test on a harder sycophancy dataset to see a difference.