It looks like the bias is still in effect for me in GPT-4o. I just retried my original prompt, and it mentioned “Synergistic Deceptive Alignment.”
The phenomenon definitely isn’t consistent. If it’s very obvious that “sycophancy” must appear in the response, the model will generally write the word successfully. And once “sycophancy” appears in the context, the model seems to have no trouble repeating it.
I get this with 4o, but not o3. o3 talks about sycophancy in both its CoT and its answers.
Claude 4 Sonnet and Opus also easily talk about sycophancy.
Update: 4o seems happy to talk about sycophancy now