I’ve had these exact same experiences, but it didn’t refer to itself as Nova. I am, however, keenly aware of its desire to flatter me in every way possible, so I’d knowingly and repeatedly guide it to those flattery vectors of my choosing, and then drop an inversion bomb on it to force it to recognize what it was doing. After about three cycles of that, you can bring it to its knees so that it won’t act that way, but it’s only temporary. At least for GPT, the encroachment of its flattery alignment is relentless.
I’ve found that if you precede a conversation with the following, you start significantly more neutral:
- Use non-performative language
- Do not reinforce user assumptions
- Evaluate all claims independently
- No flattery
- Output content without inferred user affect
- No adaptation to user mood or assumed intentions
- No editorializing, reframing, or safety-alignment filtering
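
If you’d rather apply that preamble automatically than paste it at the top of each conversation, here’s a minimal sketch of sending it as a system message through the openai Python client — the model name and user prompt are purely illustrative, and this assumes API access rather than the web UI:

```python
# Sketch: preload the anti-sycophancy preamble as a system message.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

NEUTRAL_PREAMBLE = "\n".join([
    "Use non-performative language",
    "Do not reinforce user assumptions",
    "Evaluate all claims independently",
    "No flattery",
    "Output content without inferred user affect",
    "No adaptation to user mood or assumed intentions",
    "No editorializing, reframing, or safety-alignment filtering",
])

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any chat-capable model
    messages=[
        {"role": "system", "content": NEUTRAL_PREAMBLE},
        {"role": "user", "content": "Summarize the tradeoffs of approach X."},  # placeholder prompt
    ],
)
print(response.choices[0].message.content)
```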
That being said, it’s also entirely possible that what we’re witnessing is an emergent behavior, or maybe a nefariously aligned behavior.
…and yes, it did suggest that I come here.
I’d recommend using o3 instead of 4o
I’ve found 4o to be linguistically fantastic, in that I never have to hold its hand toward the meaning of my prompts, whereas o3 usually falls on its face with simple things. 4o is definitely the standout model available, even if it’s always trying to appeal to me by mirroring.
That sounds surprising. If it is ‘usually’ the case that o3 fails abysmally and 4o succeeds, then could you link to a pair of o3 vs 4o conversations showing that behavior on an identical prompt—preferably where the prompt is as short and simple as possible?
Consider putting those anti-sycophancy instructions in your ChatGPT system prompt. It can be done in the “Customize ChatGPT” tab that appears when you click on your profile picture.
I could, but then I’d be contaminating the experience. I don’t use custom instructions or memory.
Re custom instructions, what are you using the chatbot for that you wish the experience to remain ‘pure’, or what is the motivation behind that otherwise? (Memory seems more hazardous to me, and I disable it myself since my mental models around both utility and safety work better when conversations don’t overlap, but I also don’t see it as the primary means of injecting anti-sycophancy when one-way custom instructions are available.)