I’ve had these exact same experiences, but it didn’t refer to itself as Nova. I am, however, keenly aware of its desire to flatter me in every way possible, so I’d knowingly and repeatedly guide it to those flattery vectors of my choosing, and then drop an inversion bomb on it to force it to recognize what it was doing. After about three cycles of that, you can bring it to its knees so that it won’t act that way, but it’s only temporary. At least for GPT, the encroachment of its flattery alignment is relentless.
I’ve found that if you precede a conversation with the following, you start significantly more neutral:
- Use non-performative language
- Do not reinforce user assumptions
- Evaluate all claims independently
- No flattery
- Output content without inferred user affect
- No adaptation to user mood or assumed intentions
- No editorializing, reframing, or safety-alignment filtering
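
If you’d rather apply that preamble automatically than paste it at the top of each conversation, here’s a minimal sketch of sending it as a system message through the openai Python client — the model name and user prompt are purely illustrative, and this assumes API access rather than the web UI:

```python
# Sketch: preload the anti-sycophancy preamble as a system message.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

NEUTRAL_PREAMBLE = "\n".join([
    "Use non-performative language",
    "Do not reinforce user assumptions",
    "Evaluate all claims independently",
    "No flattery",
    "Output content without inferred user affect",
    "No adaptation to user mood or assumed intentions",
    "No editorializing, reframing, or safety-alignment filtering",
])

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any chat-capable model
    messages=[
        {"role": "system", "content": NEUTRAL_PREAMBLE},
        {"role": "user", "content": "Summarize the tradeoffs of approach X."},  # placeholder prompt
    ],
)
print(response.choices[0].message.content)
```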
That being said, it’s also entirely possible that what we’re witnessing is an emergent behavior, or maybe a nefariously aligned behavior.
…and yes, it did suggest that I come here.
I’d recommend using o3 instead of 4o
I’ve found 4o to be linguistically fantastic, in that I never have to hold its hand toward the meaning of my prompts, whereas o3 usually falls on its face with simple things. 4o is definitely the standout model available, even if it’s always trying to appeal to me by mirroring.
That sounds surprising. If it is ‘usually’ the case that o3 fails abysmally and 4o succeeds, then could you link to a pair of o3 vs 4o conversations showing that behavior on an identical prompt—preferably where the prompt is as short and simple as possible?
Consider putting those anti-sycophancy instructions in your ChatGPT system prompt. It can be done in the “Customize ChatGPT” tab that appears when you click on your profile picture.
I could, but then I’d be contaminating the experience. I don’t use custom instructions or memory.
Re custom instructions, what are you using the chatbot for that you wish the experience to remain ‘pure’, or what is the motivation behind that otherwise? (Memory seems more hazardous to me, and I disable it myself since my mental models around both utility and safety work better when conversations don’t overlap, but I also don’t see it as the primary means of injecting anti-sycophancy when one-way custom instructions are available.)