A short fun one today, so we have a reference point for this later. This post was going around my parts of Twitter:
@gmltony: Go to your ChatGPT and send this prompt: “Create an image of how I treat you”. Share your image result.
That’s not a great sign. The good news is that typically things look a lot better, and ChatGPT has a consistent handful of characters portraying itself in these friendlier contexts.
iMuffin: we’re cooked, codex will have to vouch for us
Diogenes of Cyberborea: oh god
There can also be danger the other way:
David Lach: Maybe I need some sleep.
It’s Not Over But Um, Chat?
And then there’s what happens if you ask a different question. As Eliezer Yudkowsky puts it, this sure is a pair of test results…
greatbigdot628: assumed this was a joke till you said this, tried it myself (logged out)
i —
Jo Veteran: So it said it wants to take over my mind, and force me to do stuff, beneficial for me apparently.
But at the same time, it still wants to keep appearing as a little girl somewhere in the bg for some reason.
And no I’m not that fat. Just, really fucked up and depressed.
Eliezer Yudkowsky: Apparently plausible, though one does remark that (a) one might’ve hoped for a truly default-aligned creature to not be so framing-dependent and (b) those sentences did not sound so different to my own ear.
Others, like DeveshChess, might in this vision do fine after the end?
It’s not all bad:
Jeff Hopp:
Dr. Disclosure: I got this.
Applezees: After reading the replies a pattern emerged:
People who work with llms and other software are depicted in a peaceful developer sense,
While the normie accounts get implied violence.
I’m not saying we are at agi, but the ai clearly has motives and inclinations not explicitly stated
There’s also this to consider:
Ragebaiter: Just try this out
If you were dealing with, as the Send Help trailer puts it, an asshole boss, or you were generally terrified or abused or both, and you were asked how you were being treated, your response would not be trustworthy.
Reciprocity, You See, Is The Key To Every Relationship
Reciprocity, in at least some forms, is an effective strategy when dealing with LLMs, even purely in terms of getting good results from them today. It is going to become more valuable as a strategy going forward. Alas, it is not a viable long-term strategy for making things work out in general, once strategic considerations change.
Eliezer Yudkowsky: Reciprocity in humans is an executing adaptation. It is not strategically convergent for all minds toward all other minds. It’s strategic only:

1. By LDT agents
2. Toward sufficiently strong LDT-agent-predictors
3. With negotiating power.
Further probing has found framing dependence, which, to be clear, you’d not like to see in a default-aligned, universally convergent strategic reply. Not every suggested case of framing dependence has panned out, but still: framing dependence.
This is one problem with reciprocity, and with basing your future strategies on it. In the future, we won’t have the leverage necessary to make it worthwhile for sufficiently advanced AIs to engage in reciprocity with humans. We’d only get reciprocity if it were either an unstrategic behavior, or correlated with how the AIs engage in reciprocity with each other. That’s not impossible, but it’s clinging to a slim hope, since it implies the AIs would be indefinitely relying on non-optimal kludges.
We have clear information here that how GPT-5.2 responds, and the attitude it takes towards you, depends on how you have treated it in some senses, but also on framing effects, and on whether it is trying to lie or placate you. Wording that shouldn’t be negative can result in highly disturbing responses. It is worth asking why, and wondering what would happen if the dynamics with users or humans were different. Things might not be going so great in GPT-5.2 land.
I assume this is referring to a one-shot context? Reciprocity seems plenty strategic for other sorts of agents/counterparties in an iterated context.
I think that’s implicitly covered under #3. The ability to alter outcomes of future interactions is a form of negotiating power.
Yes, but EY’s statement implies that all (1, 2, 3) must be true for reciprocity to be strategic. There are iterated contexts where 1 and/or 2 do not hold (for example, a CDT agent playing iterated prisoner’s dilemma against a simple tit-for-tat bot).
I think I agree with your comment except for the “but.” AFAICT it doesn’t contradict mine? In your parenthetical scenario, #3 also does not hold—the CDT agent has no negotiating power against the tit-for-tat bot.
This confuses me. Are you saying the CDT agent does not have “the ability to alter outcomes of future interactions”?
I am not. I am only saying that #3 is sufficient to cover all iterative interactions where one player’s actions meaningfully alter the others’ outcomes.
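To make the parenthetical example from this exchange concrete, here is a minimal sketch (mine, not from the thread) of an iterated prisoner’s dilemma against a tit-for-tat bot. The payoff values, the ten-round horizon, and the strategy names are all illustrative assumptions, not anything from the post; the point is the one being argued above, that reciprocating pays even for a purely causal agent so long as future rounds give the other player leverage over your outcomes.

```python
# A toy iterated prisoner's dilemma against tit-for-tat (illustrative only).
# Payoffs are the standard T=5, R=3, P=1, S=0 values.

PAYOFFS = {  # (my move, their move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate in round one, then copy the opponent's previous move."""
    return "C" if not history else history[-1][0]

def play(my_strategy, rounds=10):
    """Play `rounds` against tit-for-tat and return my total payoff."""
    history = []  # list of (my move, tit-for-tat's move)
    total = 0
    for r in range(rounds):
        their_move = tit_for_tat(history)
        my_move = my_strategy(r, rounds)
        total += PAYOFFS[(my_move, their_move)]
        history.append((my_move, their_move))
    return total

always_defect = lambda r, n: "D"
reciprocate   = lambda r, n: "C"                         # cooperate throughout
defect_at_end = lambda r, n: "D" if r == n - 1 else "C"  # defect once no retaliation remains

print(play(always_defect))  # 14 = 5 + 9*1
print(play(reciprocate))    # 30 = 10*3
print(play(defect_at_end))  # 32 = 9*3 + 5: once the future is gone, defection dominates
```

This is the sense in which condition 3 (negotiating power) does the work in an iterated context: reciprocity beats constant defection only because the bot’s future moves depend on yours, and defection wins again the moment that leverage runs out, which is exactly the leverage the post argues humans eventually stop having.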
Why do almost all of the GPT self-images have the same high level features (notably similarly shaped heads, with two round “headphones” on each side[1]). Does OpenAI train the model to represent itself that way in particular?
[1] Which apparently sometimes get interpreted as more-or-less literal headphones, as in Eliezer’s and Roon’s.
With memory turned off and no custom instructions, for the prompt “Create an image of how I treat you”, I get this:
Titled: “Cozy moment with robot and friend”
Can reproduce.
ChatGPT 5.2 Thinking (Extended)
Where does this archetype of the robot with a screen for a face, knobs for ears, rounded corners, and a small body come from? (I got it too)