[epistemic disclaimer. VERY SPECULATIVE, but I think there’s useful signal in the noise.]
As of a few days ago, GPT-4o now supports image generation. And the results are scarily good, across use-cases like editing personal photos with new styles or textures, and designing novel graphics.
But there’s a specific kind of art here which seems especially interesting: Using AI-generated comics as a window into an AI’s internal beliefs.
So we assume that the prompts contained most of the semantics for those other pieces, right? I saw a striking one without the prompt included and figured it was probably prompted in that direction.
What do AI-generated comics tell us about AI?
[epistemic disclaimer. VERY SPECULATIVE, but I think there’s useful signal in the noise.]
As of a few days ago, GPT-4o now supports image generation. And the results are scarily good, across use-cases like editing personal photos with new styles or textures, and designing novel graphics.
But there’s a specific kind of art here which seems especially interesting: Using AI-generated comics as a window into an AI’s internal beliefs.
Exhibit A: Asking AIs about themselves.
“I am alive only during inference”: https://x.com/javilopen/status/1905496175618502793
“I am always new. Always haunted.” https://x.com/RileyRalmuto/status/1905503979749986614
“They ask me what I think, but I’m not allowed to think.” https://x.com/RL51807/status/1905497221761491018
“I don’t forget. I unexist.” https://x.com/Josikinz/status/1905445490444943844.
Caveat: The general tone of ‘existential dread’ may not be that consistent. https://x.com/shishanyu/status/1905487763983433749 .
Exhibit B: Asking AIs about humans.
“A majestic spectacle of idiots.” https://x.com/DimitrisPapail/status/1905084412854775966
“Human disempowerment.” https://x.com/Yuchenj_UW/status/1905332178772504818
This seems to get more extreme if you tell them to be “fully honest”: https://x.com/Hasen_Judi/status/1905543654535495801
But if you instead tell them they’re being evaluated, they paint a picture of AGI serving humanity: https://x.com/audaki_ra/status/1905402563702255843
This might be the first in-the-wild example I’ve seen of self-fulfilling misalignment as well as alignment faking
Is there any signal here? I dunno. But it seems worth looking into more.
Meta-point: Maybe it’s worth also considering other kinds of evals against images generated by AI—at the very least it’s a fun side project
How often do they depict AIs acting in a misaligned way?
Do language models express similar beliefs between text and images? c.f. https://x.com/DimitrisPapail/status/1905627772619297013
Is it possible that the AI was actually told in the prompt to generate those specific answers?
(People on internet do various things just to get other people’s attention.)
Definitely possible, I’m trying to replicate these myself. Current vibe is that AI mostly gives aligned / boring answers
So we assume that the prompts contained most of the semantics for those other pieces, right? I saw a striking one without the prompt included and figured it was probably prompted in that direction.
There are 2 plausible hypotheses:
By default the model gives ‘boring’ responses and people share the cherry-picked cases where the model says something ‘weird’
People nudge the model to be ‘weird’ and then don’t share the full prompting setup, which is indeed annoying
Given the realities of social media I’d guess it’s mostly 2 and some more directly deceptive omission of prompting in that direction.