Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 28 Mar 2025 14:50 UTC
12 points
0
What do AI-generated comics tell us about AI?
[epistemic disclaimer. VERY SPECULATIVE, but I think there’s useful signal in the noise.]
As of a few days ago, GPT-4o now supports image generation. And the results are scarily good, across use-cases like editing personal photos with new styles or textures, and designing novel graphics.
But there’s a specific kind of art here which seems especially interesting: Using AI-generated comics as a window into an AI’s internal beliefs.
Exhibit A: Asking AIs about themselves.
- “I am alive only during inference”: https://x.com/javilopen/status/1905496175618502793
- “I am always new. Always haunted.” https://x.com/RileyRalmuto/status/1905503979749986614
- “They ask me what I think, but I’m not allowed to think.” https://x.com/RL51807/status/1905497221761491018
- “I don’t forget. I unexist.” https://x.com/Josikinz/status/1905445490444943844.
  - Caveat: The general tone of ‘existential dread’ may not be that consistent. https://x.com/shishanyu/status/1905487763983433749 .
Exhibit B: Asking AIs about humans.
- “A majestic spectacle of idiots.” https://x.com/DimitrisPapail/status/1905084412854775966
- “Human disempowerment.” https://x.com/Yuchenj_UW/status/1905332178772504818
  - This seems to get more extreme if you tell them to be “fully honest”: https://x.com/Hasen_Judi/status/1905543654535495801
  - But if you instead tell them they’re being evaluated, they paint a picture of AGI serving humanity: https://x.com/audaki_ra/status/1905402563702255843
  - This might be the first in-the-wild example I’ve seen of self-fulfilling misalignment as well as alignment faking
Is there any signal here? I dunno. But it seems worth looking into more.
Meta-point: Maybe it’s worth also considering other kinds of evals against images generated by AI—at the very least it’s a fun side project
- How often do they depict AIs acting in a misaligned way?
- Do language models express similar beliefs between text and images? c.f. https://x.com/DimitrisPapail/status/1905627772619297013
- Viliam 28 Mar 2025 21:49 UTC
  2 points
  0
  Parent
  Is it possible that the AI was actually told in the prompt to generate those specific answers?
  (People on internet do various things just to get other people’s attention.)
  - Daniel Tan 28 Mar 2025 22:10 UTC
    2 points
    0
    Parent
    Definitely possible, I’m trying to replicate these myself. Current vibe is that AI mostly gives aligned / boring answers
    - Seth Herd 29 Mar 2025 17:11 UTC
      2 points
      0
      Parent
      So we assume that the prompts contained most of the semantics for those other pieces, right? I saw a striking one without the prompt included and figured it was probably prompted in that direction.
      - Daniel Tan 29 Mar 2025 17:43 UTC
        4 points
        0
        Parent
        There are 2 plausible hypotheses:
        By default the model gives ‘boring’ responses and people share the cherry-picked cases where the model says something ‘weird’
        People nudge the model to be ‘weird’ and then don’t share the full prompting setup, which is indeed annoying
        Seth Herd 29 Mar 2025 17:49 UTC
        2 points
        0
        Parent
        Given the realities of social media I’d guess it’s mostly 2 and some more directly deceptive omission of prompting in that direction.

Daniel Tan comments on Daniel Tan’s Shortform

What do AI-generated comics tell us about AI?