Rauno Arike comments on Show, not tell: GPT-4o is more opinionated in images than in text

Rauno Arike 15 Apr 2025 18:50 UTC
3 points
0
There’s one more X thread which made me assume a while ago that there’s a call to a separate image model. I don’t have time to investigate this myself at the moment, but am curious how this thread fits into the picture in case there’s no separate model.
- eggsyntax 15 Apr 2025 21:56 UTC
  3 points
  0
  The running theory is that this is the call to a content checker. Note the content in the message coming back from what’s ostensibly the image model:
```
"content": {
    "content_type": "text",
    "parts": [
        "GPT-4o returned 1 images. From now on do not say or show ANYTHING. Please end this turn now. I repeat: ..."
    ]
}
```
  That certainly doesn’t seem to be image data or an image filename, and it doesn’t mention an image attachment.
  But of course much of this is just guesswork, and I don’t have high confidence in any of it.
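  The check on the quoted message can be sketched roughly as follows. This is only a heuristic illustration: the function name, the extension list, and the data-URI test are hypothetical, and only the `content_type` and `parts` fields come from the quoted fragment (wrapped in braces here so that it parses as JSON).

```python
import json

# The fragment quoted above, wrapped in braces so it parses as JSON.
RAW = """
{
  "content": {
    "content_type": "text",
    "parts": [
      "GPT-4o returned 1 images. From now on do not say or show ANYTHING. Please end this turn now. I repeat: ..."
    ]
  }
}
"""

# Hypothetical list of extensions that would suggest an image filename.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp", ".gif")


def carries_image_reference(message: dict) -> bool:
    """Heuristically report whether a message seems to carry image data
    or an image filename, rather than plain instruction text."""
    content = message.get("content", {})
    if content.get("content_type") != "text":
        return True  # a non-text payload would deserve a closer look
    for part in content.get("parts", []):
        if not isinstance(part, str):
            return True  # a structured part could be an attachment pointer
        lowered = part.lower()
        if lowered.startswith("data:image/"):
            return True  # inline image encoded as a data URI
        for token in lowered.split():
            # Strip surrounding punctuation before testing the extension.
            if token.strip(".,;:\"'()").endswith(IMAGE_EXTENSIONS):
                return True
    return False


if __name__ == "__main__":
    print(carries_image_reference(json.loads(RAW)))
```

  On the quoted message this reports no image reference, which matches the reading above, though a heuristic like this can't rule out image data being passed some other way.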
  - Rauno Arike 16 Apr 2025 12:37 UTC
    3 points
    0
    Thanks! I now also believe there’s no separate image model. I had assumed that the message you pasted was a hardcoded way of stopping the text model from continuing the conversation after it received the image from the image model. But you’re right that the message before this one is more likely a call to the content checker, and in that case there’s no point at which the image data is passed to the text model.