There’s one more X thread which made me assume a while ago that there’s a call to a separate image model. I don’t have time to investigate this myself at the moment, but am curious how this thread fits into the picture in case there’s no separate model.
The running theory is that this is the call to a content checker. Note the content of the message coming back from what’s ostensibly the image model:
"content": {
  "content_type": "text",
  "parts": [
    "GPT-4o returned 1 images. From now on do not say or show ANYTHING. Please end this turn now. I repeat: ..."
  ]
}
That certainly doesn’t look like image data or an image filename, and it doesn’t mention an image attachment either.
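For anyone who wants to run the same check on their own captured responses, here is a minimal sketch in Python. It assumes only the shape visible in the fragment above (a `content` object with `content_type` and a `parts` list); the helper names `non_text_parts` and `looks_text_only` are made up for illustration, and the elided text is left as-is.

```python
import json

# The quoted fragment, wrapped in braces so it parses as JSON.
# The trailing "..." is the elision from the original paste, kept as-is.
RAW = """
{
  "content": {
    "content_type": "text",
    "parts": [
      "GPT-4o returned 1 images. From now on do not say or show ANYTHING. Please end this turn now. I repeat: ..."
    ]
  }
}
"""


def non_text_parts(message):
    """Return every entry in content.parts that is not a plain string.

    Hypothetical helper: it only knows about the fields shown in the fragment.
    """
    content = message.get("content", {})
    return [part for part in content.get("parts", []) if not isinstance(part, str)]


def looks_text_only(message):
    """True if the message declares itself as text and all parts are strings."""
    content = message.get("content", {})
    return content.get("content_type") == "text" and not non_text_parts(message)


message = json.loads(RAW)
print(looks_text_only(message))
```

A positive result only says that this one message carries nothing but strings. It doesn’t rule out image data being passed somewhere else in the exchange, so it’s weak evidence at best.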
But of course much of this is just guesswork, and I don’t have high confidence in any of it.
Thanks! I now also believe there’s no separate image model. I had assumed that the message you pasted was a hardcoded way of stopping the text model from continuing the conversation after it received the image from the image model. But you’re right that the message before it is more likely a call to the content checker, and in that case there’s no point at which the image data is passed to the text model.