One obvious reason to get upset is how low the standards of people posting them are. Let’s take jefftk’s post. It takes less than 5 seconds to spot how lazy, sloppy, and bad the hands and arms are, and how the picture is incoherent and uninformative. (Look at the fiddler’s arms, or the woman going under 2 arms that make zero sense, or the weird doors, or the table which seems to be somehow floating, or the dubious overall composition—where are the yellow fairy and non-fairy going, exactly?—or the fact that the image is the stereotypical cat-urine yellow of all 4o images.) Why should you not feel disrespected and insulted that he was so careless and lazy as to put in such a lousy, generic image?
I was in this case assuming it was a ghiblified version of a photo, illustrating the very core point of this post. Via this mechanism it communicated a lot! Like how many people were in the room, how old they were, a lot about their emotional affect, how big the room was, and lots of other small details.
First, I didn’t say it wasn’t communicating anything. But since you bring it up: it communicated exactly what jefftk had already said in the post describing the scene. And what it did communicate that he didn’t say cannot be trusted at all. As jefftk notes, 4o, in doing style transfer, makes many large, heavily biased changes to the scene, going beyond mere artifacts like fingers. If you don’t believe that people in that room had 3 arms or that the room looked totally different (I will safely assume that the room was not, in fact, lit up in tasteful cat-urine yellow in the 4o house style), why believe anything else it conveys? If it doesn’t matter what those small details were, then why ‘communicate’ a fake version of them all? And if it does matter what those small details were, surely it’s bad to communicate a fake, wrong version? (It is odd to take this blasé attitude of ‘it is important to communicate, and what is communicated is of no importance’.)
Second, this doesn’t rebut my point at all. Whatever true or false things it does or does not communicate, the image is ugly and unaesthetic: the longer you look at it, the worse it gets, as you understand it to be ever more bland, stereotypical, and strewn with errors and laziness. It is AI slop. (I would personally be ashamed to post an image that embodies such low standards even to IRC, never mind in my posts; it disrespects my viewers that much, and says, “I value your time and attention so little that I will not lift a finger to do a decent job when I add a big attention-grabbing image that you will spend time looking at.”) Even 5 seconds spent inpainting the most blatant artifacts, or telling ChatGPT, “please try again, but without the yellow palette that you overuse in every image”*, would have made it better.
* incidentally, I’ve been asking people here if they notice how every ChatGPT 4o-generated image is by default yellow. Invariably, they have not. One or two of them have contacted me later to express the sentiment that ‘what has been seen cannot be unseen’. This is a major obstacle to image editing in 4o, because every time you inpaint, the image will mutate a decent bit, and will tend to turn a bit more yellow. (If you iterate to a fixed point, a 4o image turns into all yellow with sickly blobs, often faces, in the top left. It is certainly an odd generative model.)

+1 for “what has been seen cannot be unseen”, wow I’m seeing a lot of cat-urine yellow around now
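For what it’s worth, the yellow-cast claim is easy to check crudely in code rather than by eye. This is only an illustrative sketch under my own assumptions (the `yellow_cast` score is an ad hoc heuristic I made up, not anything documented about 4o): it scores a warm/yellow tint as the mean of the red and green channels minus the mean blue channel.

```python
def yellow_cast(pixels):
    """Crude yellow-cast score for a list of (R, G, B) tuples.

    Returns mean((R + G) / 2) - mean(B): positive for a warm/yellow
    tint, ~0 for neutral, negative for a cool/blue tint. This is a
    rough heuristic, not real colorimetry.
    """
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    return (mean_r + mean_g) / 2 - mean_b

# A yellow patch scores high; a neutral gray patch scores zero.
yellow_patch = [(230, 210, 120)] * 100
gray_patch = [(128, 128, 128)] * 100
print(yellow_cast(yellow_patch))  # 100.0
print(yellow_cast(gray_patch))    # 0.0
```

Running this over the pixel data of a batch of 4o outputs versus ordinary photos would make the claimed bias concrete, though for a real analysis you would want a perceptual color space rather than raw RGB.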
Gwern, look, my drawing skills are pretty terrible. We’ve had Sequences posts with literal pictures of napkins, where Eliezer drew bad and ugly diagrams, up here for years. Yes, not everything in the image can be trusted, but surely I have learned many real and relevant things about the atmosphere and vibe from the image that I would not have from a literal description (and at the very least it is much faster for me to parse than a literal description).
I know the kinds of errors that image models make, and so I can adjust for them. They overall make many fewer errors than jefftk would make if he were to draw some stick figures himself, which would still be useful.
The image is clearly working at achieving its intended effect, and I think the handwringing about it being unaesthetic is overblown compared to all realistic alternatives. Yes, it would be cool if jeff prompted more times, but why bother, it’s getting the job done fine, and that’s what the whole post is about.
surely I have learned many real and relevant things about the atmosphere and vibe from the image that I would not from a literal description
But what are they? You’ve received some true information, but it’s in a sealed box with a bunch of lies. And you know that, so it can’t give you any useful information. You might arbitrarily decide to correct in one direction, but end up correcting in the exact opposite direction from reality.
For example: we know the AI tends to yellow images. So seeing a yellowed AI-generated image tells us that the color of the original was either not yellow or… yellow, because the model doesn’t de-yellow images that are already yellow. We have no idea what color it originally was.
If enough details are wrong, it might as well just be a picture of a different party, because you don’t know which ones they are.
As for using a different image: drawing by hand and using AI aren’t the only options. Besides AI,
there are actual free images you can use. As far as I know, this could be a literal photo of the party in question, and it’s free: https://unsplash.com/photos/a-man-and-woman-dancing-in-a-room-with-tables-and-chairs-KpzGmDvzhS4
You could spend <1hr making an obviously shitty ’shop from free images with free image editing software. If you’ve ever shared a handmade crappy meme with friends, you know this can be a significantly entertaining and bonding act of creativity. The effort is roughly comparable to stick figures and the outcome looks better, or at least richer.
With all that said, and reiterating gwern’s point above, I can’t agree it achieved its intended effect. It is possible that jefftk put in a lot of effort to make sure the generated vibe is as accurate as it could reasonably be, but the default assumption is that someone generating an AI image isn’t spending very much effort, because that’s the point of using AI to generate images. There are better tools for someone making a craft of creating an image (regardless of their drawing skill). In order for that effort to be meaningful (since, unlike artistic skill, it doesn’t translate to improved image quality), he’d have to just tell us, “I spent a lot of time making sure the vibe was right, even though the image is still full of extra limbs.” And this might actually be a different discussion, but I’d be immediately skeptical of that statement—am I really going to trust the artistic eye and taste of someone who sat down for 2 hours to repeatedly generate a ghiblified AI image instead of using a tool that doesn’t have a quality cap? So ultimately I find it more distracting, confusing, and disrespectful to read a post with an AI image, which, if carelessly used (as I have to assume it is), cannot possibly give me useful information. At least a bad stick figure drawing could give me a small amount of information.
It is possible that jefftk put in a lot of effort to make sure the generated vibe is as accurate as could reasonably be
I didn’t. I picked out a photo that I was going to use to illustrate the piece, one host asked me not to use it because of privacy, another suggested Ghiblifying it and made one quickly on their phone. We looked at it and thought it gave the right impression despite the many errors.
I didn’t think you did, and wasn’t trying to imply you did. I was only illustrating how it wouldn’t even matter if you had.
The vibe of the generated image is far closer to the real party than the image you linked.
Ok...? That’s fine, I guess, but irrelevant—my point is that until you stated this, deep in the comments, I could not have known it.
I am surprised & disappointed you responded in that way, since I’ve tried to be clear that I am not talking about whether or not the image you posted for that party is representative of the party you attended. It makes no difference to anything I’m arguing whether it is or isn’t.
I am saying that no reader (who wasn’t at the event) can ever trust that any AI-generated image attached to a blog post meaningfully depicts a real event.
I am not sure if you’re seeing this from outside your own perspective. From your view, comparing it to the original, it’s good enough. But you’re not the audience of your blog, right? A reader has none of that information. They just have an AI slop image (and I’m not trying to use that to be rude, but it fits the bill for the widely accepted term), and so they either accept it credulously, as they accept most AI slop to be “true to the vibe”, whether it is or isn’t (which should be an obviously bad habit); or they throw it away, as they do with most AI slop, to prevent it from polluting their mental model of reality. In this model, all readers are worse off for it being there. Where would a third category fit in, of readers (who don’t know you) who see this particular AI image and trust it to be vibe-accurate even though they know most AI images are worthless? Why would they make that judgement?
EDIT: I have no idea why this comment is received so negatively either. I think everything in it is consistent with all my other comments, and I’m also trying to wrangle the conversation back on topic repeatedly. I think I’ve been much more consistent and clear about my arguments than people responding to me, so this is all very confusing. It’s definitely feeling like I’m being downvoted ideologically for having a negative opinion of AI image generation.
Where would a third category fit in, of readers (who don’t know you) who see this particular AI image and trust it to be vibe-accurate even though they know most AI images are worthless? Why would they make that judgement?
The fact that the author decided to include it in the blog post is telling enough that the image is representative of the real vibes. There isn’t just an “AI slop image”, but also the author’s intent to use it as a quick glance into the real vibes, in a faster and more accurate way than just words would have done.
Sorry, I wrote my own reply (saying roughly the same thing) without having seen this. I’ve upvoted and strong agree voted, but the agreement score was in the negative before I did that. If the disagree vote came from curvise, then I’m curious as to why.[1]

[1] No passive aggression intended here; I respect the use of a disagree vote instead of a karma downvote.
It seems to me that moonlight’s comment gets to a key point here: you’re not being asked to trust the AI; you’re being asked to trust the author’s judgment. The author’s judgment might be poor, and the image might be misleading! But that applies just as well to the author’s verbal descriptions. If you trust the author enough that you would take his verbal description of the vibe seriously, why doesn’t his endorsement of the image as vibe-accurate also carry some weight?
Yes, I did cast a disagree vote: I don’t agree that “The fact that the author decided to include it in the blog post is telling enough that the image is representative of the real vibes” is true when it comes to an AI-generated image. My reasoning for that position is elaborated in a different reply in this thread.
readers (who don’t know you) who see this particular AI image and trust it to be vibe-accurate even though they know most AI images are worthless? Why would they make that judgement?
I think a crucial point here is that we’re not just getting an arbitrary AI-generated image; we’re getting an AI-generated image that the author of the blog post has chosen to include and is claiming to be a vibes-accurate reproduction of a real photo. If you think the author might be trying to trick you, then you should mistrust the image just as you would mistrust his verbal description. But I don’t think the image is meant to be proof of anything; it’s just another way for the author to communicate with a receptive reader. “The vibe was roughly like this [embedded image]” is an alternative to (or augmentation of) a detailed verbal description of the vibe, and you should trust it roughly as much as you would trust the verbal description.
I largely agree with your point here. I’m arguing more that in the case of a ghiblified image (even more so than a regular AI image), the signals a reader gets are this:
the author says “here is an image to demonstrate vibe”
the image is AI generated with obvious errors
For many people, #2 largely negates #1, because #2 also implies these additional signals to them:
the author made the least possible effort to show the vibe in an image, and
the author has a poor eye for art and/or bad taste.
Therefore, the author probably doesn’t know how to even tell if an image captures the vibe or not.
Hell, I forgot about the easiest and most common (not by coincidence!) strategy: put emoji over all the faces and then post the actual photo.
EDIT: who is disagreeing with this comment? You may find it not worthwhile, in which case downvote, but what about it is actually arguing for something incorrect?
If I did that, people in photos would often be recognizable. It retains completely accurate posture, body shape, skin color, clothing, and height. I’ve often recognized people in this kind of image.
(I haven’t voted on your comment, but I suspect this is why it’s disagree voted)
That does make sense WRT disagreement. I wasn’t intending to fully hide identities even from people who know the subjects, but if that’s also a goal, it wouldn’t do that.
The left arm is holding the fiddle and is not visible behind my body, while the right arm has the sleeve rolled up above the elbow and you can see a tiny piece of the back of my right hand poking out above my forearm. The angle of the bow is slightly wrong for the hand position, but only by a little since there is significant space between the back of the hand and the fingertips holding the bow.
(Of course, as I write in my post, it certainly gets a lot of other things wrong. Which is useful to me from a privacy perspective, though probably not the most efficient way to anonymize.)