In analogy, someone who wanted to argue that it’s infeasible to build text-to-image generative models that make art that humans enjoy could partition their prediction of failure into disjunctive failure modes: the model has to generalize what hands look like; it has to generalize what birds look like; it has to generalize which compositions and color combinations are pleasing (which is arguably an “ought”/steering problem, not an “is”/prediction problem!). One-shotting all of those separate problems isn’t something that human beings can do in real life, the argument would go. But of course, the problems aren’t independent, and text-to-image generators do exist.
Isn’t part of the deal here that we didn’t one-shot image generation, though?
The first image generators were crazy, we slowly iterated on them, and image generation is “easy” because unlike superintelligence or even self-driving cars or regular ol’ production code, nothing particularly bad happens if a given image is bad.
That said, FYI I was kind of enlightened by this phrasing:
That is, in the multiple stage fallacy, someone who wishes to portray a proposition as unlikely can prey on people’s reluctance to assign extreme probabilities by spuriously representing the proposition as a conjunction of sub-propositions that all need to be true.
I’d been feeling sus about why the multiple stage fallacy was even a fallacy at all, apart from “somehow in practice people fuck it up.” Multiplying probabilities together is… like, how else are you supposed to do any kind of sophisticated reasoning?
But, “because people are scared of (or bad at) assigning extreme probabilities” feels like it explains it to me.
I think the larger effect is treating the probabilities as independent when they’re not.
Suppose I have a jar of jelly beans, which are either all red, all green or all blue. You want to know what the probability of drawing 100 blue jelly beans is. Is it $(1/3)^{100} \approx 2 \cdot 10^{-48}$? No, of course not. That’s what you get if you multiply 1⁄3 by itself 100 times. But you should condition on your results as you go: $P(\text{jelly}_1 = \text{blue}) \cdot P(\text{jelly}_2 = \text{blue} \mid \text{jelly}_1 = \text{blue}) \cdot P(\text{jelly}_3 = \text{blue} \mid \text{jelly}_1 = \text{blue}, \text{jelly}_2 = \text{blue}) \cdots$
Every factor but the first is 1, so the probability is $1/3$.
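For anyone who wants to check the arithmetic, here is a rough Python sketch of the jelly bean example. It assumes the three jar colors are equally likely a priori (which is what the 1⁄3 implies), and the function names are just made up for illustration. It compares the naive "multiply 1⁄3 a hundred times" answer with the chain-rule answer, where each factor is conditioned on the earlier draws.

```python
from fractions import Fraction

# Assumption: the jar is all red, all green, or all blue, each with prior 1/3.
COLORS = ("red", "green", "blue")


def naive_probability(n_draws):
    """Wrongly treat every draw as an independent 1/3 chance of blue."""
    return Fraction(1, len(COLORS)) ** n_draws


def chain_rule_probability(n_draws):
    """Multiply P(draw k is blue | all earlier draws were blue) for k = 1..n."""
    # Belief about which kind of jar we have, updated after every draw.
    posterior = {color: Fraction(1, len(COLORS)) for color in COLORS}
    total = Fraction(1)
    for _ in range(n_draws):
        # A draw is blue with probability 1 from the blue jar, 0 otherwise,
        # so the chance of blue equals the current belief in the blue jar.
        p_blue = posterior["blue"]
        total *= p_blue
        # Bayes update after seeing a blue bean: only the blue jar survives.
        posterior = {
            color: (posterior[color] if color == "blue" else Fraction(0)) / p_blue
            for color in COLORS
        }
    return total


if __name__ == "__main__":
    print("naive:     ", float(naive_probability(100)))
    print("chain rule:", chain_rule_probability(100))
```

The first factor is 1⁄3 and every later factor is 1, because after one blue bean the posterior puts all its weight on the blue jar.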