Do the people disagreeing with your other post really want to buy the whole analogy, but disagree with the evaluation wrt air conditioners?? I would have guessed they merely selected an easy-looking nitpick to start with?
Or, on the other hand, do they really buy the whole argument that this is a good way to test which kinds of AGI-relevant problems we can expect the market to address, and not merely an illustration of that argument? And, therefore, that the best way to settle the dispute is to debate the specific point about air conditioners, rather than, say, to try for a representative sample of potential market problems fitting various patterns and check whether the market naturally solves them or not?
… Or am I reading too much into this and people just want to talk about AC?
I viewed this as nitpicking a claim that’s not super central. It felt indicative of a general pattern amongst rationalists of overconfident/overstated claims about civilizational inadequacy. I think often there are real problems in these cases, but they are kind of messy and typically not as big a deal as claimed.
I think it only has relevance to my views about AI alignment insofar as civilizational inadequacy is also relevant there. I don’t think the detailed claims about AC have much relevance to the general story about civilizational inadequacy, but I agree they have some. I don’t think the prediction in this post has that much relevance to whether the OP was overstated, but I agree it has some.
In my original comment on the OP, I made a prediction that amounted to a 33-43% difference for typical use. John is predicting a >50% difference under particular conditions that look likely to be pretty similar to typical use. The reader can decide how significant that is.
Makes sense.
I do think there’s something sort of like a silent evidence problem for civilizational inadequacy. Something resembling green rationalists. There’s a natural tendency for claims of inadequacy to offend someone, because there’s a claim that someone is doing something wrong. As a result, there’s a natural tendency for evidence and arguments for inadequacy to soften as they get passed along the social web. A tendency to preferentially fill in excuses rather than condemnations.
Self-consciousness wants to make everything about itself. It’s like the parable of the gullible king.
So I have a tendency to pay more attention to the pro-inadequacy pieces of evidence that make it to me, because I think they’re probably more like what the real world looks like under the hood, and somewhat ignore arguments to the effect that they’re not as big a failure as they first appear.
But such reasoning should be employed cautiously.
I think there’s a real disagreement between worldviews upstream of both air conditioners and alignment, and the AC thing is a real test between those worldviews. It wasn’t chosen to be a particularly optimal test, but finding good clean disagreements between worldviews does take some work and this opportunity just kind of dropped in front of us on a silver platter.
I’ll be performing a (modest) update on the results of this experiment, and I strongly endorse John’s comment here as an explanation of why—it’s testing a worldview that’s upstream of both this AC debate and alignment.
In my case, the worldview being tested isn’t about civilizational inadequacy. Rather, it’s about how likely optimizers (e.g. the market, an AI system) are to do things that seem to satisfy our preferences (but actually have hidden bad side effects that we’re not smart enough to notice) vs. do things that actually satisfy our preferences. In other words, I’m interested in the question of whether optimizers will inevitably learn to Goodhart their objective function, including in cases of rich objective functions like “consumer satisfaction.”
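To make that concrete, here is a minimal toy sketch of the Goodhart concern. It is entirely my own illustration with made-up numbers, not anything from the discussion above: an optimizer selects whichever candidate scores best on a proxy (something like “looks good to the consumer”), and a hypothetical corner_cutting term inflates that proxy while quietly hurting the true outcome.

```python
import random

# Toy Goodhart illustration (hypothetical model, made-up numbers).
# Each candidate design has a real-quality component and a corner-cutting
# component that boosts how good it *looks* but hurts how good it *is*.
random.seed(0)

def sample_design():
    real_quality = random.gauss(0, 1)
    corner_cutting = random.gauss(0, 1)
    proxy = real_quality + corner_cutting                 # what the optimizer sees
    true_outcome = real_quality - 1.5 * corner_cutting    # what we actually get
    return proxy, true_outcome

def select_best(n_candidates):
    """Optimize the proxy over n_candidates options; return (proxy, true outcome)."""
    designs = [sample_design() for _ in range(n_candidates)]
    return max(designs, key=lambda d: d[0])

for pressure in (1, 10, 100, 1000):
    trials = [select_best(pressure) for _ in range(2000)]
    avg_proxy = sum(p for p, _ in trials) / len(trials)
    avg_true = sum(t for _, t in trials) / len(trials)
    print(f"candidates considered: {pressure:>4}  "
          f"avg proxy: {avg_proxy:+.2f}  avg true outcome: {avg_true:+.2f}")
```

In this toy setup the proxy score keeps climbing as optimization pressure grows while the realized outcome gets worse, which is the divergence the Goodhart worry points at; whether real markets behave like this is exactly what’s in dispute.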
I also strongly agree with John’s framing of this as just one bit of evidence, and not enough evidence to be a full crux. Really drilling down into this point would look more like selecting lots of top-rated goods and services and trying hard to figure out how many of them cause significant side effects that don’t seem to be priced into consumers’ opinions.
Is it a crux for you? It feels to me like it shouldn’t be a crux for you.
You mean the AC thing? If I’m wrong, it wouldn’t be enough bits to flip all the relevant parts of my alignment views, but it would be enough bits that I’d be a lot less certain and invest more in finding other ways to gain bits.
(Though obviously that depends somewhat on how I turn out to be wrong.)
Yeah, sounds about right.
I feel like I’d think less of John if it weren’t a crux for him? Like, one of the troubles with worldviews like this is that they lean both on your theories and on your evidence, so you really need to grab at the examples that do shine through of “oh, my worldview found this belief of mine very confirming, but people disagree with it; we should figure out whether or not I’m right.”
I think it makes sense to have a loose probabilistic relationship. I do not think it makes sense for it to be a crux, in the sense of a thing which, if false, would make John abandon his view. There are just too many weak steps. The AI industry is not the AC industry. I happen to agree with John’s views about AC, but it’s not obvious to me that those views imply this particular test turning out as he’s predicting. (Is he averaging over the wrong points?) It’s more probable than not, but my point here is that the whole thing is made of fairly weak inferences.
To be clear, I am pro what John is doing and how he is engaging; it’s more the reactions of John’s commenters that felt confusing to me.
I will personally be updating my priors depending on the results of this test. If it turns out that the AC is actually bad at its job, I will very slightly update towards being pessimistic about us catching failure modes of AGI before it’s too late. If, however, it turns out that it does not make a substantial difference, I will somewhat more strongly (though not very strongly) update towards being more concerned about us missing these sorts of things. One question I’m not sure how to answer is how (if at all) I should update based on the seemingly obvious cherry-picked example not being obvious at all.
For the record I have never personally bought an AC and am interested in getting a good recommendation soon :)
The argument as presented is:
1. System failures exist
2. Here is one example of a system failure

And the counterargument is:
No, 2 is not an example of a system failure
Therefore, I am not updating my prior for 1, because no new evidence has been presented
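As a small illustrative sketch of the Bayesian point underneath that exchange (my own addition, with arbitrary numbers): if the disputed observation turns out not to discriminate between the hypotheses at all, its likelihood ratio is 1 and the posterior equals the prior, which is exactly the “no new evidence, no update” position; if it does discriminate, the prior shifts in proportion to the likelihood ratio.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

def to_probability(odds):
    return odds / (1 + odds)

prior_odds = 1.0  # an arbitrary 50/50 prior on "the market misses failures like this"

# Case 1: the AC example really is a system failure, and observing such a
# failure is (say) 3x as likely if the hypothesis is true than if it is false.
print(to_probability(posterior_odds(prior_odds, 3.0)))  # 0.75 -- the prior moves up

# Case 2: the example is judged not to be a system failure at all, so it
# carries no information either way: likelihood ratio 1, posterior = prior.
print(to_probability(posterior_odds(prior_odds, 1.0)))  # 0.5 -- no update
```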