I think there’s a real disagreement between worldviews upstream of both air conditioners and alignment, and the AC thing is a real test between those worldviews. It wasn’t chosen as a particularly optimal test, but finding good, clean disagreements between worldviews takes some work, and this opportunity just kind of dropped in front of us on a silver platter.
I’ll be performing a (modest) update based on the results of this experiment, and I strongly endorse John’s comment here as an explanation of why: it’s testing a worldview that’s upstream of both this AC debate and alignment.
In my case, the worldview being tested isn’t about civilizational inadequacy. Rather, it’s about how likely optimizers (e.g. the market, an AI system) are to do things that seem to satisfy our preferences (but actually have hidden bad side effects that we’re not smart enough to notice) vs. do things that actually satisfy our preferences. In other words, I’m interested in the question of whether optimizers will inevitably learn to Goodhart their objective function, including in cases of rich objective functions like “consumer satisfaction.”
I also strongly agree with John’s framing of this as just one bit of evidence, and not enough evidence to be a full crux. Really drilling down into this point would look more like selecting lots of top-rated goods and services and trying hard to figure out how many of them cause significant side effects that don’t seem to be priced into consumers’ opinions.
Is it a crux for you? It feels to me like it shouldn’t be a crux for you.
You mean the AC thing? If I’m wrong, it wouldn’t be enough bits to flip all the relevant parts of my alignment views, but it would be enough bits that I’d be a lot less certain and invest more in finding other ways to gain bits.
(Though obviously that depends somewhat on how I turn out to be wrong.)
Yeah, sounds about right.
I feel like I’d think less of John if it weren’t a crux for him? Like, one of the troubles with worldviews like this is that they lean on both your theories and your evidence, and so you really need to grab at the examples that do shine through of “oh, my worldview found this belief of mine very confirming, but people disagree with it; we should figure out whether or not I’m right.”
I think it makes sense to have a loose probabilistic relationship. I do not think it makes sense for it to be a crux, in the sense of a thing which, if false, would make John abandon his view. There are just too many weak steps. The AI industry is not the AC industry. I happen to agree with John’s views about AC, but it’s not obvious to me that those views imply this particular test turning out as he’s predicting. (Is he averaging over the wrong points?) It’s more probable than not, but my point here is that the whole thing is made of fairly weak inferences.
To be clear, I am pro what John is doing and how he is engaging; it’s more John’s commenters whose reactions felt confusing to me.