I’ll be performing a (modest) update on the results of this experiment, and I strongly endorse John’s comment here as an explanation of why—it’s testing a worldview that’s upstream of both this AC debate and alignment.
In my case, the worldview being tested isn’t about civilizational inadequacy. Rather, it’s about how likely optimizers (e.g. the market, an AI system) are to do things that merely seem to satisfy our preferences, with hidden bad side effects we’re not smart enough to notice, versus things that actually satisfy our preferences. In other words, I’m interested in whether optimizers will inevitably learn to Goodhart their objective function, even for rich objective functions like “consumer satisfaction.”
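To make the worry concrete, here’s a minimal toy sketch of the mildest (regressional) form of Goodhart’s law, with made-up numbers: the optimizer only sees a proxy equal to the true value plus an error term it can’t observe (hidden side effects), and selecting hard on the proxy reliably picks winners that look better than they are.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_best(n_options):
    """Pick the option with the highest *proxy* score out of n_options."""
    true_value = rng.normal(size=n_options)    # what we actually care about
    hidden_error = rng.normal(size=n_options)  # unpriced hidden side effects
    proxy = true_value + hidden_error          # what the optimizer sees,
                                               # e.g. measured satisfaction
    best = np.argmax(proxy)                    # optimization pressure
    return proxy[best], true_value[best]

# More options searched = more optimization pressure. The winner reliably
# looks better than it is, and the expected gap between its proxy score and
# its true value widens as you optimize harder.
for n in (10, 1_000, 100_000):
    proxy_score, true_score = select_best(n)
    print(f"n={n:>7,}: proxy={proxy_score:5.2f}, true={true_score:5.2f}")
```

This is only the selection-effect version; the sharper form of the worry is an optimizer that actively learns to exploit the gap between the proxy and the true objective, rather than merely being fooled by noise.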
I also strongly agree with John’s framing of this as just one bit of evidence, and not enough evidence to be a full crux. Really drilling down into this point would look more like selecting lots of top-rated goods and services and trying hard to figure out how many of them cause significant side effects that don’t seem to be priced into consumers’ opinions.