Martín Soto comments on The Mirror Trap

Martín Soto 8 Jun 2025 10:03 UTC
2 points
0
But the audience isn’t optimizing/Goodharting anything, just providing an imperfect proxy. It is only the artist who is argmaxing, which is when Goodhart appears.
One way out would be for the artist to stop optimizing for the audience, and start optimizing for real value. Another way out would be for the audience to perfect their assessment. But this is always the case for Goodhart: you can either stop using the proxy altogether, or improve your proxy.
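To make the two ways out concrete, here is a toy sketch. Everything in it is made up for illustration: the artworks, the `value` and `flash` numbers, and the assumption that the audience's response is just value plus some weight on flash. The artist always argmaxes whatever score they are handed, and only the score changes.

```python
# Toy model; all names and numbers are hypothetical.
# Each artwork has a true "value" and a "flash" component that the
# audience mistakes for value.
artworks = {
    "quiet masterpiece": {"value": 9, "flash": 1},
    "solid craft":       {"value": 6, "flash": 3},
    "crowd-pleaser":     {"value": 3, "flash": 9},
}

def audience_score(work, bias=1.0):
    """Imperfect proxy: the audience responds to value plus bias * flash."""
    return work["value"] + bias * work["flash"]

def pick(score):
    """The artist argmaxes whatever score function they are given."""
    return max(artworks, key=lambda name: score(artworks[name]))

# Argmaxing the imperfect proxy: this is where Goodhart shows up.
goodharted = pick(audience_score)

# Way out 1: stop using the proxy and optimize real value directly.
real_value = pick(lambda w: w["value"])

# Way out 2: improve the proxy (here, an audience far less swayed by flash).
better_proxy = pick(lambda w: audience_score(w, bias=0.1))

print(goodharted, "|", real_value, "|", better_proxy)
```

Note that the audience does the same thing in all three cases: it just reports a score. The bad outcome only appears once the artist's argmax is pointed at that score.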
Something more interesting would be “the artist is trying to create the art that elicits the best response, and the audience is trying to produce the response that makes the artist happiest”, or something like that. This is what happens when two people-pleasers meet and end up settling on a plan that neither of them wants. It’s also relevant to training an AI that’s alignment-faking. In a sense, when the other party tries to maximize your local utility, it dampens the signal you wanted to use to maximize global utility.
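A minimal sketch of the people-pleaser case, under made-up assumptions: two people, two options, and each person holds a (wrong) guess about what the other wants. The names, options, and guesses are all hypothetical.

```python
# Toy model; all names and values are hypothetical.
options = ["stay home", "go out"]

# What each person actually wants.
true_pref = {"alice": "stay home", "bob": "stay home"}

# What each person believes the other wants (assumed to be wrong here).
guess_of_other = {"alice": "go out", "bob": "go out"}

def stated(person, pleaser):
    """A people-pleaser reports their guess of the other's wish,
    not their own preference."""
    return guess_of_other[person] if pleaser else true_pref[person]

def plan(pleaser):
    """Pick the option with the most stated support."""
    votes = [stated(person, pleaser) for person in true_pref]
    return max(options, key=votes.count)

honest_plan = plan(pleaser=False)   # stated signals carry real preferences
pleaser_plan = plan(pleaser=True)   # stated signals carry only guesses

print(honest_plan, "|", pleaser_plan)
```

In the people-pleaser run, neither stated preference contains any information about what anyone actually wants, so the plan optimizes a signal that has been emptied out.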