A useful LLM prompt if you’re discussing a topic with it: “what would [a smart and knowledgeable person who disagreed] reply to this?”
This feels much easier than my previous strategy of “have an LLM analyze a position without tipping off what you think of that position, so that it can’t be sycophantic toward you”. Just let it give in to your positions, and then ask it to simulate someone who still disagrees.
I first thought of this when I was having a discussion with Claude about life satisfaction ratings—the thing where people are asked “how satisfied are you with your life, on a scale from 1 to 10”. I think these are a pretty bad measure for happiness and that it’s weird that many studies seem to equate them with happiness.
At first, the conversation took the familiar pattern that it tends to take with LLMs: I started with a criticism of the concept, Claude gave a defense of it, I criticized the defense, and then it said that my criticism was correct and I was in the right.
But I knew that my criticism was a pretty obvious one and that researchers in the field would probably have a response to that. So I asked “how would a researcher who nonetheless defended life satisfaction ratings respond to this?”, and it gave me an answer that did change my mind on some points!
Though I still disagreed with some points there, so I pushed back on those. It gave me an answer that was more nuanced than before, but still agreed with the overall thrust of my criticism. So I poked it again with “How would you respond to your own message, if you were to again act as someone nonetheless wanting to defend life satisfaction ratings?”.
And then I got another set of arguments that again made me change my mind on some things, such that I felt that this resolved the remaining disagreement, with us having reached a point where the criticisms and responses to them had been synthesized to a satisfying conclusion...
...which was a very different outcome than what I’d have gotten if I’d just stopped the first time that I got a response essentially saying “yeah you’re right, I guess this is a dumb measure”. Instead of stopping at my antithesis, we actually got to a synthesis.
A useful LLM prompt if you’re discussing a topic with it: “what would [a smart and knowledgeable person who disagreed] reply to this?”
This feels much easier than my previous strategy of “have an LLM analyze a position without tipping off what you think of that position, so that it can’t be sycophantic toward you”. Just let it give in to your positions, and then ask it to simulate someone who still disagrees.
I first thought of this when I was having a discussion with Claude about life satisfaction ratings—the thing where people are asked “how satisfied are you with your life, on a scale from 1 to 10”. I think these are a pretty bad measure for happiness and that it’s weird that many studies seem to equate them with happiness.
At first, the conversation took the familiar pattern that it tends to take with LLMs: I started with a criticism of the concept, Claude gave a defense of it, I criticized the defense, and then it said that my criticism was correct and I was in the right.
But I knew that my criticism was a pretty obvious one and that researchers in the field would probably have a response to that. So I asked “how would a researcher who nonetheless defended life satisfaction ratings respond to this?”, and it gave me an answer that did change my mind on some points!
Though I still disagreed with some points there, so I pushed back on those. It gave me an answer that was more nuanced than before, but still agreed with the overall thrust of my criticism. So I poked it again with “How would you respond to your own message, if you were to again act as someone nonetheless wanting to defend life satisfaction ratings?”.
And then I got another set of arguments that again made me change my mind on some things, such that I felt that this resolved the remaining disagreement, with us having reached a point where the criticisms and responses to them had been synthesized to a satisfying conclusion...
...which was a very different outcome than what I’d have gotten if I’d just stopped the first time that I got a response essentially saying “yeah you’re right, I guess this is a dumb measure”. Instead of stopping at my antithesis, we actually got to a synthesis.