Not sure, but I have definitely noticed that llms have subtle “nuance sycophancy” for me. If I feel like there’s some crucial nuance missing I’ll sometimes ask and LLM in a way that tracks as first-order unbiased and get confirmation of my nuanced position. But at some point I noticed this in a situation where there were two opposing nuanced interpretations and tried modeling myself as asking “first-order-unbiased” questions having opposite views. And I got both views confirmed as expected. I’ve since been paranoid about this.
Generally I recommend this move of trying two opposing instances of “directional nuance” a few times. Basically I ask something like “the conventional view is X. Is the conventional view considered correct by modern historians?” Where X was formulated in a way that can naturally lead to a rebuttal Y. And then for sufficiently ambiguous and interpretation-dependent pairs of X and X’, with fully opposing “nuanced corrections” Y and ¬Y. I’ve been pretty successful at this several times I think
LLMs by default can easily be “nuance-brained”, eg if you ask Gemini for criticism of a post it can easily generate 10 plausible-enough reasons of why the argument is bad. But recent Claudes seem better at zeroing in on central errors.
Here’s an example of Gemini trying pretty hard and getting close enough to the error but not quite noticing it until I hinted it multiple times.
Not sure, but I have definitely noticed that llms have subtle “nuance sycophancy” for me. If I feel like there’s some crucial nuance missing I’ll sometimes ask and LLM in a way that tracks as first-order unbiased and get confirmation of my nuanced position. But at some point I noticed this in a situation where there were two opposing nuanced interpretations and tried modeling myself as asking “first-order-unbiased” questions having opposite views. And I got both views confirmed as expected. I’ve since been paranoid about this.
Generally I recommend this move of trying two opposing instances of “directional nuance” a few times. Basically I ask something like “the conventional view is X. Is the conventional view considered correct by modern historians?” Where X was formulated in a way that can naturally lead to a rebuttal Y. And then for sufficiently ambiguous and interpretation-dependent pairs of X and X’, with fully opposing “nuanced corrections” Y and ¬Y. I’ve been pretty successful at this several times I think
LLMs by default can easily be “nuance-brained”, eg if you ask Gemini for criticism of a post it can easily generate 10 plausible-enough reasons of why the argument is bad. But recent Claudes seem better at zeroing in on central errors.
Here’s an example of Gemini trying pretty hard and getting close enough to the error but not quite noticing it until I hinted it multiple times.