The “coupling” idea is pretty interesting, but when I think about it, I somewhat doubt it would work any better than regular old constitution training. Like, the constitution basically just associates the easy-to-train trait “believing you are Claude” with hard-to-train traits like “being nice to the user.” Do you think the approach you’re suggesting could do better than that?
Oh, I totally agree! IMO constitutional AI is one of the primary examples of coupling, and I’m not actually suggesting a different technique. I’m just saying that constitutional AI hasn’t been studied very systematically from the angle of “how much is this actually helping” or “for what kinds of traits is it effective”.