Agreed that current models fail badly at alignment in many senses.
I still feel that the bet OP offered Collier was inappropriate. She had stated that currently available techniques do a reasonably good job of making potentially alien and incomprehensible jealous ex-girlfriends like “Sydney” very rare, and the bet was clearly about a different claim than hers about the frequency of Sydney-like behavior.
A more appropriate response from OP would have been to say that while current techniques may have successfully reduced the frequency of Sydney-like behavior, they’re still failing badly in other respects, such as your observation with Claude Code.
Agreed. Thanks for pointing out my failing here. I think this is one of the places in my rebuttal where my anger turned into snark, and I regret that. Not sure if I should go back and edit...