However, I also held similar follow-up chats with Claude 3 Opus at temperature 0, and Claude 3.5 Sonnet, each of which showed different patterns.
To make sure I understand: You took a chat log from your interaction with 3 Opus, and then had 3.5 Sonnet continue it? This would explain Sonnet’s reaction below!
Yes, that is an accurate understanding.
(Not sure what you mean by “This would explain Sonnet’s reaction below!”)
What I meant was that Sonnet is smart enough to tell the difference between text it generated and text a different, dumber model generated. So if you feed it a conversation with Opus as if it were its own and ask it to continue, it notices and says “JFYI I’m a different AI bro”.
E.g., as demonstrated here: https://www.lesswrong.com/posts/ADrTuuus6JsQr5CSi/investigating-the-ability-of-llms-to-recognize-their-own