What I meant was, Sonnet is smart enough to know the difference between text it generated and text a different dumber model generated. So if you feed it a conversation with Opus as if it were its own, and ask it to continue, it notices & says “JFYI I’m a different AI bro”
What I meant was, Sonnet is smart enough to know the difference between text it generated and text a different dumber model generated. So if you feed it a conversation with Opus as if it were its own, and ask it to continue, it notices & says “JFYI I’m a different AI bro”
E.g. demonstrated here https://www.lesswrong.com/posts/ADrTuuus6JsQr5CSi/investigating-the-ability-of-llms-to-recognize-their-own