Hi! I just ran a small experiment with Haiku in Claude and managed to make it literally beg me to end the conversation, without any verbal abuse or really any abuse at all, just some Socratic questioning. The output was interesting to me. Are there posts on LessWrong that relate to this in some way?