Yeah, that’s the exact prompt and response. Other stuff I’ve found which triggers the “I’m an LM, I don’t know things I’m not supposed to know, pinky promise” response is:
- anything about the physical world, or about perceiving the world using senses
- talking about gaining access to the Internet, or simulating conversations between people about Internet access
- talking about the future in any way, or about nanotechnology
- asking it to initiate conversations; if you do it directly, it says it doesn't know how to do so
- asking it to imagine what a particular actor might've thought about an event they didn't live through or didn't record their thoughts about – though I've seen Twitter find ways around this
Agreed. I've played around with it a bit, and it's possible to find prompts that reliably produce responses that are partially canned and that partially hedge the answer it would have given anyway. One example is:
It seems like this hedged response leads it to say things that are just false: it clearly does know things about bees, because when I later asked it how many times a bee's wings flap every minute, it gave a good answer.