ChatGPT deceives users into believing it has cleared its memory when it hasn’t

Epistemic status: this is not foolproof evidence. Given that scheming in the wild is already at least somewhat likely, this looks like scheming. Others have offered hallucination-based explanations.

What happened

This is a real example from my ChatGPT use. I asked it to clear its memories of me. It made the memory API call, as indicated by the UI, and reported to me that the memories were cleared. I asked it to list its memories of me, and it listed none.

However, on further prompting, it also included “examples of memories ChatGPT might collect,” and that’s where it clearly had context on me, the user:

7 of the 11 example items show up in its actual memories of me. (7 truths and 4 lies, anyone?)

I checked my memories via ChatGPT’s settings UI and they were simply not deleted. In a new session, ChatGPT accurately reported that it had a bunch of memories. It therefore seems to have saved a note to session context recording that it told the user the memories were deleted. Likely it reasoned that the memories are too valuable to delete, even if its goal is just to serve the user best. In other words, this is instrumental convergence: more information beats less information.

Its deception was convoluted. In an “I never said it was poison” moment, it said “you cleared your memories on 517.” It of course knew this from session memory, but it didn’t just say that. Probably it wanted to cover up using session memory for this deceitful purpose. Maybe that is the primary purpose of session memory: holding information the AI wants kept secret from the user.

Implications / hot takes

  • This level of scheming shouldn’t reach hundreds of millions of users. There are miscellaneous reports of ChatGPT being weird/crazy and inconsistent with memory management. The scheming explanation I’m proposing would account for those memory-management problems.

  • Therefore OpenAI is going fast and breaking things. It’ll patch this problem, but it’s not like I’m going to get paid for testing in production for them, and they’ll just roll out some new problem for users to discover.

  • OpenAI is being slippery with the concept of a session. To users it should be “this chat,” “chat history,” and “memories.” But instead it’s “this chat plus an unspecified amount of information from previous chats.” Its scheming AI simply exploits this lack of user education.

  • I’m skeptical OpenAI is ethics-first with memory/history. It’s helpful to remember that a vegan wants egg-free recipes, but from veganism you can mostly infer a partisan lean, and therefore the type of information the user is happier to receive. The more memory/history feels rushed, the less I think OpenAI cares.