It seems like there are three things that might have happened:

1. ChatGPT failed to use its internal API correctly: it attempted to delete memories but didn't succeed, and it reported that it deleted them because it tried to do so and is unaware that it failed.
2. ChatGPT did not try to delete any memories but reported that it deleted them.
3. There's a misunderstanding between the two of you about what you mean by "memories."
While 1 and 3 are bad, they are not scheming; only 2 seems to be scheming.
Without the chat log, it's pretty hard to tell from the outside what this is about.
Well, I don't know how much it's worth investigating this. Scheming in this sort of model is well-known, but I don't know of any reports besides mine of it happening in ChatGPT in the wild. Someone besides me will have to try reproducing similar steps in a production GPT setting. It'd be best if they could monitor session memory in addition to chat state, since I think that's key to the behavior happening here.