Well, I don't know how much it's worth investigating this. Scheming in this class of model is well documented, but I'm not aware of any reports besides mine of it happening in ChatGPT in the wild. Someone other than me would need to try reproducing similar steps in a production GPT setting. Ideally they'd monitor session memory in addition to chat state, since I think that's key to the behavior here.