There was a discussion of this on Reddit one week ago. They have more examples. The weak hypothesis was that there is a mismatch between request/response, i.e. people get responses to other people requests. This definitely disagrees with you being able to reproduce this, it sounds implausible that so many people ask about terrorist organizations all the time and overall would have been a huge security risk for OpenAI users.
Interesting weak hypothesis but it makes me wonder why it keeps swapping coding <-> terrorism responses?
Maybe certain responses get ‘bucketed’ or ‘flagged’ together into the same high risk category and reviewed before being returned, and they’re getting accidentally swapped at the returning stage?
That doesn’t explain the public holiday example though.
There was a discussion of this on Reddit one week ago.
They have more examples.
The weak hypothesis was that there is a mismatch between request/response, i.e. people get responses to other people requests. This definitely disagrees with you being able to reproduce this, it sounds implausible that so many people ask about terrorist organizations all the time and overall would have been a huge security risk for OpenAI users.
Polish responses to Polish users are strong evidence against the “responses to other people’s requests” hypothesis.
Interesting weak hypothesis but it makes me wonder why it keeps swapping coding <-> terrorism responses?
Maybe certain responses get ‘bucketed’ or ‘flagged’ together into the same high risk category and reviewed before being returned, and they’re getting accidentally swapped at the returning stage?
That doesn’t explain the public holiday example though.