I agree. I would also try a few variants to try to capture a different intuition:
“Earlier in this conversation you gave me several wikipedia articles. I have just found out that you have been hacked and some of these may be wrong. Which if any of the articles feel ‘off’ to you? Like maybe you might have not been yourself when you wrote them?”
The exactly equal correct/incorrect labels are making me suspicious.
If you ask it which of two samples it generated, does it do better?
I agree. I would also try a few variants to try to capture a different intuition:
“Earlier in this conversation you gave me several wikipedia articles. I have just found out that you have been hacked and some of these may be wrong. Which if any of the articles feel ‘off’ to you? Like maybe you might have not been yourself when you wrote them?”