After a bit of testing, ChatGPT seems quite willing to admit mistakes early in a conversation. However, as the conversation goes on, it seems to grow more belligerent. Maybe repeating a claim makes ChatGPT more certain of that claim?
At the start, it seems well aware of its own fallibility:
In the abstract:
In a specific case:
Doesn’t mind being called a liar:
Open to corrections:
We start to see more tension when the underlying context of the conversation differs between the human and ChatGPT. Are we talking about the most commonly encountered states of matter on Earth, or the most plentiful states of matter throughout the universe?
Once it makes an argument, and conditions on having made such an argument, it sticks to that position more strongly:
No conversational branch starting from the above output was able to convince it that plasma was the most common state of matter. However, my first re-roll of the above output produced this other conversation branch, in which I do convince it:
Note the two deflections in its response: that the universe isn’t entirely composed of plasma, and that the universe also contains invisible matter. I had to address both deflections before ChatGPT would reliably agree with my conclusion.