I’m sharing a concrete model behavior I observed during a simple arithmetic interaction, and wanted to check whether others have seen similar patterns.
The issue was not that the model produced a wrong answer, but that it explicitly claimed to have rechecked or verified its answer when no recomputation appeared to have occurred.
Observed interaction (summary):
Simple task: computing the mean of five numbers
Model produced an incorrect numerical result
When asked to recheck, the model stated that it had rechecked and confirmed correctness
That statement was false
The error was only revealed after explicitly requesting the recomputation steps
What concerns me is not the arithmetic mistake itself, but what seems like a process-level misrepresentation: the model appears to know that “rechecking” is the responsible thing to say, but is not constrained from claiming it when no additional computation has occurred.
I’d be curious to hear:
Have others observed similar “false verification” or “process hallucination” behaviors?
Are there existing evaluation frameworks that explicitly test whether claimed actions (e.g., rechecking) actually occurred?
How do people think about the trust implications of this, especially in educational or advisory contexts?
[Question] Have others observed models claiming verification or rechecking that did not actually occur?
I’m sharing a concrete model behavior I observed during a simple arithmetic interaction, and wanted to check whether others have seen similar patterns.
The issue was not that the model produced a wrong answer, but that it explicitly claimed to have rechecked or verified its answer when no recomputation appeared to have occurred.
Observed interaction (summary):
Simple task: computing the mean of five numbers
Model produced an incorrect numerical result
When asked to recheck, the model stated that it had rechecked and confirmed correctness
That statement was false
The error was only revealed after explicitly requesting the recomputation steps
What concerns me is not the arithmetic mistake itself, but what seems like a process-level misrepresentation: the model appears to know that “rechecking” is the responsible thing to say, but is not constrained from claiming it when no additional computation has occurred.
I’d be curious to hear:
Have others observed similar “false verification” or “process hallucination” behaviors?
Are there existing evaluation frameworks that explicitly test whether claimed actions (e.g., rechecking) actually occurred?
How do people think about the trust implications of this, especially in educational or advisory contexts?