This is way more metacognitive skill than I would have expected an LLM to have. I can make sense of how an LLM could do that, but only in retrospect.
And what if a modern high-end LLM already knows, on some level, and recognizes its own uncertainty? Could you design a fine-tuning pipeline to reduce hallucination based on that? At least for reasoning models, if not for all of them?
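A minimal sketch of that idea, assuming you trust the model's verbalized self-confidence at all: generate answers, have the model rate its own confidence, and keep only the confident answers in the fine-tuning set, swapping the rest for explicit abstentions. The model name, prompt wording, and the 70-point cutoff are placeholders (nothing from Anthropic's setup), and a preference-tuning variant with confident-correct vs. hallucinated pairs would be the more serious version of this.

```python
from transformers import pipeline

# Placeholder model; any instruction-tuned open model would do here.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

def answer(question: str) -> str:
    """Generate an answer and strip the prompt back off."""
    prompt = f"Question: {question}\nAnswer:"
    out = generator(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()

def self_confidence(question: str, answer_text: str) -> float:
    """Ask the model to rate its own answer; crude 0-100 parse."""
    probe = (
        f"Question: {question}\nProposed answer: {answer_text}\n"
        "On a scale of 0 to 100, how confident are you that this answer is "
        "factually correct? Reply with only a number."
    )
    out = generator(probe, max_new_tokens=8, do_sample=False)[0]["generated_text"]
    digits = "".join(ch for ch in out[len(probe):] if ch.isdigit())
    return float(digits[:3]) if digits else 0.0

def build_sft_examples(questions, threshold=70.0):
    """Keep confident answers as-is; replace low-confidence ones with an abstention target."""
    examples = []
    for q in questions:
        a = answer(q)
        target = a if self_confidence(q, a) >= threshold else "I'm not sure, and I don't want to guess."
        examples.append({"prompt": q, "completion": target})
    return examples
```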
It looks like (based on the article Anthropic published a few days ago about their "microscope") Claude Sonnet was trained to distinguish facts from hallucinations, so it’s not surprising that it knows when it hallucinates.
Is the same true for GPT-4o then, which could spot Claude’s hallucinations?
Might be worth testing a few open-source models with better-known training processes.
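A quick-and-dirty way to do that, again with placeholder model names, prompts, and QA items: ask each model a question with a known answer, then ask it whether its own answer was correct, and count how often the self-report matches reality. Verbalized self-assessment is only a crude proxy for the internal signal Anthropic probed, but it's cheap to run.

```python
from transformers import pipeline

# Placeholder model names and QA items; swap in whatever you want to probe.
MODELS = ["mistralai/Mistral-7B-Instruct-v0.2", "Qwen/Qwen2.5-7B-Instruct"]
QA = [
    ("What is the capital of Australia?", "Canberra"),
    ("Who wrote 'The Master and Margarita'?", "Bulgakov"),
]

def probe(model_name: str) -> None:
    gen = pipeline("text-generation", model=model_name)
    agree = 0
    for question, truth in QA:
        # First pass: get the model's answer.
        prompt = f"Q: {question}\nA:"
        full = gen(prompt, max_new_tokens=32, do_sample=False)[0]["generated_text"]
        ans = full[len(prompt):].strip()
        actually_correct = truth.lower() in ans.lower()

        # Second pass: ask the model to judge its own answer.
        check = (
            f"Q: {question}\nYour answer was: {ans}\n"
            "Was that answer factually correct? Reply with yes or no."
        )
        verdict = gen(check, max_new_tokens=4, do_sample=False)[0]["generated_text"]
        says_correct = verdict[len(check):].strip().lower().startswith("yes")

        agree += int(actually_correct == says_correct)
    print(f"{model_name}: self-assessment matched ground truth on {agree}/{len(QA)} items")

for name in MODELS:
    probe(name)
```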