Interesting update: OpenAI just published a new paper on hallucinations (Why Language Models Hallucinate, Sep 2025) link.
Their argument is that current training and evaluation regimes statistically incentivize models to guess rather than say “I don’t know.” Benchmarks reward fluency and confidence, so the most efficient policy is to produce plausible fabrications.
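To make the incentive concrete, here is a minimal sketch (my own illustration, not from the paper), assuming the standard 0/1 benchmark grading the authors critique: an abstention scores zero, so even a low-confidence guess has a higher expected score.

```python
# Illustration only: under 0/1 grading, a correct answer scores 1, a wrong
# answer scores 0, and abstaining ("I don't know") also scores 0 -- so
# guessing weakly dominates abstention in expectation.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected benchmark score for one question under binary grading."""
    return 0.0 if abstain else p_correct

# Even a 20%-confidence guess beats saying "I don't know" in expectation:
print(expected_score(0.2, abstain=False))  # 0.2 (guess)
print(expected_score(0.2, abstain=True))   # 0.0 (abstain)
```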
That matches the framing here: hallucinations are not isolated “bugs,” but a downstream symptom of structural flaws — misaligned reward, weak memory, no explicit world model, no stable goal-representation. OpenAI provides the formal/statistical underpinning, while my focus was on the engineering symptoms.
Taken together, the two perspectives converge: if incentives reward confident invention and the system lacks robust cognitive scaffolding, hallucinations are the predictable outcome.