It’s incredibly surprising that state-of-the-art AI models don’t fix most of their hallucinations, despite being capable enough to do so (and despite undergoing reinforcement learning).
Is the root cause of hallucination an alignment problem rather than a capabilities problem?!
Maybe the AI earns a higher RL reward by hallucinating (rather than by giving a less informative answer), because users are unable to catch its mistakes — see the toy sketch below.
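As a toy illustration of that expected-reward argument (hypothetical numbers, not anything from a real training setup): if a rater only catches a fabricated claim with probability p, then for small p a confident fabrication can outscore an honest “I’m not sure.”

```python
# Toy expected-reward sketch (hypothetical numbers, not from any real RLHF pipeline).
# Assumption: raters reward confident, complete-looking answers and only
# penalize a fabricated claim when they actually catch the error.

def expected_reward(p_catch: float,
                    reward_if_uncaught: float = 1.0,
                    penalty_if_caught: float = -1.0) -> float:
    """Expected rating for a confidently hallucinated answer."""
    return (1 - p_catch) * reward_if_uncaught + p_catch * penalty_if_caught

# Assumed baseline: hedged "I don't know"-style answers get a modest, safe score.
reward_for_hedging = 0.2

for p_catch in (0.1, 0.3, 0.5):
    r = expected_reward(p_catch)
    better = "hallucinate" if r > reward_for_hedging else "hedge"
    print(f"p(catch)={p_catch:.1f}: E[reward | hallucinate]={r:+.2f} -> incentive: {better}")
```

Under these made-up numbers, hallucinating wins whenever raters catch the mistake less than ~40% of the time — which would make hallucination a reward-specification (alignment) failure, not a capability one.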