OpenAI published a paper showing that LLMs hallucinate for two primary reasons: first, statistical errors that arise naturally during training, and second, evaluations that reward confident guesses, even incorrect ones, while penalizing “I don’t know”. As a result, LLMs are trained to always guess when they aren’t sure.
To fix this, the paper proposes modifying LLM evaluations so that models are properly rewarded for acknowledging when they don't know the answer.
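A minimal sketch of the incentive at play, assuming a simple expected-score model rather than the paper's exact grading scheme (the function name, the penalty of 3 points, and the confidence values are illustrative assumptions):

```python
# Under plain accuracy grading, any guess with a nonzero chance of being right
# has a higher expected score than abstaining, so "I don't know" never pays.
# If wrong answers carry a penalty, abstaining becomes the better choice once
# the model's confidence drops below a break-even threshold.

def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering: +1 if correct, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

ABSTAIN_SCORE = 0.0  # "I don't know" scores zero under both schemes

for p in (0.9, 0.5, 0.2):
    guess_plain = expected_score(p)                          # accuracy-only grading
    guess_penalized = expected_score(p, wrong_penalty=3.0)   # wrong answers cost 3 points
    print(f"confidence={p:.1f}  "
          f"plain: guess={guess_plain:+.2f} vs abstain={ABSTAIN_SCORE:+.2f}  "
          f"penalized: guess={guess_penalized:+.2f} vs abstain={ABSTAIN_SCORE:+.2f}")
```

With a 3-point penalty per wrong answer, guessing only beats abstaining when the model is more than 75% confident; under accuracy-only grading, guessing beats abstaining at every confidence level, which is exactly the behavior the paper argues current benchmarks reinforce.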