LLMs can teach themselves to better predict the future

LLMs can teach themselves to better predict the future, with no human examples or curation required.

In this paper, we explore whether an LLM can improve its forecasts using self-play and real-world outcomes:

- Dataset: 12,100 questions and outcomes from Polymarket (politics, sports, crypto, science, etc.)
- Base model generates multiple distinct reasoning traces and predictions per question
- Rank predictions by how close they were to the actual outcome
- Fine-tune with DPO on the ranked traces and predictions
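
The ranking step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes binary questions resolved to 0 or 1, measures closeness as absolute error between the predicted probability and the outcome, and emits records in the `prompt`/`chosen`/`rejected` layout commonly used for preference data. The function names and the `min_gap` parameter are hypothetical.

```python
from itertools import combinations


def rank_by_closeness(predictions, outcome):
    """Sort (trace, probability) pairs from closest to farthest from the outcome.

    Assumption: closeness is absolute error against a 0/1 resolved outcome.
    """
    return sorted(predictions, key=lambda p: abs(p[1] - outcome))


def build_preference_pairs(question, predictions, outcome, min_gap=0.0):
    """Turn ranked predictions into preference pairs for DPO-style training.

    `min_gap` (hypothetical) drops pairs whose errors differ by no more than
    this amount, so exact ties never produce a preference.
    """
    ranked = rank_by_closeness(predictions, outcome)
    pairs = []
    # combinations() preserves order, so `better` always ranks ahead of `worse`.
    for better, worse in combinations(ranked, 2):
        gap = abs(worse[1] - outcome) - abs(better[1] - outcome)
        if gap > min_gap:
            pairs.append(
                {"prompt": question, "chosen": better[0], "rejected": worse[0]}
            )
    return pairs


# Example: three sampled traces for a question that resolved "yes" (outcome = 1).
samples = [
    ("Reasoning A ... forecast: 0.9", 0.9),
    ("Reasoning B ... forecast: 0.4", 0.4),
    ("Reasoning C ... forecast: 0.7", 0.7),
]
pairs = build_preference_pairs("Will X happen by the deadline?", samples, outcome=1)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

Because the preference signal comes only from how each question resolved, no human labels enter the loop. The resulting records could then be passed to a DPO trainer of choice.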

Result: a 7-10% accuracy gain over the control, bringing two small (14B-parameter) models on par with GPT-4o (over 10x larger).