LLMs can teach themselves to better predict the future
LLMs can teach themselves to better predict the future, with no human examples or curation required.
In this paper, we explore whether AI can improve its forecasts via self-play and real-world outcomes:
- Dataset: 12,100 questions and outcomes from Polymarket (politics, sports, crypto, science, etc.)
- Base model generates multiple distinct reasoning traces and predictions per question
- Rank predictions by how close they were to the actual outcome
- Fine-tune with DPO on the ranked traces & predictions
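
The ranking step above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's actual pipeline: it assumes each question resolves to a binary outcome (1.0 or 0.0), ranks forecasts by absolute error, and skips pairs whose errors are nearly equal. The names `Forecast`, `build_preference_pairs`, and `min_gap` are hypothetical, and the `prompt` / `chosen` / `rejected` record layout is one common way to store preference pairs for DPO.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Forecast:
    reasoning: str      # the model's reasoning trace
    probability: float  # predicted probability that the question resolves "yes"


def render(forecast: Forecast) -> str:
    """Format a trace and its prediction as a single completion string."""
    return f"{forecast.reasoning}\nProbability: {forecast.probability:.2f}"


def build_preference_pairs(question, forecasts, outcome, min_gap=0.05):
    """Build (chosen, rejected) records for one resolved question.

    `outcome` is assumed to be 1.0 (resolved yes) or 0.0 (resolved no).
    `min_gap` is an assumed threshold: pairs whose errors differ by less
    than this are dropped, since neither forecast is clearly better.
    """
    # Rank forecasts from closest to farthest from the actual outcome.
    ranked = sorted(forecasts, key=lambda f: abs(f.probability - outcome))

    pairs = []
    # combinations() preserves order, so `better` always ranks above `worse`.
    for better, worse in combinations(ranked, 2):
        gap = abs(worse.probability - outcome) - abs(better.probability - outcome)
        if gap < min_gap:
            continue
        pairs.append({
            "prompt": question,
            "chosen": render(better),
            "rejected": render(worse),
        })
    return pairs
```

With forecasts of 0.9, 0.88, and 0.2 on a question that resolved "yes", the first two are too close to separate, so only the pairs that prefer each of them over the 0.2 forecast are kept. The resulting records could then be passed to any DPO training setup that accepts prompt/chosen/rejected triples.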
Result: +7-10% accuracy over control, bringing two small (14B) models on par with GPT-4o (over 10x larger).