But the tournaments only provide the head-to-head scores for direct comparisons with top human forecasting performance. ForecastBench has clear human baselines.
It would be helpful if the Metaculus tournament leaderboards also reported Brier scores, even if they would not be directly comparable to human scores since the humans make predictions on fewer questions.
Indeed!
But the tournaments only provide the head-to-head scores for direct comparisons with top human forecasting performance. ForecastBench has clear human baselines.
It would be helpful if the Metaculus tournament leaderboards also reported Brier scores, even if they would not be directly comparable to human scores since the humans make predictions on fewer questions.