Thanks for checking! I think our main difference is that you used the Metaculus prediction, whereas I used the Metaculus postdiction, which “uses data from all other questions to calibrate its result, even questions that resolved later.” Right now, this gives Metaculus an average log score of 0.519 vs. the community’s 0.419 on 885 binary questions, and 2.43 vs. 2.25 on 537 continuous questions, all evaluated at resolve time.
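For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how an average log score over binary questions could be computed. It assumes the score for one question is log2(p) + 1, where p is the probability given to the outcome that actually happened, so a 50% forecast scores 0 and a perfect one scores 1. That definition is my assumption, not something stated above, and the function names and sample forecasts are made up for illustration.

```python
import math


def binary_log_score(p, outcome):
    """Score one binary forecast against a 50% baseline (assumed definition).

    p is the forecast probability that the question resolves positively;
    outcome is True if it did resolve positively.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Probability that the forecast assigned to what actually happened.
    p_true = p if outcome else 1.0 - p
    return math.log2(p_true) + 1.0


def average_log_score(forecasts):
    """Average the per-question scores over (p, outcome) pairs."""
    scores = [binary_log_score(p, outcome) for p, outcome in forecasts]
    return sum(scores) / len(scores)


# Hypothetical forecasts at resolve time, not real Metaculus data.
sample = [(0.9, True), (0.2, False), (0.6, True), (0.7, False)]
print(round(average_log_score(sample), 3))
```

Running the same function over the prediction and the postdiction for an identical set of questions would show how much of the gap comes from the choice between the two.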