Scoring Rules

Tag. Last edit: 18 Oct 2022 10:52 UTC by Nathan Young

Scoring Rules are ways to score answers on a test, a prediction, or any other performance.

Of special interest are proper scoring rules—rules such that the strategy for maximizing the expected score coincides with noting down your true beliefs about the question.
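A minimal sketch of what "proper" means in practice, assuming a single binary question and a forecaster whose true belief is 70%. The function names and the improper "linear" rule used for comparison are illustrative choices, not taken from the original text. Under the log score, the report that maximizes the expected score is the true belief; under the linear rule, the forecaster does best by exaggerating.

```python
import math

def expected_score(report, belief, score):
    """Expected score of reporting `report` when the forecaster believes `belief`."""
    return belief * score(report, True) + (1 - belief) * score(report, False)

def log_score(q, happened):
    # Positively oriented log score: log of the probability given to what happened.
    return math.log(q if happened else 1 - q)

def linear_score(q, happened):
    # Improper rule, for comparison only: the probability given to what happened.
    return q if happened else 1 - q

belief = 0.7  # hypothetical true belief
grid = [i / 100 for i in range(1, 100)]  # candidate reports 0.01 .. 0.99

best_log = max(grid, key=lambda q: expected_score(q, belief, log_score))
best_linear = max(grid, key=lambda q: expected_score(q, belief, linear_score))

print("best report under log score:   ", best_log)
print("best report under linear score:", best_linear)
```

On this grid the log score is maximized by reporting the belief itself, while the linear rule is maximized at the most extreme report available.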

Forecasting rules and their flaws

Log Score

The Log score (sometimes called surprisal) is a strictly proper scoring rule[1] used to evaluate how good forecasts were. A forecaster scored by the log score will, in expectation, obtain the best score by providing a predictive distribution that is equal to the data-generating distribution. The log score therefore incentivizes forecasters to report their true belief about the future.

All Metaculus scores are types of log score[2].


The log score is usually computed as the negative logarithm of the predictive density evaluated at the observed value $y$, $\text{LogS}(y) = -\log f(y)$, where $f$ is the predicted probability density function. Usually, the natural logarithm is used, but the log score remains strictly proper for any logarithm base greater than 1.

In the formulation presented above, the score is negatively oriented, meaning that smaller values are better. Sometimes the sign of the log score is inverted and it is simply given as the log predictive density. In that case, larger values are better.
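As a rough illustration of the continuous case, the sketch below scores two hypothetical normal predictive distributions against one observed value. The normal family, the parameter values, and the `negatively_oriented` flag are assumptions made for the example.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def log_score(density_at_observation, negatively_oriented=True):
    """Log score from the predictive density at the observed value (natural log)."""
    value = math.log(density_at_observation)
    return -value if negatively_oriented else value

observed = 2.0  # hypothetical observation

# Two hypothetical forecasts: one centred on the outcome, one far from it.
well_centred = normal_pdf(observed, mean=2.0, sd=1.0)
off_target = normal_pdf(observed, mean=5.0, sd=1.0)

print(log_score(well_centred))  # negatively oriented: smaller is better
print(log_score(off_target))
print(log_score(well_centred, negatively_oriented=False))  # sign flipped: larger is better
```

The forecast centred on the observation receives the smaller (better) negatively oriented score; flipping the sign reverses the ordering but not the ranking of forecasts.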

The log score is applicable to binary outcomes as well as discrete or continuous outcomes. In the case of binary outcomes, the formula above simplifies to

$\text{LogS}(y) = -\log P(y)$

where $P(y)$ is the probability assigned to the binary outcome $y$. If a forecaster, for example, assigned 70% probability that team A would win a soccer match, then the resulting log score would be $-\log(0.7) \approx 0.36$ if team A wins and $-\log(0.3) \approx 1.20$ if team A doesn't win.
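The soccer example can be checked with a few lines of code. This is a minimal sketch assuming the natural logarithm and the negatively oriented convention; the function name is illustrative.

```python
import math

def binary_log_score(p_event, event_happened):
    """Negatively oriented log score for a binary forecast (natural log)."""
    # Probability the forecaster assigned to the outcome that actually occurred.
    p_outcome = p_event if event_happened else 1.0 - p_event
    return -math.log(p_outcome)

score_if_win = binary_log_score(0.7, True)
score_if_no_win = binary_log_score(0.7, False)

print(round(score_if_win, 2), round(score_if_no_win, 2))
```

With a 70% forecast the two scores come out at roughly 0.36 (team A wins) and 1.20 (team A doesn't win).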


Figure: Illustration of the difference between local and global scoring rules. Forecasters A and B both predicted the number of goals in a soccer match and assigned the same probability to the outcome that was later observed and therefore receive the same log score. Forecaster B, however, assigned a significant probability to outcomes far away from the observed outcome and therefore receives worse scores for the global scoring rules CRPS and DSS.

The log score is a local scoring rule, meaning that the score only depends on the probability (or probability density) assigned to the actually observed values. The score, therefore, does not depend on the probability (or probability density) assigned to values not observed. This is in contrast to so-called global proper scoring rules, which take the entire predictive distribution into account.
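The local versus global distinction can be sketched with two made-up goal forecasts that put the same probability on the observed outcome. The forecast numbers are hypothetical, and the CRPS here uses the form that applies to integer-valued outcomes (a sum of squared differences between the forecast CDF and the step function at the observation).

```python
import math

def log_score(probs, observed):
    """Negatively oriented log score: depends only on probs[observed]."""
    return -math.log(probs[observed])

def crps_discrete(probs, observed):
    """CRPS for a forecast over goals 0, 1, 2, ...: depends on the whole distribution."""
    total, cdf = 0.0, 0.0
    for k, p in enumerate(probs):
        cdf += p
        step = 1.0 if k >= observed else 0.0  # CDF of a point mass at the observation
        total += (cdf - step) ** 2
    return total

# Hypothetical forecasts; list index = number of goals.
forecaster_a = [0.1, 0.3, 0.4, 0.2, 0.0, 0.0, 0.0]
forecaster_b = [0.1, 0.1, 0.4, 0.0, 0.0, 0.0, 0.4]
observed_goals = 2

for name, probs in [("A", forecaster_a), ("B", forecaster_b)]:
    print(name, log_score(probs, observed_goals), crps_discrete(probs, observed_goals))
```

Both forecasters get the same log score, but forecaster B, who placed 40% on six goals, gets a larger (worse) CRPS.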

Penalization of Over- and Underconfidence

The log score penalizes overconfidence (i.e. a forecast that is too certain) more strongly than underconfidence. While all proper scoring rules incentivize forecasters to report their true beliefs, forecasters may feel enticed to err on the side of caution when scored using the log score.
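The asymmetry can be made concrete with a small sketch. It assumes an event whose true probability is 0.7 and compares a report that is 0.2 too extreme with one that is 0.2 too cautious; the numbers are illustrative only.

```python
import math

def expected_log_loss(report, true_prob):
    """Expected negatively oriented log score when the event has probability true_prob."""
    return -(true_prob * math.log(report) + (1 - true_prob) * math.log(1 - report))

true_prob = 0.7  # hypothetical true probability

honest = expected_log_loss(0.7, true_prob)
overconfident = expected_log_loss(0.9, true_prob)   # 0.2 too extreme
underconfident = expected_log_loss(0.5, true_prob)  # 0.2 too cautious

print(f"honest:         {honest:.4f}")
print(f"underconfident: {underconfident:.4f}")
print(f"overconfident:  {overconfident:.4f}")
```

The honest report has the lowest expected loss, as properness requires, and the overconfident report loses more than the equally distant underconfident one.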

Brier Score

The Brier score is the mean squared difference between the predicted probabilities and the actual outcomes. Therefore, the lower the Brier score is for a set of predictions, the more accurate the predictions are. Note that the Brier score, in its most common formulation, takes on a value between zero and one, since this is the square of the largest possible difference between a predicted probability (which must be between zero and one) and the actual outcome (which can take on values of only 0 or 1). In the original (1950) formulation of the Brier score, the range is double, from zero to two.
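A minimal sketch of the Brier score for binary forecasts, showing both the common formulation (mean squared difference between the forecast probability and the 0/1 outcome) and the original two-category form that sums the squared error over both classes. The forecasts and outcomes are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Common form: mean squared difference between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def brier_score_1950(forecasts, outcomes):
    """Original form: squared errors summed over both classes (event, no event)."""
    return sum(
        (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
        for p, o in zip(forecasts, outcomes)
    ) / len(forecasts)

forecasts = [0.9, 0.7, 0.2, 0.5]  # hypothetical predicted probabilities
outcomes = [1, 1, 0, 1]           # what actually happened

print(brier_score(forecasts, outcomes))
print(brier_score_1950(forecasts, outcomes))

# Worst case: full confidence in the wrong outcome.
print(brier_score([1.0], [0]), brier_score_1950([1.0], [0]))
```

For binary questions the original form is exactly twice the common one, which is why its range runs from zero to two.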

The Brier score is appropriate for binary and categorical outcomes that can be structured as true or false, but it is inappropriate for ordinal variables which can take on three or more values.

Related Pages: Calibration, Forecasting & Prediction, Skill / Expertise Assessment, Prediction Markets

LMSR subsidy parameter is the price of information

Abhimanyu Pallavi Sudhir, 25 May 2024 18:05 UTC
5 points
0 comments, 1 min read, LW link

Stop-gradients lead to fixed point predictions

28 Jan 2023 22:47 UTC
36 points
2 comments, 24 min read, LW link

Proper scoring rules don’t guarantee predicting fixed points

16 Dec 2022 18:22 UTC
68 points
8 comments, 21 min read, LW link

Bayes-Up: An App for Sharing Bayesian-MCQ

Louis Faucon, 6 Feb 2020 19:01 UTC
53 points
9 comments, 1 min read, LW link

Bayesian examination

Lê Nguyên Hoang, 9 Dec 2019 19:50 UTC
86 points
56 comments, 5 min read, LW link

A Proper Scoring Rule for Confidence Intervals

Scott Garrabrant, 13 Feb 2018 1:45 UTC
63 points
47 comments, 1 min read, LW link

Aumann Agreement Game

abramdemski, 9 Oct 2015 17:14 UTC
32 points
17 comments, 1 min read, LW link

Alternative to Bayesian Score

Scott Garrabrant, 27 Jul 2013 19:26 UTC
14 points
30 comments, 3 min read, LW link