Raising the forecasting waterline (part 2)

Previously: part 1

The three tactics I described in part 1 are most suited to making an initial forecast. I will now turn to a question that was raised in comments on part 1 - that of updating when new evidence arrives. But first, I’d like to discuss the notion of a “well-specified forecast”.

Well-specified forecasts

It is often surprisingly hard to frame a question in terms that make a forecast reasonably easy to verify and score. Questions can be ambiguous (consider “X will win the U.S. presidential election”—do we mean win the popular vote, or win re-election in the electoral college?). They can fail to cover all possible outcomes (so “which of the candidates will win the election” needs a catch-all “Other”).1

Another way to make questions ambiguous is to leave out their sell-by date. Consider the question, “Google Earth / Street View will become a video gaming platform.” This seems designed to prompt a “by when?” On both PredictionBook and the Good Judgment Project, questions come with a “known on” or a “closing” date respectively. But the question itself generally includes a date: “Super Mario Bros. Z episode 9 will be released on or before November 24, 2012”. The “known on” or “closing” date often leaves a margin of safety, for cases where some time may pass between the event happening and the outcome becoming known.2

Questions for GJP are provided by IARPA, the tournament sponsor. Both IARPA and the GJP research team go to some lengths to make sure that questions leave little room for interpretation or ambiguity: a deadline is always specified, and a “judge’s statement” clarifies the meaning of terms (even such obvious-seeming ones as “by” a certain date, which is expanded into “by midnight of that date”) and which sources will be taken as authoritative (for instance, “gain control of the Somali town of Kismayo” was to be verified by one of BBC, Reuters or the Economist announcing a control transition and failing to then announce a reversal within 48 hours). Some questions have been voided (not scored) due to ambiguity in the wording. This is one of the things I appreciate about GJP.

Tool 4 - Prepare lines of retreat

Many forecasts are long-range, and many unexpected things might happen between making your initial forecast and reaching the deadline, or the occurrence of one of the specified outcomes. There are two pitfalls to avoid: one is that you will over-react to new information, swinging from “almost certain” to “cannot happen” every time you hear something in the news; the other is that you will find a way to interpret any new information as confirming your initial forecast (confirmation bias).

One of my recent breakthroughs with GJP was when I started laying out my lines of retreat in advance: I now try to ask myself, “What would change my mind about this?”, and write the answer in the comments that you can optionally leave on a forecast, as a non-repudiable reminder. For instance, on the question “Will the Colombian government and FARC commence official talks before 1 January 2013?”, I entered a 90% “Yes” forecast on 9/18 when a date for the talks was set, but added: “Will revise significantly towards ‘No’ if the meeting fails to happen on October 8 or is pushed back.” This was to prevent my thinking later “Oh, it’s just a small delay, it will surely happen anyway”. On October 1st, a delay was announced, and I duly scaled my forecast back to 75%.

Advice of this sort was part of the “process feedback” that we received from the GJP team at the start of Season 2, pointing out behaviors that they associated with high-performing forecasters, and in particular the quantity and quality of the comments these participants posted with their forecasts. I only recently started really getting the hang of this, but now, more often than not, my forecasts are accompanied by a mini-summary (a few paragraphs) where I lay out the status quo, reference classes or base rates if applicable, current evidence pointing to likely change, and what kind of future reports might change my mind. These are generally not based on my background knowledge, which is often embarrassingly scant, but on between a few minutes and an hour of Googling and Wikipedia-ing.

Tool 5 - Abandon sunk costs

The GJP scores forecasts using what’s known as the “Brier score”, which consists of taking your probability and squaring it if the event did not happen, or the complement of your probability and squaring it if the event did happen. (That’s for binary outcomes; for multiple outcomes, it’s the sum of these squares over each outcome.)

You’ll notice that the best you can hope for is 0: you assign 100% probability to something that happens, or 0% to something that doesn’t happen. In any other situation your score is positive; so a good score is a low score.
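To make the arithmetic concrete, here is a minimal sketch in Python of the scoring rule as described above. The function names are my own, not anything GJP publishes, and the multi-outcome version follows the "sum of squares over each outcome" wording, which for a two-outcome question gives twice the binary figure.

```python
def brier_binary(p_yes: float, happened: bool) -> float:
    """Binary Brier score: square the probability if the event did not
    happen, or square its complement if it did. Lower is better."""
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return (1.0 - p_yes) ** 2 if happened else p_yes ** 2


def brier_multi(probs: list[float], outcome_index: int) -> float:
    """Multi-outcome Brier score: sum of squared errors over every outcome,
    where the outcome that occurred counts as 1 and the others as 0."""
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


# A 90% "Yes" forecast, scored both ways.
score_if_no = brier_binary(0.90, happened=False)
score_if_yes = brier_binary(0.90, happened=True)
print(f"event fails: {score_if_no:.2f}, event happens: {score_if_yes:.2f}")
```

A perfect forecast (100% on something that happens) scores 0 under both functions.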

The sunk cost fallacy consists of letting past costs (or losses) affect decisions on future bets. Say you have entered a forecast of 90% on the question “Will 1 Euro buy less than $1.20 US dollars at any point before 1 January 2013?”, as I did in mid-July. If this fails to happen, the penalty is a fairly large .81. A few months later, propped up by governmental decisions in the Euro zone, the continent’s currency is still strong.

You may find yourself thinking “Well if I change my mind now, half-way through, I’m only getting at best the average of .81 and the Brier score of whatever I change it to; .4 in the best case. But really, there’s still a chance it will hit that low… then I’d get a very nice .01 for sticking to my guns.” That’s the sunk cost fallacy. It’s silly because “sticking to my guns” will penalize me even worse if my forecast does fail. Whatever is going to happen to the Euro will happen; the loss from my past forecast is already determined. The Brier score is a “proper” scoring rule, which can only be optimized by accurately stating my degree of uncertainty.
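The "proper scoring rule" claim can be checked numerically. The sketch below is only an illustration: the 30% figure for my current belief is a made-up number, and `expected_brier` is a hypothetical helper. It computes the expected Brier score for the days still to come, which is the only part of the score a new forecast can affect.

```python
def expected_brier(report: float, belief: float) -> float:
    """Expected binary Brier score of stating `report` when the actual
    chance of "Yes" is `belief`."""
    return belief * (1.0 - report) ** 2 + (1.0 - belief) * report ** 2


belief_now = 0.30  # hypothetical: what I actually believe today

# Expected score on the remaining days if I stick to my old 90% forecast,
# versus stating what I now believe.
stubborn = expected_brier(0.90, belief_now)
honest = expected_brier(belief_now, belief_now)

# Search every whole-percent forecast for the one with the lowest expected score.
grid = [i / 100 for i in range(101)]
best_report = min(grid, key=lambda r: expected_brier(r, belief_now))

print(f"stick to 90%: {stubborn:.2f}, state my belief: {honest:.2f}")
print(f"best forecast to state: {best_report:.2f}")
```

The score already accumulated on past days appears nowhere in this calculation, which is the point: it is the same whatever I do next.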

What’s scary is that, while I knew about the sunk cost fallacy in theory, I think it pretty much describes my thoughts on the Euro question. I only retreated to 80% at first, then 70% - before finally biting the bullet and admitting my new belief, predominantly on the “No” side. (That question isn’t scored yet.)

Detach from your sunk costs: treat your past forecasts as if they’d been made by someone else, and if you now have grounds to form a completely different forecast, go with that.

Tool 6 - Consider your loss function

The Brier score is also known as the “squared error loss” and can be seen as a “loss function”: informally, you can think of a “loss function” as “what is at stake if I call this wrong”. In poker, the loss function would be not the probability that your hand loses, or the Brier score associated with your estimate of that probability, but the probability multiplied by the size of the pot—the real-world consequences of your estimate, in other words. This is why you may play the same hand aggressively or conservatively according to the circumstances.

The Brier score is more “forgiving” in one sense than another common loss function, the logarithmic scoring rule—which instead of the square takes the log of your probability (or of its complement). If you use probabilities close to 0 or 1, you can end up with huge negative scores! With the Brier score, on the other hand, the worst you can do is a score of 1, and a forecast of 100% isn’t much worse than 95%, even if the event fails to happen.
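A small comparison makes the difference visible. This is a sketch under my own assumptions: the natural logarithm is used (the base only rescales the numbers), and the function names are illustrative.

```python
import math


def brier(p_yes: float, happened: bool) -> float:
    """Binary Brier score; lower is better, and never worse than 1."""
    return (1.0 - p_yes) ** 2 if happened else p_yes ** 2


def log_score(p_yes: float, happened: bool) -> float:
    """Logarithmic score: log of the probability given to what actually
    happened. Higher (closer to 0) is better; it is unbounded below."""
    p_outcome = p_yes if happened else 1.0 - p_yes
    # log(0) is undefined, so treat a 0% forecast on the actual outcome
    # as an infinitely bad score.
    return math.log(p_outcome) if p_outcome > 0.0 else float("-inf")


# Increasingly confident "Yes" forecasts on an event that fails to happen.
for p in (0.95, 0.99, 1.0):
    print(
        f"{p:.2f} forecast, event fails: "
        f"Brier {brier(p, False):.4f}, log {log_score(p, False):.3f}"
    )
```

Under the Brier score the step from 95% to 100% costs less than a tenth of a point; under the log rule it takes the score from about -3 to negative infinity.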

The GJP system computes your Brier score for each day that the question is open, according to what your forecast is on that day, and averages over all days. Forecasts have a sell-by date, which is somewhat artificially imposed by the contest rather than intrinsic to the situation. This means there is an asymmetry to some situations, such that the best forecast may not reflect your actual beliefs. One example was the question “Will any government force gain control of the Somali town of Kismayo before 1 November 2012?”.

When this question opened, I quickly learned that government forces were preparing an attack on the town. *If* the assault succeeded, then the question would probably resolve in a few days, and the Brier score would be dominated by my initial short-term forecast. If, on the other hand, the assault failed, the status quo would likely continue for quite some time; I could then change my forecast, and the initial forecast would be “diluted” over the following weeks. So I picked a “Yes” value more extreme (90%) than my actual confidence, planning to quickly “retreat” back to the “No” side if the assault failed or gave any sign of turning into a protracted battle.
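The asymmetry can be sketched with a toy model of the daily averaging. Every number here is invented for illustration: a 60% belief that the assault succeeds, a question that resolves "Yes" after 3 days if it does, and otherwise stays open for 40 days and resolves "No" while I sit at a 10% fallback forecast. The real GJP scoring may differ in its details.

```python
def time_averaged_brier(daily_forecasts: list[float], happened: bool) -> float:
    """Average of the daily binary Brier scores over the days a question is open."""
    target = 1.0 if happened else 0.0
    return sum((p - target) ** 2 for p in daily_forecasts) / len(daily_forecasts)


def expected_contest_score(
    initial: float,
    belief: float,
    early_days: int = 3,    # hypothetical: days until a successful assault resolves it
    total_days: int = 40,   # hypothetical: days the question stays open otherwise
    fallback: float = 0.10, # hypothetical: forecast I retreat to if the assault fails
) -> float:
    """Expected time-averaged score of opening with `initial` as my "Yes" forecast."""
    # Assault succeeds: the question closes early, so only `initial` is scored.
    score_if_yes = time_averaged_brier([initial] * early_days, happened=True)
    # Assault fails: `initial` counts for a few days, then is diluted by the fallback.
    diluted = [initial] * early_days + [fallback] * (total_days - early_days)
    score_if_no = time_averaged_brier(diluted, happened=False)
    return belief * score_if_yes + (1.0 - belief) * score_if_no


belief = 0.60  # hypothetical actual confidence that the assault succeeds
candidates = [i / 100 for i in range(101)]
best_initial = min(candidates, key=lambda p: expected_contest_score(p, belief))
print(f"belief: {belief:.2f}, best opening forecast: {best_initial:.2f}")
```

With these made-up numbers the best opening forecast lands near 95%, well above the 60% belief, because a wrong opening forecast is averaged away over the long tail while a right one is scored at full weight.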

This can feel a little like cheating—but, if your objective is to do well in the forecasting contest (as opposed to having correct beliefs on the more general question “will Kismayo fall”, which does not have a real deadline), it’s perfectly sensible.

We have reached something that feels a little like a “trick” of forecasting, and thus are probably leaving the domain of “basic skills to raise yourself above a low waterline”. I’ll leave you with these, and hope this summary has encouraged you to try your hand at forecasting, if you hadn’t done so before.

If you do: have fun! And please report back here with whatever new and useful tricks you’ve learned.

1 PB only has binary questions, so that isn’t an issue, but it is one on GJP where multiple-choice questions are common.

2 On PB, there appears to be a tacit convention that the “known on” date also serves as a deadline for the question, if no such deadline was specified.