EDIT: Ha, just noticed that Zvi has done something similar, I’ll be interested to check another source.
The need for comparison
Having predictions from another source to compare against allows Brier scores or log-likelihoods to be used to see which set of predictions is better. It also makes 50% predictions meaningful.
It’s hard to judge predictions in hindsight without accidentally adding what you know now into the discussion (see Scott’s comment on Zvi’s post).
So if you want to assess how good your predictions are, it is best to put them out there in advance, against a set of questions for which you have comparison values.
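As a concrete sketch of the scoring idea (the probabilities and outcomes below are invented for illustration, not taken from the actual list):

```python
# Compare two sets of forecasts on the same questions by Brier score
# (mean squared error between probability and 0/1 outcome; lower is better).
# All numbers here are made up for illustration.

def brier(probs, outcomes):
    """Mean of (p - outcome)^2 over all questions."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

mine = [0.80, 0.10, 0.05, 0.60]    # my probabilities that each event happens
theirs = [0.80, 0.10, 0.10, 0.60]  # the comparison forecaster's probabilities
outcomes = [1, 0, 0, 1]            # 1 = event happened, 0 = it didn't

print(brier(mine, outcomes))    # 0.053125
print(brier(theirs, outcomes))  # 0.055
```

Note that a 50% prediction contributes 0.25 to the Brier score however the question resolves, so on its own it says nothing; it only becomes informative when set against someone else's non-50% forecast on the same question.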
So I thought I would attempt to put my own probabilities on the SSC predictions from this year (or at least those I could reasonably be expected to know about). This means I will be able to see not only whether I am well calibrated but also whether I can bring in all the evidence I can think of and integrate it into a good prediction. If I get a score close to Scott’s then I’ll be happy. I don’t know whether other people do this too, although a quick Google search didn’t find anything.
So as not to anchor myself on Scott’s answers (a.k.a. cheating), I deleted Scott’s estimates before going back over and doing my own and then comparing. This is more like the Green Knight test mentioned in Zvi’s post. I have added a comment to this post with the unscored list in case anyone else wants to give it a go before reading on.
I’m tempted to say that no-one is allowed to claim that either of us has made a poor prediction without having tried it from a blank list themselves—it was A LOT harder than I expected it to be! I’ve done calibration checking myself but putting it out publicly felt really stressful. In truth, feel free to say if you think I have any probability off—simulacrum level 1, agreed?
Comparison of predictions
So here are my predictions. I’ve kept the SSC numbering and indicated where there are any predictions I’ve skipped. Along with any personal predictions, I removed the Reade accusation questions as it isn’t something I’m familiar with (non-US citizen here).
Any probabilities where our odds differed by more than a factor of 2 I have put in bold underline, with a description of my thinking.
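The factor-of-2 criterion can be made precise with a small helper (the function names here are my own, for illustration):

```python
# Flag question pairs where two forecasters' odds differ by more than
# a given factor (the bolding criterion described above).

def odds(p):
    """Convert a probability to odds in favour."""
    return p / (1 - p)

def big_disagreement(p1, p2, factor=2.0):
    """True if the odds ratio between the two probabilities exceeds `factor`."""
    ratio = odds(p1) / odds(p2)
    return max(ratio, 1 / ratio) > factor

print(big_disagreement(0.10, 0.30))  # True: odds 1/9 vs 3/7, ratio ~3.9
print(big_disagreement(0.60, 0.70))  # False: odds 3/2 vs 7/3, ratio ~1.6
```

Working in odds rather than raw probabilities means 10% vs 30% and 70% vs 90% count as equally big disagreements, which raw differences would not capture.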
1. Bay Area lockdown (eg restaurants closed) will be extended beyond June 15: 80%
2. …until Election Day: 10%
3. Fewer than 100,000 US coronavirus deaths: 5% (SSC 10%)
The existing official death toll of 55,000 is an undercount and we probably need to add ~50% to that as a minimum (I assume this will have been made official before the end of the year). Add that to the current rate of 2,000/day, which will take a while to come down, and I think 100,000+ becomes almost inevitable.
4. Fewer than 300,000 US coronavirus deaths: 60%
5. Fewer than 3 million US coronavirus deaths: 95% (SSC 90%)
Given a coronavirus IFR of <1%, 3 million deaths out of a US population of 330 million would require essentially the entire country to be infected, so this seems almost certain. I would have put this probability higher if there were a higher option.
6. US has highest official death toll of any country: 80%
7. US has highest death toll as per expert guesses of real numbers: 60%
8. NYC widely considered worst-hit US city: 80% (SSC 90%)
I umm-ed and ahh-ed between 80 and 90 on this one and am not sure I made the right choice.
9. China’s (official) case number goes from its current 82,000 to 100,000 by the end of the year: 70%
10. A coronavirus vaccine has been approved for general use and given to at least 10,000 people somewhere in the First World: 40%
11. Best scientific consensus ends up being that hydroxychloroquine was significantly effective: 60% (SSC 20%)
I suspect Scott has more knowledge on this one. The only reason I went as high as I did was that tricky word “significantly”. If it means statistical significance then there’s a fair chance that a meta-analysis might find a result given a large enough sample size, even if it isn’t really clinically significant. For clinical significance I would have been closer to Scott.
12. I personally will get coronavirus (as per my best guess if I had it; positive test not needed): 10% (SSC 30%)
I wasn’t sure whether to try to predict this one as I don’t have much information on whether Scott would likely get coronavirus. I decided to just outside view it – I’m predicting 10% or so infection (by predicting 300,000 or so deaths) with a decent number of those having already happened. I didn’t really think at the time about how many people Scott will see in his job when he’s no longer working from home so I may well have underestimated here. On the other hand my impression is that California is being more cautious than most states (?) so I wouldn’t expect cases to be concentrated there.
13. Someone I am close to (housemate or close family member) will get coronavirus: 30% (SSC 70%)
See previous. I’m not sure how many people are included here but there is probably significant correlation between these people so I didn’t raise the probability too much above 10%.
14. General consensus is that we (April 2020 US) were overreacting: 60%
15. General consensus is that we (April 2020 US) were underreacting: 10% (SSC 20%)
Possibly not much difference between us; I would probably have put 15% if that had been an option.
16. General consensus is that summer made coronavirus significantly less dangerous: 40% (SSC 70%)
Generally I’ve heard that warmer countries haven’t been especially well protected so far, but I haven’t really looked into it. Possibly I should have gone more with the prior that viruses are often worse in winter, but I’m not sure if this is availability bias for cold/flu vs, say, Ebola/HIV, for which I’m unaware of seasonal variation. Maybe I should ask someone with an MD?
17. …and there is a catastrophic (50K+ US deaths, or more major lockdowns, after at least a month without these things) second wave in autumn: 20% (SSC 30%)
Scott and I both estimate P(17|16) at 40%–50%, so 16 is where we had the difference.
19. At least half of states send every voter a mail-in ballot in 2020 presidential election: 30%
20. PredictIt is uncertain (less than 95% sure) who won the presidential election for more than 24 hours after Election Day: 20%
21. Democrats nominate Biden, and he remains nominee on Election Day: 90%
26. Trump is re-elected President: 60%
27. Democrats keep the House: 60%
28. Republicans keep the Senate: 60%
29. Trump approval rating higher than 43% on June 1: 50% (SSC 30%)
Eyeballing his approval ratings, he was at 42% or so for a while and is currently a smidge higher. Looking back on this now, I was probably too high here.
30. Biden polling higher than Trump on June 1: 70%
33. Boris still UK PM: 90%
34. No new state leaves EU: 90%
35. UK, EU extend “transition” trade deal: 30% (SSC 80%)
This was a tricky one to assess – I agree that they might need to do something but for someone elected on the platform of “Get Brexit Done” this would be a risky move, although given coronavirus it might be forgiven. I would expect them to have to try some kind of intermediate deal but not an extension of the current transition period.
36. Kim Jong-Un alive and in power: 80% (SSC 60%)
I think this one depends almost entirely on how much you believe the current reports of ill health. I haven’t paid much attention and wrote the reports off as probably just speculation, but I’m not particularly attached to that conclusion.
ECON AND TECH:
37. Dow is above 25,000: 70%
38. …above 30,000: 10% (SSC 20%)
Currently 24,300. In normal growth times I think this would struggle to get there. I feel like there are too many things which have to go right to make this 20% probable but I’m no economist!
39. Bitcoin is above $5,000: 70%
40. …above $10,000: 20%
42. Crew Dragon reaches orbit: 90% (SSC 80%)
I believe in Crew Dragon because the launch is scheduled for only a month or so away. I may have been slightly overgenerous on this; in truth I was torn between 80 and 90.
43. Starship reaches orbit: 20% (SSC 40%)
Do any of Musk’s projects get done on schedule? I’m not complaining, as the projects are tricky and he certainly doesn’t have the biggest delays (cf. JWST), but there is a bit of a track record here. For the Falcon Heavy, every timeline given before the first flight ended up taking 2–3 times as long.
We disagreed significantly on 14 questions, roughly agreeing on the remaining 21 (60% agreement rate).
As I mentioned above, trying to come up with my own figures was A LOT harder than looking at Scott’s numbers and thinking whether I agree with them or not as I had done in previous years. Reflecting on Scott’s answers now has already persuaded me that I would like to change my answers on a few questions (8, 11, 12, 16, 29), not necessarily all the way to Scott’s answers but certainly in that direction.
On the other hand there are ones where I think my probabilities are better (3, 5, 38, 43).
The most disagreed-on question is 35 (extending the Brexit transition deal), where we differ by a huge odds ratio of 9.3 (80% Scott vs 30% me). I am genuinely unsure on this one. Boris might have enough credibility to pull off an extension based on coronavirus, but I’m fairly sure he will want to be seen to be doing something, particularly with regards to immigration and other EU rules.
Doing a bit of research now, here’s a news article about how the odds on Smarkets moved on this from Scott’s level (80%) on 14th April to a bit above my level (40%) by 20th April, because:
Over the last few days, Whitehall said it will not accept any delay to the Brexit transition period beyond this year even if the EU offers an extension.
“We will not ask to extend the transition. And, if the EU asks, we will say no. Extending the transition would simply prolong the negotiations, prolong business uncertainty, and delay the moment of control of our borders,” said a spokesperson.
I didn’t know about this statement before making my prediction but I do feel like my reasoning was sound. Given this statement the 40% of the market seems high and I would put the probability more like 20% but this isn’t a recommendation that anyone do anything based on that!
Suggested alternative probability options
One thing which I struggled with was choosing between options, especially at the high/low percentages, when I’d have liked to choose something between the available options (questions 6, 8, 15, 33, 38 and 42). This is worse at the extremes because the odds ratios between 80% and 90% (2.25) and between 90% and 95% (2.11) are greater than 2, whereas between 50% and 60% it is only 1.5 (in fact the 80–90 gap is twice as large as the 50–60 gap in log-odds, since 2.25 = 1.5²).
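A quick check of those gaps (the helper name is mine):

```python
import math

def odds_ratio(p_lo, p_hi):
    """Odds ratio between two probability levels."""
    return (p_hi / (1 - p_hi)) / (p_lo / (1 - p_lo))

print(round(odds_ratio(0.80, 0.90), 2))  # 2.25
print(round(odds_ratio(0.90, 0.95), 2))  # 2.11
print(round(odds_ratio(0.50, 0.60), 2))  # 1.5
# "Twice as large" holds on the log-odds scale: 2.25 = 1.5 ** 2,
# so log(2.25) is exactly twice log(1.5).
print(math.isclose(math.log(2.25), 2 * math.log(1.5)))  # True
```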
If I were making up my own levels I would try to choose a constant odds ratio between consecutive levels. If I keep the same number of levels (11) and the same maximum confidence (95%) this works out as an odds ratio of 1.8 between levels. The levels then become:
5%, 9%, 15%, 24%, 36%, 50%, 64%, 76%, 85%, 91%, 95%
With some rounding we could get:
5%, 9%, 15%, 25%, 35%, 50%, 65%, 75%, 85%, 91%, 95%
which is easier to remember. With these we have a maximum odds ratio of 1.89, between 75% and 85%.
Another option which doesn’t make for such nice round numbers would be:
5%, 8%, 13%, 21%, 31%, 43%, 57%, 69%, 79%, 87%, 92%, 95%
This has one more option, but when you negate the <50% predictions you get the same number of groups, so you’re not spreading the results any thinner for your analysis. It has the advantage that you don’t throw away any information for calibration testing, as there is no 50% option. The odds ratio between adjacent options is then ~1.7.
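Both ladders can be generated directly from the constant-odds-ratio construction (a sketch; the function name is my own):

```python
def ladder(n_levels, max_p=0.95):
    """Probability levels from (1 - max_p) up to max_p with a constant
    odds ratio between consecutive levels."""
    lo = (1 - max_p) / max_p                  # odds of the lowest level
    hi = max_p / (1 - max_p)                  # odds of the highest level
    step = (hi / lo) ** (1 / (n_levels - 1))  # the constant odds ratio
    return [lo * step**k / (1 + lo * step**k) for k in range(n_levels)]

print([round(100 * p) for p in ladder(11)])
# [5, 9, 15, 24, 36, 50, 64, 76, 85, 91, 95] -- step ~1.8
print([round(100 * p) for p in ladder(12)])
# [5, 8, 13, 21, 31, 43, 57, 69, 79, 87, 92, 95] -- step ~1.7
```

Going from 11 levels to 12 shrinks the step from 361^(1/10) ≈ 1.8 to 361^(1/11) ≈ 1.7, which is how dropping the 50% option falls out of the same construction.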
Some might not like the lack of 50% option but I think that’s actually a feature rather than a bug – you’re being asked to at least pick a side, even if you only assign it 1.3:1 odds in favour. Obviously if you genuinely believe 50% then you can’t put your true belief but that’s true of most probabilities whatever groupings you decide on – you sacrifice resolution in order to be able to analyse calibration.