Daniel Kokotajlo comments on Daniel Kokotajlo’s Shortform

Daniel Kokotajlo 1 Jan 2026 0:20 UTC

7 points

Thank you! I continued your ChatGPT conversation and got the following tidbits of extra info:

For my 2020 predictions, I asked it to summarize my biggest sources of Brier loss:

Top 5 Brier losses from FALSE outcomes (overconfidence)

Rank	Claim	p (2020)	Loss
1	100+ drone swarms operationally fielded	0.50	0.250
2	10%+ news articles AI-generated	0.20	0.040
3	$1 deepfake costs $100+ to detect	0.35	0.123
4	Orbital rocket reused 100 times	0.30	0.090
5	90%+ Rio Tinto mines automated	0.30	0.090

Top 5 Brier losses from TRUE outcomes (pessimism)

Rank	Claim	p (2020)	Loss
1	AI wins International Math Olympiad	0.15	0.723
2	Neural net writes B-average college essays	0.20	0.640
3	Crypto restaurants near Berkeley	0.20	0.640
4	Combat drones <50 lbs fielded	0.50	0.250
5	AI beats top StarCraft players	0.70	0.090

The IMO prediction alone is your single largest loss across all 2020 items.

Interestingly, the two biggest hits to my Brier score (for predictions made on Jan 1, 2020) basically came from underestimating LLMs.

Also interestingly, it seems like maybe I took significantly more Brier loss from saying things wouldn’t happen that did happen than from saying things would happen that didn’t? In other words, I was too cautious overall rather than too credulous?
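As a rough check on that comparison, here is a minimal sketch that recomputes the per-item losses and sums them for each group. It assumes the usual per-item Brier loss, (p - outcome)^2 with outcome 1 for TRUE and 0 for FALSE, which matches the Loss columns above. It covers only the ten rows shown in the two top-5 tables, not the full spreadsheet, and the names `brier_loss`, `false_items`, and `true_items` are made up for illustration.

```python
def brier_loss(p, outcome):
    """Per-item Brier loss: squared gap between forecast p and the 0/1 outcome."""
    return (p - (1.0 if outcome else 0.0)) ** 2

# Probabilities copied from the "FALSE outcomes" table above.
false_items = {
    "100+ drone swarms operationally fielded": 0.50,
    "10%+ news articles AI-generated": 0.20,
    "$1 deepfake costs $100+ to detect": 0.35,
    "Orbital rocket reused 100 times": 0.30,
    "90%+ Rio Tinto mines automated": 0.30,
}

# Probabilities copied from the "TRUE outcomes" table above.
true_items = {
    "AI wins International Math Olympiad": 0.15,
    "Neural net writes B-average college essays": 0.20,
    "Crypto restaurants near Berkeley": 0.20,
    "Combat drones <50 lbs fielded": 0.50,
    "AI beats top StarCraft players": 0.70,
}

# Sum the loss within each group (top-5 rows only).
false_total = sum(brier_loss(p, False) for p in false_items.values())
true_total = sum(brier_loss(p, True) for p in true_items.values())

print(f"loss from FALSE outcomes (top 5): {false_total:.4f}")
print(f"loss from TRUE outcomes (top 5):  {true_total:.4f}")
```

On these ten rows the TRUE-outcome group sums to roughly 2.34 versus roughly 0.59 for the FALSE-outcome group, which is consistent with the "too cautious" reading, though only for the listed items and only if the resolutions themselves are right.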

Daniel Kokotajlo 1 Jan 2026 0:34 UTC
6 points
ChatGPT goes on to say that I outperformed Rick (who also made predictions in 2020 in the spreadsheet). However, looking over the data briefly, I’m not sure I agree with some of the scores, e.g. are there really robotaxis in 20+ cities now? And drone delivery?
- Lukas Finnveden 1 Jan 2026 1:26 UTC
  2 points
  Yep, resolutions not very reliable.
  The drone delivery one was Claude claiming:
  Kiwibot has operated delivery robots in Berkeley since 2017, founded in UC Berkeley’s Skydeck incubator. Delivers food within approximately one mile of campus with over 250,000 total deliveries completed.
  Googling quickly, there are claims that it has since shut down and also that it was remote-controlled rather than fully autonomous. In any case, it’d be pretty niche and clearly only available due to the novelty value.
  Robotaxis in 20+ cities was something Claude initially thought false, and then GPT-5.1 thought it was “borderline true” based on a bunch of Baidu deployments. E.g. source. No idea whether that holds up, idk the robotaxi situation in China. (Also that news is slightly after September 22.)
  I also think the StarCraft one is probably wrong. Looking now, the models seem to be mainly leaning on 2019 cites, which I think weren’t sufficient to show AI consistently beating humans.