IMO challenge bet with Eliezer

Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025. In honor of OpenAI’s post Solving (Some) Formal Math Problems, it seems good to publicly state and clarify our predictions, have a final chance to adjust them, and say a bit in advance about how we’d update.

The predictions

Eliezer and I had an exchange in November 2021.[1] My final prediction (after looking up IMO questions and medal thresholds and significantly revising my guesses) was:

I’d put 4% on “For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem” where “hardest problem” = “usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra.” (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)

Maybe I’ll go 8% on “gets gold” instead of “solves hardest problem.”

Eliezer spent less time revising his prediction, but said (earlier in the discussion):

My probability is at least 16% [on the IMO grand challenge falling], though I’d have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?

EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I’ll stand by a >16% probability of the technical capability existing by end of 2025

So I think we have Paul at <8%, Eliezer at >16% that an AI made before the IMO is able to get a gold (under the time controls etc. of the grand challenge) in one of 2022-2025.

Separately, we have Paul at <4% that an AI is able to solve the “hardest” problem under the same conditions.

I don’t plan to revise my predictions further, but I’d be happy if Eliezer wants to do so any time over the next few weeks.

Earlier in the thread I clarified that my predictions are specifically about gold medals (and become even sharper as we move to harder problems); I would not be surprised by silver or bronze. My guess would be that Eliezer has a broader distribution. The comments would be a good place for Eliezer to state other predictions, or take a final chance to revise the main prediction.

How I’d update

The informative:

  • I think an IMO challenge win would be fairly significant direct evidence that powerful AI will arrive sooner, or at least will be technologically possible sooner, perhaps pushing my 2040 TAI probability up from 25% to 40% or something like that.

  • I think this would be significant evidence that takeoff will be limited by sociological facts and engineering effort rather than a slow march of smooth ML scaling. Maybe I’d move from a 30% chance of hard takeoff to a 50% chance of hard takeoff.

  • If Eliezer wins, he gets 1 bit of epistemic credit.[2][3] These kinds of updates are slow going, and it would be better if we had a bigger portfolio of bets, but I’ll take what we can get.

  • This would be some update toward Eliezer’s view that “the future is hard to predict.” I think we have clear enough pictures of the future that we have the right to be surprised by an IMO challenge win; if I’m wrong about that then it’s general evidence my error bars are too narrow.

The uninformative:

  • This is mostly just a brute test of a particular intuition I have about a field I haven’t ever worked in. It’s still interesting (see above), but it doesn’t bear that much on deep facts about intelligence (my sense is that Eliezer and I are optimistic about similar methods for theorem proving), or heuristics about trend extrapolation (since we have ~no trend to extrapolate), or on progress being continuous in crowded areas (since theorem proving investment has historically been low), or on lots of pre-singularity investment in economically important areas (since theorem proving is relatively low-impact). I think there are lots of other questions that do bear on these things, but we weren’t able to pick out a disagreement on any of them.

If an AI wins a gold in some but not all of those years, without being able to solve the hardest problems, then my update will be somewhat more limited but in the same direction. If an AI wins a bronze/silver medal, I’m not making any of these updates and don’t think Eliezer gets any credit unless he wants to stake some predictions on those lower bars (I consider them much more likely, maybe 20% for “bronze or silver” vs 8% on “gold,” but that estimate is much less well-considered than the bets above).

  1. ^

    We also looked for claims that Eliezer thought were very unlikely, so that he’d also have an opportunity to make some extremely surprising predictions. But we weren’t able to find any clean disagreements that would resolve before the end of days.

  2. ^

    I previously added the text: “So e.g. if Eliezer and I used to get equal weight in a mixture of experts, now Eliezer should get 2x my weight. Conversely, if I win then I should get 1.1x his weight.” But I think that really depends on how you want to assign weights. That’s a very natural algorithm that I endorse generally, but given that neither of us really has thought carefully about this question it would be reasonable to just not update much one way or the other.

  3. ^

    More if he chooses to revise his prediction up from 16%, or if he wants to make a bet about the “hardest problem” claim where I’m at 4%.
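
For anyone who wants to check the arithmetic behind the “1 bit” and the 2x / 1.1x weights in footnote 2, here is a rough sketch. It assumes the stated bounds (>16% and <8%) are treated as point estimates of 16% and 8%, and that weights in the mixture are updated by a plain likelihood ratio. The function name and constants are illustrative and not part of the original exchange.

```python
import math

# Assumption: the stated bounds are treated as exact point estimates.
P_ELIEZER = 0.16  # Eliezer's ">16%" on an AI gold by 2025
P_PAUL = 0.08     # Paul's "<8%" on the same event


def likelihood_ratio(p_a: float, p_b: float, happened: bool) -> float:
    """Factor by which forecaster A's weight grows relative to B's.

    Each forecaster's weight is multiplied by the probability they
    assigned to the outcome that actually occurred; the ratio of those
    probabilities is the relative change in weight.
    """
    if happened:
        return p_a / p_b
    return (1 - p_a) / (1 - p_b)


# If an AI gets gold, Eliezer gains relative to Paul.
eliezer_gain = likelihood_ratio(P_ELIEZER, P_PAUL, happened=True)
# If no AI gets gold, Paul gains relative to Eliezer.
paul_gain = likelihood_ratio(P_PAUL, P_ELIEZER, happened=False)

print(f"Gold: Eliezer's weight x{eliezer_gain:.2f} "
      f"({math.log2(eliezer_gain):.2f} bits)")
print(f"No gold: Paul's weight x{paul_gain:.2f} "
      f"({math.log2(paul_gain):.2f} bits)")
```

Under these assumptions the asymmetry is large: a win is worth a full bit to Eliezer, while a loss moves things only slightly toward Paul, because both of us think a gold is unlikely. Revising the 16% upward (footnote 3) would increase the first ratio.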