I very much appreciate you offering this concrete bet! I’m probably not interested in taking this exact proposal, and before taking it I would need to set aside time to investigate your thread with Ryan and similar discussions to see how close current models are to resolving it. I’ll add it to my to-do list to look into this and perhaps propose an alternative, if you think that might be useful.
For the reasons I probably don’t want to take the bet, see also my comments on how what you describe as our 0th percentile is not my actual 0th percentile, and on why I disagree with you about whether the metric you’re proposing underestimates software progress.
Perhaps it would be simpler and more cruxy to bet more directly on downstream tasks we care about, or on metrics of usefulness like revenue? We could look at capabilities in early 2027, by which I put ~15-20% on a superhuman coder having been achieved and Daniel puts ~25-30%. For example: what the trend on the METR suite (or whatever the closest equivalent is) will look like, whether a panel we choose says it’s plausible that the leading AI company has a superhuman coder internally, what AI companies’ revenue will look like, etc.