It’s in this appendix section as a lower-confidence compute estimate, and it sits in the >=45 AAII score bucket. Looking at the data, the reason it is not in the >=50 bucket is that its AAII score, pulled from the Artificial Analysis API, is 49.9. I see that they round it to 50 on the main webpage; I used the raw scores from the API without any rounding. Thanks for the check!
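For anyone who wants to see the effect concretely, here is a minimal sketch of how raw versus rounded scores change the bucket. The function name and the threshold list are hypothetical and only illustrate the point; this is not the actual analysis code.

```python
# Hypothetical thresholds: only the >=45 and >=50 buckets are mentioned above.
THRESHOLDS = (50, 45)

def score_bucket(score, thresholds=THRESHOLDS):
    """Return the highest threshold that the score meets, or None if it meets none."""
    for threshold in sorted(thresholds, reverse=True):
        if score >= threshold:
            return threshold
    return None

raw_score = 49.9  # raw AAII score as returned by the API

raw_bucket = score_bucket(raw_score)             # compares 49.9 against the thresholds
rounded_bucket = score_bucket(round(raw_score))  # compares 50 against the thresholds

print(f"raw: >={raw_bucket}, rounded: >={rounded_bucket}")
```

With the raw score the model lands in the >=45 bucket, while rounding first would move it into the >=50 bucket.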
it also makes me wonder whether mankind is close to exhausting the algorithmic insights usable in CoT-based models (think of my post with a less credible analysis written in October 2025) and/or mankind has already found a really cheap way to distill models into smaller ones
To be clear about my position, I don’t think the analysis I presented here points at all toward humanity exhausting algorithmic insights. Separate lines of reasoning might lead somebody to that conclusion, but this analysis either has little bearing on the hypothesis or points toward us not running out of insights (on account of the rate of downstream progress being so rapid).