For some models (especially older ones), the Artificial Analysis Intelligence Index score is labeled “Estimate (independent evaluation forthcoming)”. It is unclear how these scores are determined, and they may not be reliable. The Artificial Analysis API does not clearly label such estimates, and I did not manually remove them for the secondary analysis. Ideally, the capability levels containing these models (probably mostly the lower levels) would be weighted less, but I don’t do this due to uncertainty about which models have Estimates versus independently tested scores.
IMO this is a potentially significant issue that this post should have spent more time addressing, since it means that the earlier sections of the trend lines are coming from a source we know nothing about.
I agree it’s potentially a significant issue. One reason I’m relatively less concerned is that the AAII scores for these models seem generally pretty reasonable. Another is that the results look pretty similar if we only look at more recent models (which by and large have AAII-run benchmark scores). E.g., restricting to models from July 2024 onward yields a median of 1.22 OOMs and a weighted estimate of 1.85 OOMs (a minimal sketch of this check is below).
There are many places for additional and follow-up work, and this is one of them, but I don’t think it invalidates the overall results.
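For concreteness, here is a minimal sketch of that robustness check. The dataframe schema (`release_date`, `ooms_per_year`, `weight`) and the toy values are hypothetical stand-ins for whatever per-model estimates the actual analysis uses:

```python
import pandas as pd

# Hypothetical per-model data: release date, estimated OOMs/year of
# effective-compute gain, and a weight. All values here are made up.
models = pd.DataFrame({
    "model": ["a", "b", "c", "d"],
    "release_date": pd.to_datetime(
        ["2024-02-01", "2024-08-15", "2024-11-30", "2025-03-01"]
    ),
    "ooms_per_year": [0.9, 1.1, 1.4, 2.0],
    "weight": [0.5, 1.0, 1.5, 2.0],
})

# Robustness check: keep only models released from July 2024 onward,
# which by and large have independently run AAII benchmark scores.
recent = models[models["release_date"] >= "2024-07-01"]

median_ooms = recent["ooms_per_year"].median()
weighted_ooms = (
    (recent["ooms_per_year"] * recent["weight"]).sum()
    / recent["weight"].sum()
)

print(f"median: {median_ooms:.2f} OOMs, weighted: {weighted_ooms:.2f} OOMs")
```

If the median and weighted estimates from this restricted sample land close to the full-sample numbers, that suggests the Estimate-labeled scores in the earlier part of the trend are not driving the results.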