p.b. comments on Introducing the Epoch Capabilities Index (ECI)

p.b. 30 Oct 2025 12:53 UTC
3 points
What is uniquely interesting and valuable about METR time horizons is that the score is meaningful and interpretable. "Can do software tasks that would take an expert 2h, with 50% success probability" is a very specific claim. "Has score y on benchmark x" is only valuable for comparisons; it does not tell you what is going to happen when models reach score z.