Thanks for the histograms. Is the raw data available somewhere?
Just eyeballing it:
The accuracy growth rate for Opus 4.5 in the 4-8 hour range is about what the trendline from Sonnet to Sonnet 4.5 would predict.
The 8-16 hour growth came in ~3 months ahead of target.
The 2-4 hour growth is a month or so ahead of target.
That aligns with my sense that the model is a month, maybe 2 months, ahead of what is expected, and that a lot of this jump (4.5 months ahead of expected) comes from artifacts of the curve fitting.
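For anyone who wants to redo the eyeballing once the raw data is available, here is a rough sketch of what I mean by "months ahead" in a single bucket. It assumes a straight-line trend in accuracy through two earlier models, which is my simplification and not necessarily how the original curve was fit. The function name and all the numbers are hypothetical, not read off the histograms.

```python
def months_ahead(t0, acc0, t1, acc1, t_new, acc_new):
    """Months by which a new model leads a two-point linear trendline.

    t0, t1, t_new are release dates in months on any shared axis;
    acc0, acc1, acc_new are accuracies in one task-length bucket.
    Assumes a linear trend in accuracy (an illustrative simplification).
    """
    slope = (acc1 - acc0) / (t1 - t0)         # accuracy gained per month
    expected = acc1 + slope * (t_new - t1)    # trendline value at the new release date
    return (acc_new - expected) / slope       # accuracy surplus converted to months


# Hypothetical numbers, for illustration only.
lead = months_ahead(t0=0, acc0=0.2, t1=10, acc1=0.4, t_new=12, acc_new=0.5)
print(f"{lead:.1f} months ahead of trend")
```

Running this per bucket and comparing against the headline figure would show how much of the jump depends on the choice of fit.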