I feel like including unreleased models when estimating doubling time mucks things up a bit. For instance, I’m assuming the unreleased o3 model from December had a significantly longer time horizon in math than the released o3, given its much higher scores on FrontierMath and similar benchmarks.
Can you be more specific about what you think the issue is?