Rate of improvement also varies significantly; math contests have improved ~50x in the last year but Tesla self-driving only 6x in 3 years.
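To put those two figures on the same footing, here is a rough sketch that converts each into a per-year factor and a doubling time. It assumes smooth exponential improvement over each window, which is my simplification and not something the underlying data guarantees; the function names are made up for illustration. Under that assumption, 6x over 3 years is only about 1.8x per year.

```python
import math


def annualized_factor(total_factor: float, years: float) -> float:
    """Per-year improvement factor, assuming constant exponential growth."""
    return total_factor ** (1.0 / years)


def doubling_time_months(total_factor: float, years: float) -> float:
    """Months needed to double, under the same constant-growth assumption."""
    return 12.0 * years * math.log(2) / math.log(total_factor)


# Figures taken from the comment above: (label, total improvement, window in years).
trends = [
    ("math contests", 50.0, 1.0),
    ("Tesla self-driving", 6.0, 3.0),
]

for label, factor, years in trends:
    print(
        f"{label}: {annualized_factor(factor, years):.2f}x per year, "
        f"doubling every {doubling_time_months(factor, years):.1f} months"
    )
```

On these numbers the doubling times come out to roughly 2 months versus roughly 14 months, so the gap is large even after annualizing.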
I wish I had thought to blind myself to these results and try to predict them in advance. I think I would have predicted that Tesla self-driving would be the slowest and that AIME would be the fastest. Not confident though.
(Solving difficult math problems is just about the easiest long-horizon task to train for,* and in the last few months we’ve seen OpenAI especially put a lot of effort into training this.)
*Only tokens, no images. Also no need for tools or plugins that connect to the internet or to some code or game environment. And you have ground-truth access to the answers, so it’s impossible to reward hack.
I think I would have predicted that Tesla self-driving would be the slowest
For graphs like these, what matters is obviously not how the worst or mediocre competitors are doing, but how the best one is. It doesn’t matter who’s #5. Tesla self-driving is a longstanding, notorious failure. (And apparently is continuing to be a failure, as they continue to walk back the much-touted Cybertaxi launch, which keeps shrinking like a snowman in hell, now down to a few invited users in a heavily-mapped area with teleop.)
I’d be much more interested in Waymo numbers, as that is closer to SOTA, and they have been ramping up miles & cities.
I would love to have Waymo data. It looks like it’s only available since September 2024, so I’ll still need to use Tesla for the earlier period. More critically, they don’t publish disengagement data, only crash/injury data. There are Waymo claims of things like 1 disengagement every 17,000 miles, but I don’t believe them without a precise definition of what this number represents.