OpenAI is competing in the AtCoder World Tour Finals (heuristic division) with a new model/agent. It is a 10-hour competition with an optimization-based problem, and OpenAI’s model is currently in 2nd place.
Edit: here are the rules https://atcoder.jp/contests/awtf2025heuristic#:~:text=Contest%20Rules&text=You%20can%20use%20any%20programming,the%20end%20of%20the%20contest.
So it really is 10 hours on 1 problem (!) but with automated scoring and multiple submissions allowed. This is better performance than I would have expected, but it seems like the lower-agency end of SWE tasks, and I expect it does not imply that 10-hour task lengths are in reach.
OpenAI sponsors the event, which is… a little suspicious.
Probably they want the data.
I was talking about this with one of my friends, who was (is?) a quite successful competitive coder. He mentioned that the structure of a heuristic competition tends to favor a lot of quick prototyping and iteration, much more than other types of programming competitions, which would tend to play to AI’s current strengths. The longer horizon is still impressive, though (OpenAI’s solution regained the lead 8 hours in, I think? So it was making meaningful contributions even hours in).