Well, today GPT-5-Codex solved it on the 2nd try. (The first version it gave was already conceptually correct, but I guess had some subtle bug. After I told it to fix it and test the fix, it gave a working solution.)
I am just surprised how well the agentic loop is working. It cloned the specific Lean version’s source code I was asking for, inspected it to understand the data structure, downloaded a release tarball to test it, all without losing track of its goals. All this would have been unimaginable ~a year ago.
So yeah, in 7 months (but maybe even 2 if you count the base GPT-5 attempt) we went from “not even close” to “solved” on this problem. Not sure how I should feel about this...
Well, today GPT-5-Codex solved it on the 2nd try. (The first version it gave was already conceptually correct, but I guess had some subtle bug. After I told it to fix it and test the fix, it gave a working solution.)
I am just surprised how well the agentic loop is working. It cloned the specific Lean version’s source code I was asking for, inspected it to understand the data structure, downloaded a release tarball to test it, all without losing track of its goals. All this would have been unimaginable ~a year ago.
So yeah, in 7 months (but maybe even 2 if you count the base GPT-5 attempt) we went from “not even close” to “solved” on this problem. Not sure how I should feel about this...