Have you seen MirrorCode? The big-picture idea is similar to ProgramBench, but we make more effort to ensure the tasks are fair / actually possible. We find AI can solve most of them.
https://epoch.ai/blog/mirrorcode-preliminary-results
From what I can tell, MirrorCode is much better. I'm excited for it to be fully released.