Roman Malov comments on Is ProgramBench Impossible?

Roman Malov 8 May 2026 20:02 UTC
28 points
14
Maybe I'm not looking in the right place, but the obvious question about this benchmark is: how do humans fare on it? If humans score 0 too, then models scoring 0 is not a strong signal (even though the authors claim that this benchmark is supposed to be closer to the work of a real engineer).