Good points! Obvious now that I think about it that having models ‘submit’ at the end of an agentic code editing eval by submitting a PR via a fake git tool would be vastly more realistic.
Good points! Obvious now that I think about it that having models ‘submit’ at the end of an agentic code editing eval by submitting a PR via a fake git tool would be vastly more realistic.