Hmm, yeah I think you’re right, though I also don’t think I articulated what I was trying to say very well.
Like I think my view is:
There was some story where we would see very fast progress in relatively easy-to-verify (or trivial-to-verify) tasks, and that’s what I’m talking about. It seems like agentic software engineering could reach very high levels without necessarily needing serious improvements in harder-to-verify tasks.
Faster progress in non-trivial-to-verify tasks might not be the limiting factor if progress in easy-to-verify tasks isn’t that fast.
I still think that there won’t be a noticeable jump as the IMO methods make it into production models, but this is due to more general heuristics (and the methods maybe still matter, it just won’t be something to wait for, I think).