My best guess is that this is because, right now in training, they never have to maintain code they wrote. I imagine there will be a period where their code becomes very clean once they are incentivized by having to work over their own code over longer time horizons, followed by ??? as they optimize for “whatever design patterns are optimal for a multi-agent system collaborating on some code”.
I expect it’ll actually be solved a bit before that, because minimally-scaffolded LLMs can already give pretty good code-review feedback that catches a lot of these issues, so existing RLAIF techniques should work fine. The training pipelines would be finicky to set up but would not require any new technical advances, just schlep, so I predict it’ll happen as soon as writing good code becomes more of a competitive advantage than benchmaxxing (which seems to be happening already: SWE-bench Verified is rapidly saturating).
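For concreteness, here is a minimal sketch of what that kind of reward signal could look like: a frozen judge model scores a diff for maintainability, and the score is normalized into a scalar reward for RL. Everything here (the rubric, the `JUDGE_PROMPT`, the `complete` stub) is a hypothetical illustration of the general RLAIF pattern, not any lab's actual pipeline.

```python
# Hypothetical sketch of an RLAIF-style code-quality reward.
# The judge prompt, rubric, and `complete` stub are assumptions
# for illustration only.

import re
from typing import Callable

JUDGE_PROMPT = """You are a strict code reviewer. Rate the following diff
for maintainability on a 1-10 scale (naming, duplication, error handling,
test coverage). Reply with only the number.

{diff}
"""

def review_reward(diff: str, complete: Callable[[str], str]) -> float:
    """Turn a judge model's review score into a scalar reward.

    `complete` is any prompt -> text function, e.g. a call to a judge
    model held fixed during training so the policy can't drift it.
    """
    reply = complete(JUDGE_PROMPT.format(diff=diff))
    match = re.search(r"\d+", reply)
    score = int(match.group()) if match else 1  # unparseable reply -> worst score
    score = max(1, min(score, 10))              # clamp to the rubric's range
    return (score - 1) / 9.0                    # normalize to [0, 1]

if __name__ == "__main__":
    # Stub judge so the sketch runs standalone; swap in a real model call.
    fake_judge = lambda prompt: "7"
    print(review_reward("def f(x):\n    return x + 1\n", fake_judge))
```

The finicky part isn't this scoring function; it's the surrounding plumbing (sampling diffs at scale, keeping the judge from being gamed, mixing this reward with task-success signals), which is the schlep referred to above.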