I think I agree directionally with the post.
But I’ve increasingly started to wonder whether software engineering might be surprisingly easy to automate once the right data/environments are used at much larger scale, e.g. GitHub issues (see D3: A Large Dataset for Training Code Language Models to Act Diff-by-Diff) or semi-automated pipelines for building SWE RL environments (see Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs), both of which seem like they could be scaled up automatically with surprisingly little effort. It now seems much more plausible to me that this is more of a data-scaling problem than a compute-scaling problem, and that progress could be fast. There could also be a flywheel effect: better AIs → better automated collection and filtering of SWE environments/data → better AIs, and so on. And ‘Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs’ has already shown data scaling laws for SWE.
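Also, my impression is that SWE is probably the biggest bottleneck in automating AI R&D. This is partly based on results like those in ‘Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers’, where LLM-generated research ideas were judged more novel than those of expert NLP researchers, which suggests ideation is less of a bottleneck than execution. It is especially based on how long the time horizons of the SWE part are compared with other parts of the AI R&D cycle.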
I’ve had similar thoughts: I think there’s still low-hanging fruit in RL, in scaffolding, and in further scaling of inference compute. But my general take is that the recent faster trend of time horizons doubling every ~4 months is already the result of picking the low-hanging RL fruit for coding and SWE, plus fast inference scaling. So this kind of thing will probably lead to a continuation of the fast trend, not another acceleration.
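To make "continuation vs. acceleration" a bit more concrete: under a constant doubling time, the calendar time needed to reach some target horizon is just the number of doublings needed times the doubling time. The minimal sketch below illustrates that arithmetic. Only the ~4-month doubling time comes from this thread; the 2-hour starting horizon, the 7-month slower baseline, and the ~167 working hours per "work-month" are illustrative assumptions, not measured values.

```python
import math

# Assumption: one "work-month" is ~167 working hours (illustrative convention only).
HOURS_PER_WORK_MONTH = 167.0


def months_to_reach(start_hours: float, target_hours: float, doubling_months: float) -> float:
    """Calendar months until an exponentially growing time horizon reaches target_hours.

    Assumes the horizon doubles every `doubling_months`, with no change in trend.
    """
    if start_hours <= 0 or target_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    if target_hours <= start_hours:
        return 0.0  # target already reached
    doublings_needed = math.log2(target_hours / start_hours)
    return doublings_needed * doubling_months


if __name__ == "__main__":
    start = 2.0  # hypothetical current horizon in hours, for illustration only
    target = HOURS_PER_WORK_MONTH
    # Recent ~4-month trend vs. an assumed slower 7-month baseline.
    for doubling in (4.0, 7.0):
        months = months_to_reach(start, target, doubling)
        print(f"doubling every {doubling:.0f} months -> ~{months:.1f} months to a 1-month horizon")
```

The sketch deliberately holds the doubling time fixed; an acceleration in the sense discussed here would mean the doubling time itself shrinking, which this simple model does not capture.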
Another source of shorter timelines, depending on which timeline you mean, is the uncertainty in translating time horizons into real-world AI research productivity. Maybe models with an 80% time horizon of 1 month or less are already enough for a huge acceleration of AI R&D, given the right scaffold/unhobbling/bureaucracy to take advantage of lots of parallel small experiments or other work, or good complementarities between AI and human labor.