I predict yes, in the following narrow sense: I think a system backed by GPT-5 and granted access to the right APIs will be capable of fully automatically making changes to a repository of that system’s code, and then deploying those changes to instantiate a new instance of itself.
I think GPT-4 is slightly too weak to do this, though I think it’s not out of the question that people eventually get it working for somewhat trivial / restrictive cases.
GitHub is currently working on or already testing things like Copilot for Pull Requests and Copilot for your codebase. It’s not that much of a stretch to imagine hooking these together into a fully automated pull-request authoring system.
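To make the “hooking these together” step concrete, here is a minimal sketch of what such a fully automated pull-request loop might look like. This is purely illustrative: `propose_patch`, `run_tests`, and the PR dictionary format are hypothetical stand-ins for Copilot-style APIs and CI hooks, not real endpoints.

```python
# Hypothetical sketch of an automated PR-authoring loop.
# All function bodies are placeholders for real LLM / CI / GitHub API calls.

def propose_patch(task: str, file_contents: str) -> str:
    """Placeholder: a real system would call an LLM (e.g. via an API)
    to produce a unified diff addressing `task`."""
    return f"# patch addressing: {task}"

def run_tests(candidate_repo: dict) -> bool:
    """Placeholder: a real system would run the repo's CI suite
    against the patched tree and return pass/fail."""
    return True

def automated_pr_loop(task: str, repo: dict):
    """Propose a patch, gate it on tests, and emit a PR description."""
    patch = propose_patch(task, repo["main.py"])
    candidate = {**repo, "main.py": repo["main.py"] + "\n" + patch}
    if run_tests(candidate):
        # A real system would call a GitHub API here to open the PR.
        return {"title": task, "diff": patch}
    return None  # tests failed; discard this candidate patch
```

The interesting engineering questions are in the placeholders (patch quality, test coverage, rollback), but the control flow itself is this simple.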
An even harder task is then pointing such a system at the codebase(s) and infrastructure used to train the underlying transformer model, making improvements, and then kicking off a new training run. I think GPT-5 and maybe even GPT-4 (suitably connected) could make trivial improvements and minor bugfixes to such a repo. Fully autonomously supervising a multimillion-dollar training run might be a stretch, but I’m definitely not confident that it is ruled out.
Both you and Peter have pointed out that one of the cruxes here is how much compute is needed for testing.
I agree that if the process could come up with algorithmic improvements so weak and subtle that the advantage could only be clearly distinguished at the scale of a full multimillion-dollar training run, then RSI would likely not take off.
I expect, though, that the process I describe would find strong improvements, which would be obvious at a 100k-parameter run and continue showing a clear advantage at 1 million, 10 million, 100 million, 1 billion, 10 billion parameters, and so on.
In that case, the extrapolation becomes a safe bet, and the compute needed for parallel testing is much lower, since you only need to test the small models to figure out what is worth scaling.
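The extrapolation argument above can be illustrated with a toy power-law fit: if a variant shows a clear advantage at every small scale, fitting loss ≈ a·N^(−b) in log-log space lets you predict its advantage at scales you never train directly. The numbers below are synthetic, chosen only to mimic the shape of the argument, not real measurements.

```python
# Toy illustration of scaling-law extrapolation with synthetic losses.
import numpy as np

# Model sizes actually tested (parameters), matching the ladder in the text.
params = np.array([1e5, 1e6, 1e7, 1e8, 1e9])

def fit_power_law(n, loss):
    """Fit loss ≈ a * n^(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    return np.exp(intercept), -slope  # (a, b), since log loss = log a - b log n

# Synthetic losses: the "improved" algorithm has a steeper scaling exponent,
# so its advantage is visible at every tested scale.
baseline_loss = 50.0 * params ** -0.076
improved_loss = 50.0 * params ** -0.080

a0, b0 = fit_power_law(params, baseline_loss)
a1, b1 = fit_power_law(params, improved_loss)

# Extrapolate both fits to a 100B-parameter run that was never trained.
n_big = 1e11
print(a0 * n_big ** -b0, a1 * n_big ** -b1)
```

If the fitted curves stay cleanly separated across every tested scale, the large-scale comparison never has to be run to decide which variant is worth scaling; that is the compute savings the argument turns on.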