Why doesn’t Anthropic publish meaningful data on recursive self-improvement (RSI)? Given that they are in the best position to forecast RSI accurately, and that a positive result would be highly beneficial to them as a business (especially with their eye on an IPO), one has to wonder why they haven’t published any rigorous studies.
This leaves one of two explanations:
They simply did not try to collect this data (which seems unlikely given how valuable a positive result could be).
Their studies so far have shown negative or inconclusive results, or at least nothing stronger than the public data (METR task horizons, etc.).
In the recent No Priors podcast episode with Noam Brown, they discuss Noam’s opinions on RSI (starting around 20:59). Noam says he does not believe that we’re moving towards an intelligence explosion anytime soon.
At 25:58 he says:
“I think there is this hypothesis that you could have basically an overnight intelligence explosion where the models discover some breakthrough to make themselves smarter, and then that leads to more breakthroughs that make themselves even smarter immediately, and you have basically in an instant the models becoming very superhuman across the board in moments. I don’t think we’re headed to that world, largely because the models rely so much on large-scale test-time compute to achieve their greatest intelligence. If it requires so much test-time compute to unlock the full capabilities of the model, then you’re bottlenecked by time: things can only go so fast because the models need to run for long enough to actually do something really powerful.”
This is a useful extra datapoint suggesting that many leading researchers at the frontier labs don’t actually believe in the intelligence explosion framing of RSI. We are starting to see many researchers at Google and OpenAI voice this opinion, but notably fewer at Anthropic.
Given that Recursive Self Improvement (RSI) is the main short-timelines risk model, why don’t we focus more technical governance efforts on targeted ways to prevent this specific risk?
My understanding is that most current governance proposals fall into either the maximalist (halt all frontier progress) or minimalist (transparency, liability, etc.) camps. However, the targeted approach of specifically restricting automated AI R&D loops seems underexplored. I think there is far more political will for this kind of restriction than for a full halt, and it would still address the main risk model.
The difficult part is actually designing this mechanism. I think a reasonable threshold would be somewhere between the adequacy and parity points described by Ajeya Cotra (https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation).
A use restriction (that models aren’t used in a particular way) is much harder to verify than a model or hardware restriction (ensuring certain models don’t exist, or that certain hardware isn’t available in high quantities). And in some modest senses AIs are already part of AI R&D loops, so it’s difficult to litigate what specifically presents an RSI risk that hasn’t already been done at AI companies since 2022.