Why doesn’t Anthropic publish meaningful data on RSI? Given that they are in the best position to forecast RSI accurately, and that a positive result would be highly beneficial to them as a business (especially with their eye on an IPO), one has to wonder why they haven’t published any rigorous studies.
This leaves two possibilities:
They simply did not try to collect this data (which seems unlikely, given how important a positive result could be).
Their studies so far have shown negative or inconclusive results, or at least nothing stronger than public data (METR task horizons, etc.).
Given that recursive self-improvement (RSI) is the main short-timelines risk model, why don’t we focus more technical governance effort on targeted ways to prevent this specific risk?
My understanding is that most current governance proposals fall into either the maximalist (halt all frontier progress) or the minimalist (transparency, liability, etc.) camp. The targeted approach of specifically restricting automated AI R&D loops, however, seems underexplored. I think there is far more political will for this kind of thing, and it still addresses the main risk model.
The difficult part is actually designing the mechanism. I think a reasonable threshold would be somewhere between the adequacy and parity points described by Ajeya Cotra (https://www.planned-obsolescence.org/p/six-milestones-for-ai-automation).
A use restriction (that models aren’t used in a particular way) is much harder to verify than a model or hardware restriction (ensuring that certain models don’t exist, or that certain hardware isn’t available in large quantities). And in some modest senses AIs are already part of AI R&D loops, so it’s difficult to pin down what specifically presents an RSI risk that hasn’t already been done at AI companies since 2022.