I have an argument for capabilities research being good, resting on a different assumption: that we will progress rapidly toward AGI capabilities (say, within 10 years).
If we agree that 95% of progress toward alignment happens very close to AGI, then the length of the interval between almost-AGI and AGI is the most important duration.
Suppose the ratio of capabilities research to alignment research is low (probably what most people here want). Then AI researchers and deployers will have the option to say, “Look, so many resources were already put toward safety, it’s actually fine, we’re employing the 2027 comprehensive robustness benchmarks and IDA+; in fact, our quality assurance team is implementing it right now, no need to worry,” prompting decision-makers to relax and let it go. The almost-AGI → AGI interval is 2 years.
On the other hand, if the ratio is high, this may cause decision-makers to freak out when they have almost-AGI on the table and to contain development (e.g. with regulation). This may primarily be mediated by easier-to-avoid public failures and accidents, or by AI safety people quickly and loudly demonstrating that we don’t yet have the tools to avoid even these easier-to-avoid failures. Regulation then extends the almost-AGI → AGI interval to 8 years.
The point is that this gives 4x more time to work on the period in which 95% of safety research progress happens.
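As a toy calculation of the claim above (the 2-year and 8-year intervals are the illustrative figures from the scenarios, not estimates, and the 95% fraction is the stated assumption):

```python
# Toy model of the argument: how much of total alignment progress happens
# in each scenario's almost-AGI -> AGI interval. All numbers are the
# illustrative assumptions from the text, not real estimates.

NEAR_AGI_FRACTION = 0.95  # assumed fraction of alignment progress made near AGI

interval_low_ratio = 2    # years: low capabilities:alignment ratio -> complacency
interval_high_ratio = 8   # years: high ratio -> alarm, regulation extends interval

multiplier = interval_high_ratio / interval_low_ratio
print(f"Critical-interval time multiplier: {multiplier:.0f}x")
print(f"Applied to the window holding {NEAR_AGI_FRACTION:.0%} of alignment progress")
```

This just makes the bookkeeping explicit: the 4x factor multiplies calendar time in exactly the window assumed to matter most, which is why the argument is sensitive to the 95% assumption.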