I think that when there is this much uncertainty about what is going to happen, it is wrong both to put all your eggs in one basket and to put nothing in one basket. AI control might be useful.
How Intelligent?
It is extremely uncertain what level of intelligence an AI would need to escape even the best AI control measures.
Escaping a truly competent containment facility that can read and edit your thoughts, and finding a way to permanently prevent other ASIs from taking over the world, are both very hard tasks. It is possible that the former is harder than the latter.
AI aligning AI isn’t obviously worthless
To argue for alignment research over control research, you have to assume that human alignment researchers produce nonzero value. Given that they produce nonzero value on their own, it is hard to argue that they would gain exactly zero additional value from transformative AI, which can think much faster than they can and is, by definition, not stupid. Why would people smart enough to solve alignment on their own be stupid enough to fall for AI slop from a transformative AI?
Admittedly, if we have only a month between transformative AI and ASI, the value may be negligible. But the duration is highly uncertain (the AI lab may, for instance, be shocked into pausing development). The total alignment work done by transformative AI could be far less than the work done by humans, or it could be far more.
Convince the world
I also really like Alex Mallen’s comment: a controlled but misaligned AGI may convince the world to finally get its act together.
Conclusion
I fully agree that AI control is less useful than AI alignment. But I disagree that AI control is less cost-effective than AI alignment. Massive progress in AI alignment feels more valuable than massive progress in AI control, but it also feels more out of reach.[1]
If the field were currently spending 67% on control and only 33% on other things, I would totally agree with reducing control. But right now I wouldn't.
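To make the eggs-in-baskets point concrete, here is a toy expected-value sketch. Every number in it (the 50% probability, the square-root diminishing-returns curve) is a made-up assumption for illustration, not a claim from this post: under genuine uncertainty about which agenda will matter, a mixed portfolio beats going all-in on either.

```python
import math

# Toy model: two possible worlds.
#  - With probability P, alignment research is what averts catastrophe.
#  - Otherwise, control research is what averts it.
# sqrt() models diminishing returns on effort within each agenda.
# All numbers here are made-up assumptions for illustration.

P_ALIGNMENT_MATTERS = 0.5  # assumed probability, not from the post

def expected_payoff(alignment_share: float) -> float:
    """alignment_share = fraction of field effort on alignment; the rest goes to control."""
    control_share = 1.0 - alignment_share
    return (P_ALIGNMENT_MATTERS * math.sqrt(alignment_share)
            + (1.0 - P_ALIGNMENT_MATTERS) * math.sqrt(control_share))

for share in (0.0, 0.33, 0.5, 0.67, 1.0):
    print(f"alignment share {share:.2f} -> expected payoff {expected_payoff(share):.3f}")

# The corners (0.00 and 1.00) both score 0.500, while every mixed
# portfolio scores higher (0.50 scores ~0.707). Diversification wins
# whenever returns within each agenda are diminishing.
```

With these made-up numbers the exact split matters much less than avoiding the corners, which is all the eggs-in-baskets argument needs.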
[1] It resembles what you call streetlighting, but actually has a real chance of working.