This seems like a crux here, one that might be useful to explore further:
2. Claiming that non-vacuous sound (over)approximations are feasible, or that we’ll be able to specify and verify non-trivial safety properties, is risible. Planning for runtime monitoring and anomaly detection is IMO an excellent idea, but would be entirely pointless if you believed that we had a guarantee!
I broadly agree with you that most of the stuff proposed is either in its infancy or is essentially vaporware that doesn't really work without AIs being so good that the plan would be wholly irrelevant, and thus isn't really useful for short-timelines work. But I do believe enough of the plan is salvageable to make it not completely useless, in particular the part where it's very possible for AIs to help in real ways (at least given some evidence):
https://www.lesswrong.com/posts/DZuBHHKao6jsDDreH/in-response-to-critiques-of-guaranteed-safe-ai#Securing_cyberspace
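To make the crux concrete: a "sound (over)approximation" is a bound that provably contains every possible behavior of the system, and the disagreement is over whether such bounds can be made non-vacuous for anything we actually care about. Here is a toy interval-arithmetic sketch of the idea (my own illustration, not something from the linked post):

```python
# Toy illustration of a sound (over)approximation via interval arithmetic.
# "Sound" means the computed interval is guaranteed to contain the true range;
# the open question is whether such bounds stay tight enough to be non-vacuous.

from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

def f_abstract(x: Interval) -> Interval:
    # Overapproximates f(x) = x*x + x over the whole input interval.
    # Treating the two occurrences of x as independent loses their correlation,
    # which is the classic way sound analyses become loose.
    return x * x + x

bound = f_abstract(Interval(-1.0, 1.0))
print(bound)  # Interval(lo=-2.0, hi=2.0); the true range of x^2 + x on [-1, 1] is [-0.25, 2.0]
assert bound.lo <= -0.25 and bound.hi >= 2.0  # sound: the bound contains the true range
```

The bound is sound by construction but looser than the true range; the worry for large neural systems is that it becomes so loose it certifies nothing, which is exactly when the runtime monitoring the critique endorses has to carry the weight.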
Improving the sorry state of software security would be great, and with AI we might even see enough change to the economics of software development and maintenance that it happens, but it’s not really an AI safety agenda.
(added for clarity: of course it can be part of a safety agenda, but see point #1 above)
I agree that it isn't a direct AI safety agenda, though I will say that software security would be helpful for control agendas, and the increasing mathematical capabilities of AI could, in principle, help with AI alignment agendas that are mostly mathematical, like Vanessa Kosoy's agenda.
It’s also useful for AI control purposes.
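On the control point: the runtime monitoring and anomaly detection mentioned in the quote above is the kind of machinery control setups lean on, and it doesn't need a formal guarantee to be useful. A toy sketch, where the scoring, window size, and threshold are all hypothetical and not any lab's actual setup:

```python
# Toy sketch of runtime monitoring / anomaly detection without a formal guarantee:
# flag any action whose monitor score deviates sharply from recent history.
# Window size, threshold, and the simulated scores are all made up for illustration.

import random
from collections import deque
from statistics import mean, stdev

class AnomalyMonitor:
    def __init__(self, window: int = 200, z_threshold: float = 4.0):
        self.history = deque(maxlen=window)   # recent monitor scores
        self.z_threshold = z_threshold

    def check(self, score: float) -> bool:
        """Return True if this action should be escalated for review."""
        flagged = False
        if len(self.history) >= 30:           # wait for enough history to calibrate
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(score - mu) / sigma > self.z_threshold:
                flagged = True
        self.history.append(score)
        return flagged

random.seed(0)
monitor = AnomalyMonitor()
scores = [random.gauss(0.0, 1.0) for _ in range(500)] + [9.0]  # inject one outlier at the end
flagged = [i for i, s in enumerate(scores) if monitor.check(s)]
print(flagged)  # the injected outlier at index 500 gets flagged (rare false positives aside)
```

None of this is a guarantee, which is the point: it's a cheap fallback layer that stays useful even if the verification story never materializes.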
More below:
https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion#BSv46tpbkcXCtpXrk
Depends on your assumptions. If you assume that a pretty-well-intent-aligned pretty-well-value-aligned AI (e.g. Claude) scales to a sufficiently powerful tool to gain sufficient leverage on the near-term future to allow you to pause/slow global progress towards ASI (which would kill us all)...
Then having that powerful tool, but having a copy of it stolen from you and used at cross-purposes that prevent your plan from succeeding… would be snatching defeat from the jaws of victory.
Currently we are perhaps close to creating such a powerful AI tool, maybe even before ‘full AGI’ (by some definition). However, we are nowhere near the top AI labs having good enough security to prevent their code and models from being stolen by a determined state-level adversary.
So in my worldview, computer security is inescapably connected to AI safety.
We can drop the assumption that ASI inevitably kills us all (or that we should pause) and the above argument still works; or, as I like to put it, practical AI alignment/safety is very much helped by computer security, especially against state adversaries.
I think Zach Stein-Perlman is overstating the case, but here it is:
https://www.lesswrong.com/posts/eq2aJt8ZqMaGhBu3r/zach-stein-perlman-s-shortform#ckNQKZf8RxeuZRrGH