This seems like a crux here, one that might be useful to explore further:
2. Claiming that non-vacuous sound (over)approximations are feasible, or that we’ll be able to specify and verify non-trivial safety properties, is risible. Planning for runtime monitoring and anomaly detection is IMO an excellent idea, but would be entirely pointless if you believed that we had a guarantee!
I broadly agree with you that most of the stuff proposed is either in its infancy or is essentially vaporware that doesn't really work without AIs being so good that the plan would be wholly irrelevant, and thus isn't really useful for short-timelines work. But I do believe enough of the plan is salvageable to make it not completely useless, in particular the part where it's very possible for AIs to help in real ways (at least given some evidence):
https://www.lesswrong.com/posts/DZuBHHKao6jsDDreH/in-response-to-critiques-of-guaranteed-safe-ai#Securing_cyberspace
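To make the crux concrete: a "sound (over)approximation" is a bound that provably contains every possible behavior of the system, and the disagreement is over whether such bounds can be made non-vacuous for anything we actually care about. Here is a toy interval-arithmetic sketch of the idea (my own illustration, not something from the linked post):

```python
# Toy illustration of a sound (over)approximation via interval arithmetic.
# "Sound" means the computed interval is guaranteed to contain the true range;
# the open question is whether such bounds stay tight enough to be non-vacuous.

from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

def f_abstract(x: Interval) -> Interval:
    # Overapproximates f(x) = x*x + x over the whole input interval.
    # Treating the two occurrences of x as independent loses their correlation,
    # which is the classic way sound analyses become loose.
    return x * x + x

bound = f_abstract(Interval(-1.0, 1.0))
print(bound)  # Interval(lo=-2.0, hi=2.0); the true range of x^2 + x on [-1, 1] is [-0.25, 2.0]
assert bound.lo <= -0.25 and bound.hi >= 2.0  # sound: the bound contains the true range
```

The bound is sound by construction but looser than the true range; the worry for large neural systems is that it becomes so loose it certifies nothing, which is exactly when the runtime monitoring the critique endorses has to carry the weight.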
Improving the sorry state of software security would be great, and with AI we might even see enough change to the economics of software development and maintenance that it happens, but it’s not really an AI safety agenda.
(added for clarity: of course it can be part of a safety agenda, but see point #1 above)
I agree that it isn't a direct AI safety agenda, though I will say that software security would be helpful for control agendas, and the increasing mathematical capabilities of AI could, in principle, help with AI alignment agendas that are mostly mathematical, like Vanessa Kosoy's agenda.
It’s also useful for AI control purposes.
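On the control point: the runtime monitoring and anomaly detection mentioned in the quote above is the kind of machinery control setups lean on, and it doesn't need a formal guarantee to be useful. A toy sketch, where the scoring, window size, and threshold are all hypothetical and not any lab's actual setup:

```python
# Toy sketch of runtime monitoring / anomaly detection without a formal guarantee:
# flag any action whose monitor score deviates sharply from recent history.
# Window size, threshold, and the simulated scores are all made up for illustration.

import random
from collections import deque
from statistics import mean, stdev

class AnomalyMonitor:
    def __init__(self, window: int = 200, z_threshold: float = 4.0):
        self.history = deque(maxlen=window)   # recent monitor scores
        self.z_threshold = z_threshold

    def check(self, score: float) -> bool:
        """Return True if this action should be escalated for review."""
        flagged = False
        if len(self.history) >= 30:           # wait for enough history to calibrate
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(score - mu) / sigma > self.z_threshold:
                flagged = True
        self.history.append(score)
        return flagged

random.seed(0)
monitor = AnomalyMonitor()
scores = [random.gauss(0.0, 1.0) for _ in range(500)] + [9.0]  # inject one outlier at the end
flagged = [i for i, s in enumerate(scores) if monitor.check(s)]
print(flagged)  # the injected outlier at index 500 gets flagged (rare false positives aside)
```

None of this is a guarantee, which is the point: it's a cheap fallback layer that stays useful even if the verification story never materializes.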
More below:
https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion#BSv46tpbkcXCtpXrk
Depends on your assumptions. If you assume that a pretty-well-intent-aligned pretty-well-value-aligned AI (e.g. Claude) scales to a sufficiently powerful tool to gain sufficient leverage on the near-term future to allow you to pause/slow global progress towards ASI (which would kill us all)...
Then having that powerful tool, but having a copy of it stolen from you and used at cross-purposes that prevent your plan from succeeding… would be snatching defeat from the jaws of victory.
Currently we are perhaps close to creating such a powerful AI tool, maybe even before ‘full AGI’ (by some definition). However, we are nowhere near the top AI labs having good enough security to prevent their code and models from being stolen by a determined state-level adversary.
So in my worldview, computer security is inescapably connected to AI safety.
We can drop the assumption that ASI inevitably kills us all (or that we should pause) and the above argument still works; or, as I like to put it, practical AI alignment/safety is very much helped by computer security, especially against state adversaries.
I think Zach Stein-Perlman is overstating the case, but here it is:
https://www.lesswrong.com/posts/eq2aJt8ZqMaGhBu3r/zach-stein-perlman-s-shortform#ckNQKZf8RxeuZRrGH