Hi Charbel, thanks for your interest, great question.
If the balance favors offense, we would die anyway despite a successful alignment project, since in a world with many AGIs there will always be either a bad actor or someone who accidentally fails to align their takeover-level AI. (I tend to think of this as Murphy's law for AGI.) Therefore, if one claims that one's alignment project reduces existential risk, one must think the aligned AI can somehow stop another, unaligned AI (a favorable offense/defense balance).
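To make the Murphy's-law point concrete, here is a minimal sketch with purely hypothetical numbers: if there are n takeover-level AI projects and each independently has some small probability p of being misaligned or misused, the chance that at least one goes wrong grows quickly with n.

```python
# Hypothetical illustration of "Murphy's law for AGI": n independent
# takeover-level projects, each with probability p of being misaligned
# or misused. The numbers below are assumptions, not estimates.
def p_at_least_one_failure(n: int, p: float) -> float:
    """Probability that at least one of n projects produces an unaligned or misused AI."""
    return 1 - (1 - p) ** n

# Even with a 99% per-project success rate, failure becomes likely as n grows.
for n in (1, 10, 100, 500):
    print(n, round(p_at_least_one_failure(n, p=0.01), 3))
# 1 0.01, 10 0.096, 100 0.634, 500 0.993
```

Under these assumptions, a single aligned project does little to reduce the overall risk unless it can also defend against the other projects' failures.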
There are some other options:
Some believe the first AGI will take off to ASI straight away and block other projects by default. I think that's at least not certain; the labs, for example, don't seem to believe so. Note also that blocking other projects is illegal.
Some believe the first AGI will take off to pivotal-act capability and perform a pivotal act. I think there's at least a chance that won't happen. Note also that pivotal acts are illegal.
It could be that we regulate AI so that no unsafe projects can be built, using e.g. a conditional AI safety treaty. In that case, neither alignment nor a favorable offense/defense balance is needed.
It could be that we get mutually assured AI malfunction (MAIM). In that case too, neither alignment nor a favorable offense/defense balance is needed.
Barring these options, though, we seem to need not only AI alignment but also a favorable offense/defense balance.
Some more on the topic: https://www.lesswrong.com/posts/2cxNvPtMrjwaJrtoR/ai-regulation-may-be-more-important-than-ai-alignment-for