Adrià Garriga-alonso comments on Alignment will happen by default. What’s next?

Adrià Garriga-alonso 6 Dec 2025 9:10 UTC
LW: 2 AF: 1
No, I think the blue team will keep having the latest and best LLMs and will be able to stop such attempts from randos. These AGIs won't be so magically superintelligent that they can take all the unethical actions needed to take over the world without other AGIs stopping them.
- otto.barten 7 Dec 2025 10:14 UTC
  1 point
  I don't think it makes sense to be confidently optimistic about this (the offense-defense balance) given the current state of research. I looked into this topic some time ago with Sammy Martin. I think almost no one in the research community has a plan for how the blue team would actually stop the red team. Particularly worrying is that offense appears to have the advantage in several domains (e.g. bioweapons, cybersecurity), and that defense would need to play by the rules, which would hugely hinder its ability to act. See also e.g. this post.
  Since most people who have actually thought about this seem to conclude that offense would win, being confident that defense would win seems off. What are your arguments?