People say it’s important to demonstrate alignment problems like goal misgeneralization. But now OpenAI, DeepMind, and Anthropic have all had leaders sign the Center for AI Safety (CAIS) statement on extinction risk, and all three are doing substantial alignment research. The gap between the 90th-percentile alignment-concerned people at labs and the MIRI worldview is now mostly a matter of security mindset. Security mindset took hold in cybersecurity because it is useful in the everyday, practical environment researchers work in. So perhaps a large part of the future hinges on whether security mindset becomes useful for solving problems in applied ML.
Will it become clear that the presence of one bug in an AI system implies that there are probably five more?
Will architectures with fewer moving parts, ones we can actually understand, be demonstrated to be more robust than black-box systems or systems that work for complicated reasons?
Will it become tractable to develop these kinds of simple, transparent systems, so that security mindset can catch on?
I suggest calling it “the sentence on extinction risk” so that people can pick up what is meant without having to already know the acronym.
Edited, thanks