paulfchristiano comments on Thoughts on sharing information about language model capabilities

paulfchristiano 9 Aug 2023 3:59 UTC
7 points
My guess is that if you hold capability fixed and make a marginal move in the direction of (better LM agents) + (smaller LMs) then you will make the world safer. It straightforwardly decreases the risk of deceptive alignment, makes oversight easier, and decreases the potential advantages of optimizing on outcomes.
What links here?
- Which possible AI systems are relatively safe? by Zach Stein-Perlman (21 Aug 2023 17:00 UTC; 42 points)
- Which paths to powerful AI should be boosted? by Zach Stein-Perlman (23 Aug 2023 16:00 UTC; 1 point)