Good post.

Other points aside, the proposition “LM agents are an unusually safe way to build powerful AI systems” seems really important; it would be great to see more research/intuitions on this + clarification on the various flavors of “LM agents.”
I guess one crux for sharing research on LM agents is whether there are viable alternative paths to powerful AI systems. If LM agents are clearly the easiest path, there’s less reason to share research on them; if a less-safe path looks similarly easy, we should differentially advance LM agents.
I’m not aware of alternative paths that look anywhere near as easy as LM agents. Or: I don’t know what viable alternative paths LM agents are supposed to be safer than. (Edit: some alignment-researcher friends mention old-fashioned RL agents as a possible path to powerful AI that’s less safe than LM agents, but say that path looks substantially harder than LM agents, such that we don’t need to boost LM agents more.)
Maybe rather than ‘different paths’, Paul just means that capabilities can come from more powerful LMs or from more sophisticated agent scaffolding. He says:
“at a fixed level of capability, I think the more we are relying on LM agents (rather than larger LMs) the safer we are.”
I buy something like this, at least. But (I weakly intuit) by default we’ll almost exclusively be relying on LM agents rather than mere next-token predictors anyway; there’s no need to boost LM agents. And even if relying on LM agents is good, that doesn’t mean that marginal improvements in LM agents’ sophistication/complexity are safer than marginal improvements in underlying-LM capability. (I don’t have a take on this; just flagging it as a crux.)
Paul replies:

“My guess is that if you hold capability fixed and make a marginal move in the direction of (better LM agents) + (smaller LMs) then you will make the world safer. It straightforwardly decreases the risk of deceptive alignment, makes oversight easier, and decreases the potential advantages of optimizing on outcomes.”
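To make the ‘agent scaffolding’ side of that tradeoff concrete: below is a minimal sketch of an LM-agent loop in Python. Everything here is a hypothetical illustration (the `call_lm` stand-in, the stub tools), not anyone’s actual system.

```python
# A toy "LM agent" loop, to make the scaffolding-vs-model-size tradeoff
# concrete. All names are hypothetical: `call_lm` stands in for a call to
# some fixed underlying model, and the tools are stubs, not a real API.

def call_lm(prompt: str) -> str:
    # Stand-in for the fixed underlying LM. Returns a canned action so the
    # sketch runs end to end; a real agent would query a model here.
    return "FINAL: (model output would go here)"

TOOLS = {
    # Stub tool; the point is that tool choice and execution live in
    # ordinary, inspectable code outside the model.
    "search": lambda query: f"<results for {query!r}>",
}

def run_agent(task: str, max_steps: int = 10) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # The scaffolding, not the model, parses each output, runs tools,
        # and decides when to halt; the intermediate state is legible text.
        action = call_lm(transcript + "Next step ('tool: args' or 'FINAL: answer')?")
        if action.startswith("FINAL:"):
            return action.removeprefix("FINAL:").strip()
        tool, _, args = action.partition(":")
        handler = TOOLS.get(tool.strip(), lambda a: "unknown tool")
        transcript += f"{action}\nObservation: {handler(args.strip())}\n"
    return "no answer within step budget"

print(run_agent("example task"))
```

The safety-relevant feature is that the agent’s intermediate state is human-readable text and its control flow is ordinary code; shifting capability into scaffolding like this keeps more of the system inspectable than shifting the same capability into a larger, more opaque model.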