Maybe rather than ‘different paths’ Paul just means that capabilities can come from more-powerful-LMs or more-sophisticated-agent-scaffolding. He says:
> at a fixed level of capability, I think the more we are relying on LM agents (rather than larger LMs) the safer we are.
I buy something like this, at least. But (I weakly intuit) we'll almost exclusively be relying on LM agents rather than mere next-token-predictors by default anyway; there's no need to boost LM agents. And even if relying on LM agents is good, it doesn't follow that marginal improvements in LM agents' sophistication/complexity are safer than marginal improvements in underlying-LM capability. (I don't have a take on this—just flagging it as a crux.)
My guess is that if you hold capability fixed and make a marginal move in the direction of (better LM agents) + (smaller LMs) then you will make the world safer. It straightforwardly decreases the risk of deceptive alignment, makes oversight easier, and decreases the potential advantages of optimizing on outcomes.