Ah, I see: you’re using a collection of narrower systems plus oversight to try to provide safety. Well, there are some proposals like this that provide for an AGI that’s not agentic and may have better safety properties. Eric Drexler’s CAIS (Comprehensive AI Services) comes to mind.
But if a proposal is going to be implemented by a major AI lab, then it needs to be competitive too. I don’t think it’s clear that systems like this are competitive with agentic systems. So in the kinds of advanced AI we are likely to see implemented in the real world, instrumental convergence is still very much a concern.
Why isn’t it competitive? A is being trained the same way as an agentic system, so it will be competitive.
Adding B is a 2x runtime/training-cost overhead, so there is a “constant factor” cost; is that enough to say something is “not competitive”? In practice I’d expect you could strike a good safety/overhead balance for much less.
Hmm, well, if A is being trained the same way, using deep learning, toward being an agentic system, then it is subject to mesa-optimization and to having goals, isn’t it? And if it is subject to mesa-optimization, do you have a way to address inner misalignment failures like deceptive alignment? Oversight alone can be thwarted by a deceptively aligned mesa-optimizer.
You might be able to address this if you give the overseer good enough transparency tools. But such tools don’t exist yet.