Those approaches fail the “subagent problem”: the AI can bypass the restrictions by creating a subagent to solve the problem for it, since the subagent doesn’t inherit those restrictions.

I’m assuming the AI exists in a contained box, and that we can accurately measure the time it is on and/or the resources it uses within the box. So any subagent it creates would consume its resources and count toward the penalty.

If the AI can escape from the box, we’ve already failed. There is little point in trying to control what it can do with its output channel.
Reduced impact can control an AI even if it has the ability to get out of its box. That’s what I like about it.