This is a really neat idea. It limits the AI’s optimizing power to no more than that of a very lucky and intelligent human. But that might be enough to bootstrap more useful work: ask the human to solve some hard technical problem, or even to “come up with a better approach to FAI”.
This seems like a really elaborate way of limiting the AI’s optimization power. Why not just limit its computational resources or level of self-improvement? Surely there must be ways to physically restrict an AI’s intelligence to below superintelligent levels. I know it’s a tricky thing to get right, but so are these ideas. Has this approach been considered at all?
I don’t think all the whole brain emulation stuff is necessary. Just ask the AI to produce outputs that mimic humans. We have plenty of human writing to train it on. Its goal is then to maximize the probability that its output came from a human rather than an AI, conditioned on it having solved the problem. I think that is about equivalent to your idea.
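To make that concrete, here is a minimal sketch of that objective. Everything in it is a made-up stand-in, not any real system: `discriminator(text)` is assumed to return the estimated probability that the text was written by a human, and `solves(text, problem)` to check whether it solves the problem.

```python
# Hypothetical "mimic a human" objective: condition on solving the
# problem, then maximize the probability of looking human-written.

def most_human_like_solution(candidates, problem, discriminator, solves):
    # Condition on the problem actually being solved...
    solutions = [c for c in candidates if solves(c, problem)]
    if not solutions:
        return None  # no candidate solves the problem
    # ...then pick the output hardest to distinguish from a human's.
    return max(solutions, key=discriminator)
```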
Why not just limit its computational resources or level of self-improvement?
Because it’s very hard to define “level of self-improvement”, and it’s not clear how “limited computational resources” relates to “limited abilities”.
Then perhaps we should research ways to measure and restrict intelligence/optimization power.
Just off the top of my head, one way would be to add another term to its utility function representing the amount of computing power used (or time spent). It would then have an incentive to use as little computing power as possible to meet its goal.
For example: you ask the AI to solve a problem for you. Its utility function is to maximize the probability that its answer will be accepted by you as a solution. But once that probability goes above 90%, the utility stops increasing, and a penalty is added for using more computing power.
So the AI tries to solve the problem, but uses the minimal amount of optimization necessary and doesn’t over-optimize.
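As a sketch (the names and the penalty weight are invented for illustration, not from any actual proposal):

```python
# Hypothetical capped utility with a compute penalty.
CAP = 0.90             # probability past which utility stops growing
PENALTY_WEIGHT = 0.01  # arbitrary price per unit of compute

def utility(p_accepted, compute_used):
    # min() caps the reward at 90%; beyond that point, every extra
    # unit of compute is a pure loss, so a maximizer should stop
    # as soon as it clears the threshold.
    return min(p_accepted, CAP) - PENALTY_WEIGHT * compute_used
```

Past the cap, extra optimization only costs, which is the intended pressure against over-optimizing.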
Those approaches run into the “subagent problem”: the AI can satisfy the restriction by creating a subagent to solve the problem for it, without the subagent inheriting those restrictions.
I’m assuming the AI exists in a contained box. We can accurately measure the time it is running and/or the resources it uses within the box, so it can’t create any subagents that don’t also use up its resources and count toward the penalty.
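For what it’s worth, ordinary operating systems already expose this kind of accounting. A toy meter in Python (Unix-only, and assuming the whole AI runs as a single metered process tree inside the box):

```python
import resource  # Unix-only standard-library module

def cpu_seconds_used():
    # RUSAGE_SELF covers this process and its threads;
    # RUSAGE_CHILDREN covers subprocesses that have exited and been
    # waited on, so spawned workers still count toward the total.
    own = resource.getrusage(resource.RUSAGE_SELF)
    kids = resource.getrusage(resource.RUSAGE_CHILDREN)
    return own.ru_utime + own.ru_stime + kids.ru_utime + kids.ru_stime
```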
If the AI can escape from the box, we’ve already failed. There is little point in trying to control what it can do with its output channel.
Reduced impact can control an AI that has the ability to get out of its box. That’s what I like about it.