I can sketch a solution which I think would work: Instead of a simple agent which maximizes a utility function, you need a more complex agent which maximizes utility subject to a constraint. The constraint, in this case, is that the maximizer is not allowed to fiddle with the relation between ‘reality’ and the input stream. (The AI is forbidden to wear rose-colored glasses.) But the maximizer is permitted to fiddle with the relation between the output stream and ‘reality’. (The AI is encouraged to design and build waldos.)
So how do you get constraints into the general framework of optimization? Well, the theory of Lagrange multipliers is one well-known technique. Another, which I think might work well, is to build the AI as containing multiple simple utility-maximizing subagents that have negotiated a Nash bargain which forces adherence to a fairness constraint.
Many details remain to be worked out, but I think that this is the right general approach. Of course we know that a multi-agent coalition can never be as perfectly ‘rational’ as a simple single agent with a unified utility function. But I don’t think that this lack of ‘rationality’ should frighten us. ‘Rationality’ as defined in economic decision theory is just a word attached to a set of axioms. If those axioms don’t win, pick some different axioms.
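The Lagrange-multiplier idea can be sketched in a few lines. A toy illustration, with a utility, constraint, and all numbers chosen purely for the example: maximize U(x, y) = x·y subject to x + y = 10, by gradient ascent on the primal variables and descent on the multiplier, with a quadratic penalty term (the augmented-Lagrangian trick) added to keep the saddle-point dynamics stable.

```python
def constrained_maximize(lr=0.05, steps=5000, rho=1.0):
    """Maximize U(x, y) = x*y subject to x + y = 10.

    Works on the augmented Lagrangian
        L(x, y, lam) = x*y + lam*g - (rho/2)*g**2,  g = 10 - x - y,
    ascending in (x, y) and descending in lam.
    """
    x, y, lam = 1.0, 2.0, 0.0
    for _ in range(steps):
        g = 10.0 - x - y                 # constraint residual
        dx = y - lam + rho * g           # dL/dx
        dy = x - lam + rho * g           # dL/dy
        # Simultaneous updates: primal ascent, dual descent.
        x, y = x + lr * dx, y + lr * dy
        lam = lam - lr * g
    return x, y, lam
```

The iterates settle at x = y = 5 with multiplier 5. The rho term matters: without it, the gradient dynamics on this bilinear saddle spiral outward instead of converging, which is a standard wrinkle in turning constrained optimization into simple gradient updates.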
So how do you get constraints into the general framework of optimization? Well, the theory of Lagrange multipliers is one well-known technique.
No adjustment to the theory is needed—you can just use a different utility function with U=0 if the constraints are violated.
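A minimal sketch of that suggestion (the function names and the toy constraint are hypothetical): wrap the unconstrained utility so that any constraint-violating action scores zero, leaving the maximizer itself completely unmodified.

```python
def constrained_utility(utility, constraint_ok):
    """Return a new utility: U if the constraint holds, else 0."""
    def wrapped(action):
        return utility(action) if constraint_ok(action) else 0.0
    return wrapped

# Toy example: utility is the action's raw value; the constraint
# forbids actions above 10. A plain argmax over the wrapped utility
# then respects the constraint with no change to the optimizer.
u = constrained_utility(lambda a: a, lambda a: a <= 10)
best = max(range(100), key=u)   # picks 10, not 99
```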
Conventional wisdom, I believe, is that setting up a constraint that has the desired effect is really difficult. If you forbid the agent from putting on spectacles, it just makes another agent that puts them on for it. If spectacles are made painful, a screen is constructed with the desired high-utility display on it. Saying precisely what counts as “fiddling with the input stream” turns out to be a difficult problem.
Constraints just become barriers between the superintelligence and its goal, problems to be worked around—and often it can find a way.
you can just use a different utility function with U=0 if the constraints are violated.
I assume you meant “U = large negative number”.
Conventional wisdom, I believe, is that setting up a constraint that has the desired effect is really difficult.
My intuition is that it becomes less difficult if you assign the responsibility of maintaining the constraint to a different sub-agent than the one who is trying to maximize unconstrained U. And have those two sub-agents interact by bargaining to resolve their non-zero-sum game.
It is just an intuition. I’ll be happy to clarify it, but less happy if someone insists that I rigorously defend it.
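One way to make the bargaining intuition slightly more concrete is a toy Nash bargaining solution (every policy and payoff below is made up for illustration): both subagents score each candidate joint policy, and the coalition adopts the policy maximizing the product of gains over each side's disagreement payoff.

```python
def nash_bargain(options, d_max, d_keep):
    """Pick a joint policy by the Nash bargaining solution.

    options: list of (maximizer_utility, keeper_utility) pairs.
    d_max, d_keep: each subagent's disagreement payoff.
    """
    # Policies worse than a side's disagreement point get vetoed.
    feasible = [(um, uk) for um, uk in options
                if um >= d_max and uk >= d_keep]
    # Nash solution: maximize the product of gains over disagreement.
    return max(feasible, key=lambda o: (o[0] - d_max) * (o[1] - d_keep))

# Candidate joint policies, scored by the unconstrained maximizer and
# by the constraint-keeping subagent:
options = [(9.0, 0.1),   # fiddle with the input stream
           (6.0, 0.8),   # compromise: act on the world, leave inputs alone
           (2.0, 1.0)]   # do almost nothing
choice = nash_bargain(options, d_max=1.0, d_keep=0.2)
```

On these toy numbers, the input-fiddling option, although best for the raw maximizer, falls below the constraint-keeper's disagreement point and is vetoed; the compromise policy (6.0, 0.8) wins the bargain.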
I was thinking about bounded utility, normalized on [0, 1].