Build the utility function such that excesses above the target level are penalized. If the agent is motivated to build 9 paperclips only and absolutely no more, then the idea of becoming a maximizer becomes distasteful.
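As a concrete illustration of penalizing excesses above the target, here is a minimal Python sketch of such a utility function. The target of 9, the partial credit for shortfalls, and the penalty slope are assumptions made up for this example, not anything specified in the comment above:

```python
def paperclip_utility(n_clips, target=9, penalty_per_excess=2.0):
    """Utility peaks at exactly `target` paperclips: shortfalls earn
    partial credit, and every clip above the target is penalized."""
    if n_clips <= target:
        return float(n_clips)  # 0 clips -> 0.0, ..., 9 clips -> 9.0
    return target - penalty_per_excess * (n_clips - target)  # 10 -> 7.0, 11 -> 5.0, ...

# The utility is maximised at exactly 9; overshooting strictly loses points.
assert max(range(100), key=paperclip_utility) == 9
```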
This amuses me because I know actual human beings who behave as satisficers with extreme aversion to waste, far out of proportion to the objective costs of waste. For example: Friends who would buy a Toyota Corolla based on its excellent value-to-cost ratio, and who would not want a cheaper, less reliable car, but who would also turn down a much nicer car offered to them at a severe discount, on the grounds that the nicer car is “indulgent.”
That is already a maximiser: its utility is maximised by building exactly 9 paperclips. It will take over the universe to build more and more sophisticated ways of checking that there are exactly 9 paperclips, and more ways of preventing itself (however it defines itself) from inadvertently building more. In fact it may take over the universe first, put all the precautions in place, and build exactly 9 paperclips just before heat-death wipes out everything remaining.
Ah, I see. Thanks for correcting me.
So my friends are maximizers in the sense that they seek very specific targets in car-space, and the fact that those targets sit in the middle of a continuum of options is not relevant to the question at hand.
But you run into other problems then, like the certainty issue the OP touched on. The agent will spend significant resources ensuring that exactly 9 paperclips have been made, and wouldn’t settle for a 90% probability of having made 9 paperclips, because a 99.9999% probability of having made 9 paperclips would yield more expected utility for it.
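To put rough numbers on that certainty argument, here is a toy expected-utility calculation. The miss-the-target utility of 0 and the particular probabilities are illustrative assumptions; the point is only that, under a utility peaking at exactly 9 clips, the more-certain plan always scores higher, so an expected-utility maximiser keeps paying for extra verification:

```python
def expected_utility(p_exact, u_exact=9.0, u_miss=0.0):
    """Expected utility of a plan that ends with exactly 9 clips with
    probability p_exact, and otherwise misses the target entirely."""
    return p_exact * u_exact + (1 - p_exact) * u_miss

cheap_plan   = expected_utility(0.90)       # stop verifying early
careful_plan = expected_utility(0.999999)   # burn resources on more checks

# 8.1 vs ~8.999991: the more certain plan wins, so the agent is pushed to
# spend ever more resources driving p_exact toward 1.
assert careful_plan > cheap_plan
```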
Sooo—you would normally give such an agent time and resource-usage limits.
But the entire point of building FAI is to not require it to have resource-usage limits, because it can’t help us if it’s limited. And such resource limits wouldn’t necessarily be useful for “testing” whether or not an AI was friendly, because if it weren’t, it would mimic the behaviour of an FAI so that it could get more resources.
Machines can’t cause as much damage if they have resource-usage limits. This is a prudent safety precaution. It is not true that resource-limited machines can’t help us.
So: the main idea is to attempt damage limitation. If the machine behaves itself, you can carry on with another session. If it does not, it is hopefully back to the drawing board, without too much damage done.
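For what a time and resource-usage limit could look like in practice, here is a hedged sketch of a bounded “session” wrapper. The function names, the step counter, and the wall-clock cap are hypothetical illustrations of the damage-limitation idea, not a real containment mechanism:

```python
import time

class ResourceBudgetExceeded(Exception):
    """Raised when a session hits its step or wall-clock budget."""

def run_session(agent_step, max_steps=1000, max_seconds=60.0):
    """Run one bounded session of an agent.

    `agent_step` is any callable doing one unit of work and returning True
    when it considers the task done. Hard caps on steps and wall-clock time
    mean a misbehaving agent gets cut off rather than running open-ended.
    """
    start = time.monotonic()
    for step in range(max_steps):
        if time.monotonic() - start > max_seconds:
            raise ResourceBudgetExceeded(f"time limit hit at step {step}")
        if agent_step():
            return step + 1  # behaved itself: carry on with another session
    raise ResourceBudgetExceeded(f"step limit of {max_steps} reached")

# Toy usage: an "agent" that declares itself done after making 9 clips.
clips = []
def make_one_clip():
    clips.append("clip")
    return len(clips) >= 9

steps_used = run_session(make_one_clip)  # -> 9
```

If the budget is exhausted, the session ends with an exception rather than with the agent still running, which is the "back to the drawing board, without too much damage done" outcome described above.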