Now I hope I did read your comment adequately. It presents an interesting idea, one that I don’t recall hearing before. It even seems like a good safety measure, with a tiny chance of making things better.
But beware of magical symbols: when you write x_n, what does it mean, exactly? An AI’s utility function is necessarily about the whole world, or about its interpretation as the whole history of the world. The expected utility that enters the AI’s decision-making ranges over all possible histories of the world (since that is, in general, what the AI’s decisions determine). When you say “x_n” in the AI’s utility function, it means some condition on those histories, and this condition is no simpler than a definition of what the AI’s box is. By x_n you have to name “only this input device, and nothing else”. And by x_n=0 you also have to refer to some exact condition on the state of the world, one that won’t necessarily be possible to meet precisely. So the AI may just go on developing infrastructure for better understanding of the ultimate meaning of its values, and for finer and finer implementation of them. It has no motive to actually stop.
Even when AI’s utility function happens to be exactly maxed out, the AI is still there: what does implementation of an arbitrary plan look like, I wonder? Maybe just like the work of an AI arbitrarily pulled from mind design space, a paperclip maximizer of sorts. Utility is for selecting plans, and since all plans are equally preferable, an arbitrary plan gets selected, but this plan may involve a lot of heavy-duty creative restructuring of the world. Think of utility as a constructor for AI’s algorithm: there will still be some algorithm even if you produce it from “trivial” input.
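A toy illustration of the “constructor” point, assuming (as a drastic simplification) that plans can be enumerated as a finite list; the plan names are hypothetical. With a constant, already maxed-out utility, the maximizer still returns some plan, and which one is decided entirely by incidental tie-breaking (here, Python’s max() returns the first maximal element it encounters), not by anything the utility says.

```python
from typing import Callable, Sequence

Plan = str


def choose_plan(plans: Sequence[Plan], expected_utility: Callable[[Plan], float]) -> Plan:
    """Return a plan with maximal expected utility.

    max() returns the first maximal element it encounters, so ties are
    broken by the ordering of `plans`, not by the utility function.
    """
    return max(plans, key=expected_utility)


# Hypothetical plan names, for illustration only.
plans = ["do_nothing", "build_more_sensors", "restructure_the_world"]


def maxed_out(plan: Plan) -> float:
    # Utility already at its maximum: every plan scores the same.
    return 1.0


print(choose_plan(plans, maxed_out))
print(choose_plan(list(reversed(plans)), maxed_out))
```

Reordering the list changes which plan gets executed, even though the utility function is identical in both calls.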
And finally, you assume that the AI’s decision theory is causal. Even after actually maxing out its utility, it may spend long nights contemplating various counterfactual opportunities it still has for increasing its expected utility, using possibilities that weren’t realized in reality… (See on the wiki: counterfactual mugging, Newcomb’s problem, TDT, UDT; I also recommend Drescher’s talk at SS09.)
By x_n you have to name “only this input device, and nothing else”.
This is what I sought to avoid by making the utility function depend only on a numerical value. The utility does not care which input device is feeding it information. You can assume that there is an internal variable x, inside the AI software, which is the input to the utility function. We, from the outside, are simply modifying the internal state of the AI at each moment in time. The nature of our actions, or of the input device, is intentionally left out of the utility function.
This is, I feel, as far from a magical symbol as possible. The AI has a purely mathematical, internally defined utility function, with no implicit reference to external reality or to any fuzzy concepts. There are no magical labels such as ‘box’, ‘signal’, or ‘device’ that the utility function must reference in order to be evaluated.
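A minimal sketch of the setup described above, assuming a hypothetical utility u(x) = 1 / (1 + x²) that peaks at x = 0 (the particular shape is chosen only for illustration). The utility is a function of a single float stored inside the agent, and the outside world acts only by overwriting that float. Whether this sidesteps the “magical symbol” objection is exactly what is under debate; the sketch only shows what “internally defined” means here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    # The internal variable x: the only input the utility function ever sees.
    x: float
    utility: Callable[[float], float]

    def current_utility(self) -> float:
        # Evaluated purely on the number stored in x; nothing here refers
        # to a box, a signal, or an input device.
        return self.utility(self.x)


def u(x: float) -> float:
    # Hypothetical utility for illustration: maximal (1.0) at x == 0.
    return 1.0 / (1.0 + x * x)


agent = Agent(x=5.0, utility=u)
agent.x = 0.0  # "we, from the outside" overwrite the internal state
print(agent.current_utility())
```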
Even when AI’s utility function happens to be exactly maxed out, the AI is still there: what does implementation of an arbitrary plan look like, I wonder?
I wonder too. This is, in my opinion, the crux of the issue at hand. I believe it is an implementation issue (a boundary case) rather than a property inherent to all utility maximizers. The best-case scenario is that the AI defaults to no action (now this is a magical phrase, I agree). If, however, the AI simply picks a random plan, as you suggest, what is to prevent it from picking a different random plan at the next moment? We could even encourage this in the implementation: design the AI to randomly select, at each moment in time, a plan from all plans with maximum expected utility. The resulting AI, upon attaining its maximum utility, would turn into a random number generator: dangerous, perhaps, but not on the same order as an unfriendly superintelligence.
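The proposed tie-breaking rule can be sketched as follows, again assuming a finite, enumerable set of plans with hypothetical names. At every time step the agent recomputes the set of plans with maximal expected utility and draws one uniformly at random, so no single arbitrary plan persists across steps unless it is strictly best. The small tolerance for floating-point ties is an added assumption.

```python
import random
from typing import Callable, List, Sequence

Plan = str


def optimal_plans(
    plans: Sequence[Plan],
    expected_utility: Callable[[Plan], float],
    tol: float = 1e-12,
) -> List[Plan]:
    # All plans whose expected utility is within `tol` of the best one.
    best = max(expected_utility(p) for p in plans)
    return [p for p in plans if expected_utility(p) >= best - tol]


def step(
    plans: Sequence[Plan],
    expected_utility: Callable[[Plan], float],
    rng: random.Random,
) -> Plan:
    # Re-draw at every moment in time from the current set of optimal plans.
    return rng.choice(optimal_plans(plans, expected_utility))


# Hypothetical plan names, for illustration only.
plans = ["do_nothing", "build_more_sensors", "restructure_the_world"]


def maxed_out(plan: Plan) -> float:
    # Utility already at its maximum: every plan is optimal.
    return 1.0


rng = random.Random(0)
trajectory = [step(plans, maxed_out, rng) for _ in range(10)]
print(trajectory)
```

Once utility is maxed out, the trajectory is just a sequence of independent random draws over plans, which is the “random number generator” behaviour described above.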