This sounds to me like you’re arguing against letting a different set of properties of the decision system dominate ‘far’ decisions as opposed to ‘near’ decisions. But one of the earliest operations we do is loading the AI’s model into the human’s decision system, and there seems to me to be a pretty good case for modeling this as the counterfactual ‘the human anticipates the same things happening that the AI anticipates, conditional on these actions or strategies’, not the counterfactual where you tell a Christian fundamentalist the verbal statement ‘God doesn’t exist’ and model them screaming at the AI about where it’s wrong. In other words, the first answer that occurs to me is along the lines of, “The counterfactuals we are doing mostly eliminate far mode except insofar as it would actually apply to particular, concrete, lived-in scenarios.”
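To make the distinction concrete, here is a minimal sketch of the two counterfactual constructions (all names and structure here are invented for illustration and are not an actual CEV specification): the counterfactual swaps in the AI’s anticipations over concrete outcomes while leaving the human’s valuation of those outcomes alone, rather than asserting a verbal proposition at them.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# Illustrative only: names and types are my own framing, not a CEV spec.

Outcome = str
# action/strategy -> predicted distribution over concrete outcomes
Anticipator = Callable[[str], Dict[Outcome, float]]

@dataclass(frozen=True)
class HumanModel:
    anticipate: Anticipator               # how the human expects an action to play out
    value: Callable[[Outcome], float]     # how much the human values a concrete, lived-in outcome

def expected_value(human: HumanModel, action: str) -> float:
    """Expected value of an action under the human's own anticipations."""
    return sum(p * human.value(o) for o, p in human.anticipate(action).items())

def model_swap_counterfactual(human: HumanModel, ai_anticipate: Anticipator) -> HumanModel:
    """The counterfactual described above: the human anticipates the same
    concrete outcomes the AI anticipates, conditional on the action, while
    keeping their own valuation of those outcomes. No verbal proposition
    is ever asserted at them."""
    return replace(human, anticipate=ai_anticipate)
```

The point of the sketch is that far-mode reactions to a sentence never enter the evaluation; only reactions to the particular scenarios the AI actually predicts do.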
Out of interest, was this always the plan, or is this a new patch to CEV?
I think it’s an issue I hadn’t formulated explicitly at the time. I’m still unsure about where the balance between verbal decisions and urges should lie.
See also http://yudkowsky.tumblr.com/post/96877436365/ (some NSFW because the debate originated on Tumblr, and, well...)
Interesting.
I might take a different direction for the heroin addicts. I’d try to argue that their desire for heroin has some features we can use to strike it off directly. Some relevant features could be whether past versions of the person would want (or would have wanted—UDT?) the desire removed, and whether a person with the desire removed (and very little else changed) would agree in retrospect that the removal was a positive development. More generally, heroin addiction seems a perfect example of a pathological desire: a sort of self-protecting desire that hacks the human mind to produce a craving out of proportion with the positive effects it generates.
I’m not saying the line is sharp between heroin and, say, sex, but it seems better to deal directly with the negative features of heroin than to go too meta and hope we get a system that does the division for us.
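To gesture at what “directly strike it off” might look like, here is a minimal sketch of such a filter (the attribute names and the disproportion threshold are purely illustrative assumptions; a real criterion would need far more than this):

```python
from dataclasses import dataclass

# Purely illustrative framing of the features mentioned above, not a worked-out criterion.

@dataclass(frozen=True)
class Desire:
    name: str
    strength: float                        # grip the desire has on behaviour
    positive_effects: float                # benefit it actually delivers
    past_selves_want_removed: bool         # would earlier versions have wanted it gone?
    removed_self_endorses_removal: bool    # would the counterfactual desire-free person call removal positive?

def is_pathological(d: Desire, disproportion: float = 10.0) -> bool:
    """Flag a desire for direct removal rather than extrapolation: both
    temporal checks agree, and its grip on the mind is far out of
    proportion with the good it generates."""
    out_of_proportion = d.strength > disproportion * max(d.positive_effects, 1e-9)
    return (d.past_selves_want_removed
            and d.removed_self_endorses_removal
            and out_of_proportion)

# Example: a heroin craving with made-up numbers trips all three checks.
heroin = Desire("heroin craving", strength=50.0, positive_effects=0.1,
                past_selves_want_removed=True, removed_self_endorses_removal=True)
assert is_pathological(heroin)
```

A borderline case like sex would fail the disproportion check under any sensible threshold, which is the sense in which the line need not be sharp for the direct approach to do useful work.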