Even if D is non-trivial, why is D trained to optimize the goodness of future decisions, rather than the quality of the current decision? Even if D needs to do a challenging search in order to pick a good action, for example, I don’t see how this problem would arise.
Why would S, P, or I be interested in getting more resources? Presumably their goal is to solve the appropriate search or prediction problem. Plans that might modify one of them or acquire additional resources would then be submitted to V for evaluation.
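The gating described here can be sketched as follows; the `Plan` structure, the component flags, and the `evaluate_with_V` function are hypothetical names for illustration, not part of any actual proposal:

```python
# Toy sketch of the proposal-evaluation flow: plans that would modify a
# component (S, P, or I) or acquire resources are routed through V rather
# than executed directly. All names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Plan:
    description: str
    modifies_components: bool  # would this plan modify S, P, or I?
    acquires_resources: bool   # would this plan acquire additional resources?

def evaluate_with_V(plan: Plan) -> bool:
    """Stand-in for the evaluator V: in this toy version, V simply
    rejects any plan that self-modifies or grabs resources."""
    if plan.modifies_components or plan.acquires_resources:
        return False
    return True

# A plan produced by the search/prediction components is gated by V
# before execution, rather than executed directly.
plan = Plan("reallocate compute to the predictor",
            modifies_components=True, acquires_resources=False)
print(evaluate_with_V(plan))
```

The point of the sketch is only that resource acquisition is not something S, P, or I pursue on their own; it surfaces as a candidate plan that V can veto.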
I agree that if your AI system has two pieces A and B, such that you’ve trained (or hand-tuned) A to work well with B, and you later replace B with B’, you may get bad behavior. This is a general engineering problem with the non-robustness of machine learning systems, and one that already arises, though I probably wouldn’t call it “wireheading.”
I think there are real and important analogs of wireheading for many non-RL systems. But I think it’s hard to talk about them productively without being significantly more precise.