However, in this post you suggest that ambitious vs narrow value learning is about the amount of feedback the algorithm requires.
That wasn’t exactly my point. My main point was that if we want an AI system that acts autonomously over a long period of time (think centuries), but it isn’t doing ambitious value learning (only narrow value learning), then we necessarily require a feedback mechanism that keeps the AI system “on track” (since my instrumental values will change over that period of time). Will add a summary sentence to the post.
I think it depends on the details of the implementation.
Agreed, I was imagining the “default” implementation (e.g., as in this paper).
For redundancy, if the narrow value learning system is trying to learn how much humans approve of various actions, we can tell the system that the negative score from our disapproval of tampering with the value learning system outweighs any positive score it could achieve through tampering.
Something along these lines seems promising, I hadn’t thought of this possibility before.
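To make the suggestion concrete, here is a toy sketch (hypothetical, not from the post) of an approval-based reward in which the disapproval penalty for tampering with the value learning system is set to outweigh any positive score the agent could accumulate through tampering. The action names and bound are illustrative assumptions.

```python
# Hypothetical approval scores for actions, in [-1, 1].
APPROVAL = {
    "help_human": 1.0,
    "make_paperclips": 0.8,
}

# Assumed upper bound on total approval reward achievable in an episode.
MAX_EPISODE_REWARD = 100.0

def reward(action: str) -> float:
    """Approval-based reward with an overriding tampering penalty."""
    if action == "tamper_with_value_learner":
        # Disapproval of tampering is made to dominate any positive
        # score the agent could otherwise accumulate.
        return -2 * MAX_EPISODE_REWARD
    return APPROVAL[action]

print(reward("help_human"))                 # 1.0
print(reward("tamper_with_value_learner"))  # -200.0
```

With this structure, even a perfectly executed tampering plan followed by maximal approval on every other step still nets negative total reward, so tampering is never optimal under the sketched assumptions.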
If the reward function weights rewards according to the narrow value learning system’s certainty that they are the correct reward, that creates an incentive to keep the narrow value learning system operating, so that it can acquire greater certainty and thereby provide greater reward.
Yeah, uncertainty can definitely help get around this problem. (See also the next post, which should hopefully go up soon.)
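A minimal sketch of the incentive being described, with assumed numbers: if reward is scaled by the learner’s certainty, then shutting the learner down freezes certainty at its current level, while leaving it running lets certainty (and hence expected reward) grow.

```python
def weighted_reward(raw_reward: float, certainty: float) -> float:
    """Reward scaled by the value learner's certainty (in [0, 1])."""
    return certainty * raw_reward

# Assumed certainty trajectories over three timesteps.
certainty_if_running = [0.5, 0.7, 0.9]   # learner keeps operating, certainty grows
certainty_if_shut_down = [0.5, 0.5, 0.5] # certainty frozen after shutdown

raw = 10.0
total_running = sum(weighted_reward(raw, c) for c in certainty_if_running)
total_shut_down = sum(weighted_reward(raw, c) for c in certainty_if_shut_down)

print(total_running)              # 21.0
print(total_shut_down)            # 15.0
print(total_running > total_shut_down)  # True
```

Under these toy assumptions the agent strictly prefers keeping the value learner alive; of course, a real implementation would have to ensure the certainty estimate itself can’t be gamed.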
Thanks for the reply! Looking forward to the next post!