The relative value of the bucket contents compared to the goal is captured by the weight on the impact penalty relative to the reward.
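As a minimal sketch of that trade-off (names and numbers here are hypothetical, not from any particular impact-measure implementation), the agent's objective can be written as reward minus a weighted penalty, where the weight encodes how much the side effects are valued relative to the goal:

```python
def combined_objective(reward: float, impact_penalty: float, weight: float) -> float:
    """Task reward minus the weighted impact penalty.

    `weight` plays the role described above: it sets the relative value
    of avoided side effects (the "bucket contents") against the goal.
    """
    return reward - weight * impact_penalty

# With a small weight, the goal dominates the objective...
low = combined_objective(reward=10.0, impact_penalty=4.0, weight=0.5)   # 10 - 2 = 8.0
# ...with a large weight, avoiding the impact dominates.
high = combined_objective(reward=10.0, impact_penalty=4.0, weight=5.0)  # 10 - 20 = -10.0
```

Under this framing, choosing the weight already requires a value judgment about how bad the side effect is relative to the goal, which is the point at issue.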
Yep, I agree :-)
I generally think that impact measures don’t have to be value-agnostic, as long as they require less input about human preferences than the general value learning problem.
Then we are in full agreement :-) I argue that low impact, corrigibility, and similar approaches require some, but not all, of human preferences: "some" because of arguments like this one; "not all" because humans with very different values can agree on what constitutes low impact, so only part of their values is needed.