In my experience so far, we need to include our values, in part, to define “reasonable” utility functions.
It seems that an extremely broad set of input attainable functions suffice to capture the “reasonable“ functions with respect to which we want to be low impact. For example,
“remaining on”, “reward linear in how many blue pixels are observed each time step”, etc. All thanks to instrumental convergence and opportunity cost.
It seems that an extremely broad set of input attainable functions suffice to capture the “reasonable“ functions with respect to which we want to be low impact. For example, “remaining on”, “reward linear in how many blue pixels are observed each time step”, etc. All thanks to instrumental convergence and opportunity cost.