When I try to understand the position you’re speaking from, I suppose you’re imagining a world where an agent’s true preferences are always and only represented by their current introspectively accessible probability+utility,[1] whereas I’m imagining a world where “value uncertainty” is really meaningful (there can be a difference between the probability+utility we can articulate and our true probability+utility).
If 50% rainbows and 50% puppies is indeed the best representation of our preferences, then I agree: maximize rainbows.
If 50% rainbows and 50% puppies is instead a representation of our credences about our unknown true values, my argument is as follows: the best thing for us would be to maximize our true values (whichever of the two this is). If we assume value learning works well, then Geometric UDT is a good approximation of that best option.
[1] Here “introspectively accessible” really means: what we can understand well enough to directly build into a machine.
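To make the contrast between the first two “if”s concrete, here is a toy sketch (my own illustration, not from the post: the specific utility functions and the credence-weighted geometric mean rule are assumptions chosen to show the qualitative difference; the actual Geometric UDT proposal may aggregate differently). With a 50/50 credence, a straight arithmetic mixture goes all-in on whichever good is marginally more efficient, whereas a geometric aggregation over the two value hypotheses hedges between them, which is the behavior you want if one of the hypotheses really is your true values and you'd like to learn which before committing everything.

```python
# Toy sketch: arithmetic vs. geometric aggregation over two value
# hypotheses ("rainbows" and "puppies"), each held with 50% credence.
# Assumed setup (not from the original post): a unit budget is split
# x : (1 - x), and rainbows are slightly cheaper to produce, so
# U_rainbows(x) = 1.1 * x while U_puppies(x) = 1 - x.
import numpy as np

credence = {"rainbows": 0.5, "puppies": 0.5}

def u_rainbows(x):           # utility if our true values are rainbows
    return 1.1 * x           # 1.1 is an assumed efficiency edge

def u_puppies(x):            # utility if our true values are puppies
    return 1.0 * (1 - x)

xs = np.linspace(0.0, 1.0, 1001)   # candidate budget splits

# Arithmetic expected utility is linear in x, so the optimum is a
# corner: spend everything on the marginally more efficient good.
arith = credence["rainbows"] * u_rainbows(xs) + credence["puppies"] * u_puppies(xs)

# Credence-weighted geometric mean of the hypotheses' utilities
# (a stand-in for geometric-style aggregation): it heavily penalizes
# leaving either hypothesis with near-zero utility, so it hedges.
eps = 1e-9  # avoid 0**0.5 issues at the corners
geom = (u_rainbows(xs) + eps) ** credence["rainbows"] * \
       (u_puppies(xs) + eps) ** credence["puppies"]

print("arithmetic optimum: x =", xs[np.argmax(arith)])   # -> 1.0 (all rainbows)
print("geometric optimum:  x =", xs[np.argmax(geom)])    # -> 0.5 (hedge across hypotheses)
```

Under the first “if” (the 50/50 mixture just is our preferences), the arithmetic corner solution is the right answer; under the second “if” (the 50/50 is credence over an unknown true value), the hedging behavior is what leaves room for value learning to do its work.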
Sure, but if we put a third “if” on top (namely, “it’s a representation of our credences, but also both hypotheses are nosy neighbors that care about either world equally”), doesn’t that undo the second “if” and bring us back to the first?