When I try to understand the position you’re speaking from, I suppose you’re imagining a world where an agent’s true preferences are always and only represented by their current introspectively accessible probability+utility,[1] whereas I’m imagining a world where “value uncertainty” is really meaningful (there can be a difference between the probability+utility we can articulate and our true probability+utility).
If 50% rainbows and 50% puppies is indeed the best representation of our preferences, then I agree: maximize rainbows.
If 50% rainbows and 50% puppies is instead a representation of our credences about our unknown true values, my argument is as follows: the best thing for us would be to maximize our true values (whichever of the two this is). If we assume value learning works well, then Geometric UDT is a good approximation of that best option.
[1] Here “introspectively accessible” really means: what we can understand well enough to directly build into a machine.
Sure, but if we put a third “if” on top (namely, “it’s a representation of our credences, but also both hypotheses are nosy neighbors that care about either world equally”), doesn’t that undo the second “if” and bring us back to the first?
Perhaps I’m still not understanding you, but here is my current interpretation of what you are saying:
The (expected utility) argument that it is valuable for us to get the ASI to entangle its values with ours relies on the assumption of non-nosy-ness.
That is: we are uncertain which values are ours, but whichever thing we value, we’re just as happy to impose it on versions of ourselves that do not value it; so we don’t see any increase in expected value from Geometric UDT.
I see this line of reasoning as insisting on taking max-expected-utility according to your explicit model of your values (including your value uncertainty), even when you have an option which you can prove is higher expected utility according to your true values (whatever they are).
My argument has a somewhat frequentist flavor: I’m postulating true values (similar to postulating a true population frequency), and then looking for guarantees with respect to them (somewhat similar to looking for an unbiased estimator). Perhaps that is why you’re finding it so counter-intuitive?
The crux of the issue seems to be whether we should always maximize our explicit estimate of expected utility, versus taking actions which we know are better with respect to our true values despite not knowing which values those are. One way to justify the latter would be via Knightian value uncertainty (i.e. infrabayesian value uncertainty), although that hasn’t been the argument I’ve been trying to make. I’m wondering if a more thoroughly geometric-rationality perspective would provide another sort of justification.
But the argument I’m trying to make here is closer to just: but you know Geometric UDT is better according to your true values, whatever they are!
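To make the “whatever they are” claim concrete, here is a toy sketch. Everything in it is an illustrative assumption rather than anything taken from the Geometric UDT formalism: the yields (1.1 rainbows or 1.0 puppies per unit of resources), the two policy names, and especially the scoring rule, which counts only the world where a given set of values is the true one (that is, it builds in non-nosy-ness). `learn_then_maximize` stands in for the idealized outcome that Geometric UDT is supposed to approximate when value learning works well.

```python
# Toy numbers (assumed): one unit of resources makes 1.1 rainbows or 1.0 puppies.
YIELD = {"rainbows": 1.1, "puppies": 1.0}
# Credence over which values are truly ours.
CREDENCE = {"rainbows": 0.5, "puppies": 0.5}


def always_rainbows(true_value):
    """Hypothetical policy: produce rainbows regardless of the true values."""
    return "rainbows"


def learn_then_maximize(true_value):
    """Hypothetical idealized value learning: produce what the true values favor."""
    return true_value


def real_world_payoff(policy, true_value):
    """Payoff by the true values, counting only the world where they are true.

    This scoring rule is the non-nosy assumption, stated as code.
    """
    produced = policy(true_value)
    return YIELD[produced] if produced == true_value else 0.0


def payoff_table(policy):
    """Payoff under each candidate for the true values."""
    return {v: real_world_payoff(policy, v) for v in CREDENCE}


def expected_payoff(policy):
    """Credence-weighted average of the per-true-value payoffs."""
    return sum(CREDENCE[v] * real_world_payoff(policy, v) for v in CREDENCE)


def weakly_dominates(policy_a, policy_b):
    """True if policy_a does at least as well as policy_b whatever the true values are."""
    return all(
        real_world_payoff(policy_a, v) >= real_world_payoff(policy_b, v)
        for v in CREDENCE
    )


rainbow_table = payoff_table(always_rainbows)
learning_table = payoff_table(learn_then_maximize)

print("always_rainbows:    ", rainbow_table)
print("learn_then_maximize:", learning_table)
print("dominates:", weakly_dominates(learn_then_maximize, always_rainbows))
```

Under these assumptions the two policies tie if we turn out to love rainbows, and the learning policy wins if we turn out to love puppies. So the comparison never requires knowing which values are ours, though it does lean on the non-nosy scoring rule.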
== earlier draft reply for more context on my thinking ==
Perhaps I’m just not understanding your argument here, and you need to spell it out in more detail? My current interpretation is that you are interpreting “care about both worlds equally” as “care about rainbows and puppies equally” rather than “if I care about rainbows, then I equally want more rainbows in the (real) rainbow-world and the (counterfactual) puppy-world; if I care about puppies, then I equally want more puppies in the (real) puppy-world and the (counterfactual) rainbow-world.”
A value hypothesis is a nosy neighbor if[1] it wants the same things for you whether it is your true values or not. So what’s being asserted here (your “third if” as I’m understanding it) is that we are confident we’ve got that kind of relationship with ourselves: we don’t want “our values to be satisfied, whatever they are”; rather, whatever our values are, we want them to be satisfied across universes, even in counterfactual universes where we have different values.
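A small sketch of how the nosy/non-nosy distinction changes what explicit expected-utility maximization recommends. This is a toy model under my own assumptions, not a definition from the thread: worlds are named by which values are true there, a policy picks one good to produce per world, the yields are invented, and a nosy hypothesis is modeled as averaging its favored good equally over both worlds.

```python
from itertools import product

YIELD = {"rainbows": 1.1, "puppies": 1.0}     # assumed toy production rates
CREDENCE = {"rainbows": 0.5, "puppies": 0.5}  # credence over the true values
VALUES = list(CREDENCE)

# A policy maps each world (named by its true values) to the good produced there.
POLICIES = [
    dict(zip(VALUES, actions)) for actions in product(VALUES, repeat=len(VALUES))
]


def produced(hypothesis, policy, world):
    """Amount of the hypothesis's favored good that the policy makes in a world."""
    return YIELD[hypothesis] if policy[world] == hypothesis else 0.0


def utility(hypothesis, policy, nosy):
    if nosy:
        # Nosy neighbor: cares equally about every world, true values or not.
        return sum(produced(hypothesis, policy, w) for w in VALUES) / len(VALUES)
    # Non-nosy: cares only about the world where it is the true values.
    return produced(hypothesis, policy, hypothesis)


def explicit_expected_utility(policy, nosy):
    """Credence-weighted utility according to the explicit model of our values."""
    return sum(CREDENCE[h] * utility(h, policy, nosy) for h in VALUES)


def best_policy(nosy):
    return max(POLICIES, key=lambda p: explicit_expected_utility(p, nosy))


print("nosy:    ", best_policy(nosy=True))
print("non-nosy:", best_policy(nosy=False))
```

With these numbers, the nosy model favors producing rainbows in both worlds, while the non-nosy model favors giving each world the good its own true values care about.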
Maximizing rainbows maximizes the expected value given our value uncertainty, but it is a catastrophe in the case that we are indeed puppy-loving. Moreover, it is an avoidable catastrophe; …
… and now I think I see your point?
The idea that it is valuable for us to get the ASI to entangle its values with ours relies on an assumption of non-nosy-ness.
There is a different way to justify this assumption,
[1] (but not “only if”; there are other ways to be a nosy neighbor)