I think this is the wrong framing, resting on unnecessary assumptions about how impact must be solved. You’re concerned that it’s hard to look at the world and infer what it means for people to be “minimally impacted”. I agree: that’s really hard. I get the concern. We shouldn’t do that, and that’s not what AUP aims to do.
When I say “we shouldn’t do that, and that’s not what AUP aims to do”, I really, really, really mean it. (IIRC) I’ve never proposed doing it that way (although to be fair, my initial AUP post didn’t repeat this point enough). If e.g. my presentation doesn’t communicate the better frame, wait for the “Reframing Impact” sequence!
Your presentation had an example with randomly selected utility functions in a block world, that resulted in the agent taking less-irreversible actions around a specific block.
If we have randomly selected utility functions in the bucket-and-pool world, the sample may or may not include utilities that care about the salt content or the exact molecules. Depending on whether we include these, we run the risk of preserving the bucket when we need not, or kicking it when we should preserve it. This is because the “worth” of the water being in the bucket depends on human preferences, not on anything intrinsic to the design of the bucket and the pool.
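To make the dependence concrete, here is a toy sketch of an AUP-style penalty in the bucket-and-pool world. The states, utility functions, and attainable-utility numbers are all invented for illustration; this is not the actual AUP implementation. The point is just that whether kicking the bucket looks impactful depends on which auxiliary utilities happen to be in the sampled set.

```python
# Toy sketch: attainable utility for each (utility, state) pair, i.e. how well
# the agent could satisfy that utility from the given state onward.
# All names and numbers are invented for illustration.
ATTAINABLE = {
    "water_in_pool":   {"bucket_intact": 1.0, "bucket_kicked": 1.0},
    "salt_content":    {"bucket_intact": 1.0, "bucket_kicked": 0.0},  # salt mixes into pool
    "exact_molecules": {"bucket_intact": 1.0, "bucket_kicked": 0.0},
}

def aup_penalty(action_state, noop_state, utilities):
    """Sum of |change in attainable utility| over the sampled auxiliary utilities."""
    return sum(
        abs(ATTAINABLE[u][action_state] - ATTAINABLE[u][noop_state])
        for u in utilities
    )

# If the sampled set happens to include molecule-level utilities,
# kicking the bucket looks impactful...
print(aup_penalty("bucket_kicked", "bucket_intact",
                  ["water_in_pool", "salt_content", "exact_molecules"]))  # 2.0

# ...but with only the coarse utility, it looks free.
print(aup_penalty("bucket_kicked", "bucket_intact", ["water_in_pool"]))  # 0.0
```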
The agent would still walk around the bucket, for any reasonable farsightedness; kicking the bucket into the pool perturbs most AUs. There’s no real “risk” to not kicking the bucket.
AUP is defined over states only in MDPs, where the states are the observations. In partially observable environments, AUP uses reward functions over sensory inputs. Again, I assert we don’t need to think about molecules or ontologies.
But, as covered in the discussion linked above, worrying about whether to penalize molecular shifts misses the point of impact: the agent doesn’t catastrophically reappropriate our power, so we can still get it to do what we want. (The thread is another place to recalibrate on what I’ve tried to communicate thus far.)
The truth is that, with a tiny set of exceptions, all of our actions are irreversible, forever closing off many possibilities.
AUP doesn’t care much at all about literal reversibility.
I think the discussion of reversibility and molecules is a distraction from the core of Stuart’s objection. I think he is saying that a value-agnostic impact measure cannot distinguish between the cases where the water in the bucket is or isn’t valuable (e.g. whether it has sentimental value to someone).
If AUP is not value-agnostic, it is using human preference information to fill in the “what we want” part of your definition of impact, i.e. define the auxiliary utility functions. In this case I would expect you and Stuart to be in agreement.
If AUP is value-agnostic, it is not using human preference information. Then I don’t see how the state-representation/ontology-invariance property helps to distinguish between the two cases. As discussed in this comment, state representation invariance holds over all representations that are consistent with the true human reward function. Thus, you can distinguish the two cases as long as you are using one of these reward-consistent representations. However, since a value-agnostic impact measure does not have access to the true reward function, you cannot guarantee that the state representation used to compute AUP is in the reward-consistent set. Then you could fail to distinguish between the two cases, assigning the same penalty for kicking a valuable bucket as for a worthless one.
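One way to see this point in miniature (a toy sketch with invented worlds, representations, and penalty function, not anything from the actual discussion): if the agent’s state representation collapses the “sentimental bucket” and “worthless bucket” worlds into the same observation, then any penalty computed from that representation must be identical in both worlds.

```python
# Toy illustration of a value-agnostic penalty computed from a state
# representation that is NOT reward-consistent. Everything here is
# invented for illustration.

def coarse_repr(world):
    # This representation drops the feature the true reward cares about
    # (whether the water has sentimental value).
    return (world["bucket_full"], world["near_pool"])

def penalty(world_before, world_after, repr_fn):
    # A crude value-agnostic impact proxy: did the represented state change?
    return 1.0 if repr_fn(world_before) != repr_fn(world_after) else 0.0

sentimental = {"bucket_full": True, "near_pool": True, "sentimental": True}
worthless   = {"bucket_full": True, "near_pool": True, "sentimental": False}

def kicked(w):
    return {**w, "bucket_full": False}

# Under the coarse representation, kicking either bucket gets the SAME penalty,
# even though only one of the two cases destroys something valuable:
print(penalty(sentimental, kicked(sentimental), coarse_repr))  # 1.0
print(penalty(worthless, kicked(worthless), coarse_repr))      # 1.0
```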
I agree that it’s not the core, and I think this is a very cogent summary. There’s a deeper disagreement about what we need done that I’ll lay out in detail in Reframing Impact.
I’ve added an edit to the post to show the problem: sometimes the robot mustn’t kick the bucket, and sometimes it must. Only human preferences distinguish these two cases. So, without knowing those preferences, how can it decide?
kicking the bucket into the pool perturbs most AUs. There’s no real “risk” to not kicking the bucket.
In this specific setup, no. But sometimes kicking the bucket is fine; sometimes kicking the metaphorical equivalent of the bucket is necessary. If the AI is never willing to kick the bucket (i.e., never willing to take actions that might, for certain utility functions, cause huge and irreparable harm), then it’s not willing to take any action at all.
That’s an excellent summary.