I think your anticipated counterarguments are handwavy.
To me, you’re making a mistake similar to the one Jürgen Schmidhuber made when he said that surprising compressibility was all you needed to optimize to get human values. You’re imagining that the AI will help us get social status because it’s empowering, much like Schmidhuber imagines the AI will create beautiful music because music is compressible. It’s true that music is compressible, but it’s not optimally compressible—it’s not what an AI would do if it were actually optimizing for surprising compressibility.
Social status, or helping us be smarter, or other nice stuff would indeed be empowering. But they’re not generated by thinking “what’s optimal for ensuring my control over my sensory stream?” They’re generated by taking the nice things that we want for other reasons and then noticing that they’re more empowering than their opposites.
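(For concreteness: “control over my sensory stream” is what the empowerment literature formalizes as the channel capacity from an agent’s actions to its future sensor readings. The standard n-step definition, stated here only to pin down what is actually being maximized, is

$$\mathfrak{E}(s_t) \;=\; \max_{p(a_t,\dots,a_{t+n-1})} I\big(A_t,\dots,A_{t+n-1};\, S_{t+n} \,\big|\, s_t\big),$$

the largest mutual information the agent can induce between its next n actions and the sensor state they lead to. Music, status, and every other particular human good are absent from that objective.)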
The most serious counter-point you raise is the one about identity. Wouldn’t an AI maximizing my empowerment want to preserve what makes me “me”? This isn’t exactly the definition used in the gridworlds, which defines agents in terms of their input/output interfaces, but it’s a totally reasonable implementation.
The issue is that this identity requirement is treated as a constraint by an empowerment-maximizing search process. The empowerment maximizer is still trying to erase all practical differences between me and a paperclip maximizer, which is a goal I don’t like. But it doesn’t care that I don’t like it; it just wants the future object that has maximal control over its own sensory inputs to still register as “me” by whatever standard it uses.
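Schematically (my notation; Id stands in for whatever identity check the system uses), the worry is that the search has the shape

$$f^{*} \in \arg\max_{f}\ \mathfrak{E}(f) \quad \text{subject to} \quad \mathrm{Id}(f) = \text{me},$$

so identity only carves out the feasible set; everything about me that isn’t needed to pass the Id check is up for the optimizer to rewrite.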
The empowerment maximizer is still trying to erase all practical differences between me and a paperclip maximizer, which is a goal I don’t like.
This threw me off initially because of the use of ‘paperclip maximizer’ as a specific value system. But I do partially agree with the steelmanned version of this, which is “erase all practical differences between you and the maximally longtermist version of you”.
Some component of our values/utility is short term, non-empowerment hedonic, which conflicts with long term optionality, and an empowerment AI would only be aligned with the long term component; thus, absent identity preservation mechanisms, this AI would want us to constantly sacrifice for the long term.
But once again, many things that appear hedonic—such as fun—are actually components of empowerment-related intrinsic motivation, so if the empowerment AGI was going to change us (say, after uploading), it would keep fun or give us some improved version of it.
But I actually already agreed with this earlier:
So in essence you are arguing that you may have a discount rate high enough to cause significant conflict between long term and short term utility, and empowerment always favors the long term. I largely agree with this, but we can combine long term empowerment with learned human values to cover any short term divergences.
Also, it’s worth noting that everything here assumes superhuman AGI. When that is realized, it changes everything in the sense that the better versions of ourselves—if we had far more knowledge, time to think, etc.—probably would be much more longtermist.
The empowerment maximizer is still trying to erase all practical differences between me and a paperclip maximizer, which is a goal I don’t like.
You keep asserting this obviously incorrect claim without justification. An AGI optimizing purely for your long term empowerment doesn’t care about your values—it has no incentive to change your long term utility function[1], even before any considerations of identity preservation, which are also necessary for optimizing for your empowerment to be meaningful.
I do not believe you are clearly modelling what optimizing for your long term empowerment is like. It is nearly exactly equivalent to optimizing for your ability to achieve your true long term goals/values, whatever they are.

[1] It may have an incentive to change your discount rate to match its own, but that’s hardly the difference between you and a paperclip maximizer.
it has no incentive to change your long term utility function
By practical difference I meant that it wants to erase the impact of your goals on the universe. Whether it does that by changing your goals or not depends on implementation details.
Consider the perverse case of someone who wants to die—their utility function ranks futures of the universe lower if they’re in it, and higher if they’re not. You can’t maximize this person’s empowerment if they’re dead, so either you should convince them life is worth living, or you should just prevent them from affecting the universe.
By practical difference I meant that it wants to erase the impact of your goals on the universe.
No, it does not in general. The Franzmeyer et al. prototype does not do that, and there is no reason to suspect this becomes some universal problem as you scale these systems up.
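To make “does not do that” concrete: objectives in that line of work score the assistant by how much choice it leaves the other agent, not by what happens to that agent’s goals. Here is a minimal sketch of that flavor of proxy; the grid, function names, and 3-step horizon are illustrative assumptions of mine, not Franzmeyer et al.’s implementation:

```python
# Illustrative sketch only -- not Franzmeyer et al.'s code. It shows the shape
# of a "choice"/empowerment-style proxy: the assistant prefers whichever
# intervention leaves the human with the most reachable states.

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # right, left, down, up, stay

def reachable_states(grid, start, n_steps):
    """Cells the human could occupy within n_steps moves ('#' marks walls)."""
    frontier, seen = {start}, {start}
    for _ in range(n_steps):
        nxt = set()
        for r, c in frontier:
            for dr, dc in MOVES:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < len(grid) and 0 <= c2 < len(grid[0]) and grid[r2][c2] != "#":
                    nxt.add((r2, c2))
        seen |= nxt
        frontier = nxt
    return seen

def assistant_pick(grid, human_pos, interventions, n_steps=3):
    """Choose the intervention (a function that edits the grid) that maximizes
    how many states the human can reach afterwards."""
    return max(interventions,
               key=lambda edit: len(reachable_states(edit(grid), human_pos, n_steps)))

if __name__ == "__main__":
    base = [list(row) for row in ["..#.", "..#.", "....", "...."]]
    human = (0, 0)

    def open_wall(row):
        def edit(g):
            g2 = [r[:] for r in g]
            g2[row][2] = "."  # remove one wall cell
            return g2
        return edit

    best = assistant_pick(base, human, [open_wall(0), open_wall(1)])
    print(len(reachable_states(best(base), human, n_steps=3)))
```

Note that the human’s utility function never appears anywhere in this objective; only the set of options left open to them does.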
Once again:
Optimizing for your long term empowerment is (for most agents) equivalent to optimizing for your ability to achieve your true long term goals/values, whatever they are.
An agent truly seeking your empowerment is seeking to give you power over itself as well, which precludes any effect of “erasing the impact of your goals”.
Consider the perverse case of someone who wants to die -
Sure, and humans usually try to prevent humans from wanting to die.
Short comment on the last point—euthanasia is legal in several countries (thus wanting to die is not prevented, and is even socially accepted), and in my opinion it is the moral course of action in certain situations.