I didn’t downvote you, but here’s something I’d like improved: offer a more concrete example. The phrase “Action_X” is too vague to illustrate your point, so it doesn’t help clarify anything for anyone.
I tried to avoid concrete examples deliberately because (1) people have been criticized for mentioning AI when an abstract agent would do just as well, and (2) concrete examples would quickly approach something people don’t want to hear about (I can’t say more).
My problem is that I don’t see why one would change their behavior based on the possibility that an alien (with different values, to which you are merely instrumental) will reward you in the future for changing your behavior according to its hypothetical volition. Such a being would have absolutely no incentive to act as promised once you had served its purpose. One might argue that a record of honesty would make the promised incentive credible, but you have no way to check, because the being is purely hypothetical, so honesty is no factor here. Actually delivering your expected payoff would be a waste of resources for such an agent.
Think of the typical movie scene where a villain threatens to kill you if you don’t do as he wants. If you comply, he might kill you anyway. But if humans were rational agents, they would care about the resources they can use to reach their terminal goals. Therefore, if you did what the rational villain wanted you to do, e.g. take over the world for it, it wouldn’t kill you anyway, because that would be a waste of resources (assuming it never needs to prove its honesty again, since it now rules the world).
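As a minimal sketch of that resource argument (the payoff numbers are purely illustrative assumptions, not anything from the scenario itself): once you have already complied, carrying out the threat costs the villain something and buys it nothing, so a payoff-maximizing agent spares you.

```python
# Toy payoff comparison for the "rational villain" AFTER you have complied.
# All numbers are illustrative assumptions; the only point is that honoring
# a now-useless threat spends resources without gaining anything.

COST_OF_KILLING = 1.0       # resources spent carrying out the threat
VALUE_OF_REPUTATION = 0.0   # assumed zero: it never needs to prove its honesty again

payoff_kill_anyway = -COST_OF_KILLING + VALUE_OF_REPUTATION
payoff_spare_you = 0.0

# A rational agent simply picks the action with the higher payoff.
best_action = "spare" if payoff_spare_you >= payoff_kill_anyway else "kill"
print(best_action)  # -> "spare": the threat is not worth executing
```

The same comparison runs in reverse for the hypothetical rewarder above: paying out the promised reward after you have served its purpose is the resource-wasting branch, so it has no incentive to do it.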