You’ve mostly understood the problem-as-stated, and I like the way you’re thinking about it, but there are some major loopholes in this approach.
First, I may value the happiness of agents who I cannot significantly impact via my actions—for instance, prisoners in North Korea.
Second, the actions we choose probably won’t provide enough data. Suppose there are n different people, and I could give any one of them $1. I value these possibilities differently (e.g. maybe because they have different wealth/cost of living to start with, or just because I like some of them better). If we knew how much I valued each action, then we’d know how much I valued each outcome. But in fact, if I chose person 3, then all we know is that I value person 3 having the dollar more than I value anyone else having it; that’s not enough information to back out how much I value each other person having the dollar. This sort of underdetermination will probably be the usual result, since the choice-of-action contains far fewer bits than a function mapping the whole action space to values. (A small numeric sketch of this underdetermination follows after the third point.)
Third, and arguably most important: “run the calculation for all desired moral agents” requires first identifying all the “desired moral agents”, which is itself an instance of the problem in the post. What the heck is a “moral agent”, and how does an AI know which ones are “desired”? These are latent variables in your world-model, and would need to be translated to something in the real world.
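Here is a minimal sketch of the second point, purely illustrative: sample random value assignments over the n “person i gets the dollar” outcomes and keep only those consistent with the observed choice (person 3 being the argmax). The surviving value functions still disagree wildly about every other outcome.

```python
import random

n = 5          # people I could give the dollar to
chosen = 2     # observed choice: "person 3" (0-indexed)

# Each candidate value function assigns a value to "person i gets the dollar".
consistent = []
for _ in range(10_000):
    values = [random.random() for _ in range(n)]
    # The only thing the observed action tells us is an ordinal/argmax fact:
    # the chosen outcome is valued at least as highly as every alternative.
    if all(values[chosen] >= values[i] for i in range(n)):
        consistent.append(values)

print(f"{len(consistent)} of 10000 sampled value functions fit the observed choice")
# The value assigned to any *other* outcome is barely constrained at all:
lo = min(v[0] for v in consistent)
hi = max(v[0] for v in consistent)
print(f"value of 'person 1 gets the dollar' ranges from {lo:.3f} to {hi:.3f}")
```

The bits framing says the same thing: observing one choice among n options yields at most log2(n) bits, while pinning down the value function requires roughly n numbers.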
I was attempting to answer the first point, so let me rephrase:
Even though your ability to affect prisoners in North Korea is minuscule, we can still look at how much of it you’re doing. Are you spending any time seeking out ways you could be affecting them? Are you voting for, supporting, and lobbying politicians who are more likely to use their greater power to affect the NK prisoners’ lives? Are you doing [unknown thing that the AI figures out would affect them]?
And, also, are you doing anything that is making their situation worse? Or affecting any of the other multiple axes of being, since happiness isn’t everything, and even happiness isn’t a one-dimensional scale.
“Who counts as a moral agent? (And should they all have equal weights?)” is a question of philosophy, which I am not qualified to answer. But “who gets to decide the values to teach” is one meta-level up from the question of “how do we teach values”, so I take it as a given for the latter problem.
This analysis falls apart when we take things to their logical extreme: I care about the happiness of humans who are time-like separated from me.