If “you and your civilization” is extremely privileged in your utility function, that means the AI is safer when it’s partitioned away from you, but it also means it doesn’t generate much utility in the first place. That’s not changing the problem, it’s just setting the coefficient of U(right) very low.
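To make that explicit (this decomposition is my own sketch, not something stated in the post): suppose your utility splits linearly across the two halves,

$$U = c_{\text{left}}\,u(\text{left}) + c_{\text{right}}\,u(\text{right}), \qquad c_{\text{right}} \ll c_{\text{left}}.$$

Partitioning the AI into the right half then bounds its possible impact on $U$ by $c_{\text{right}}$ times the range of $u(\text{right})$, which is small by construction. The safety is bought by making the AI nearly irrelevant to $U$, not by aligning it.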
I imagine you can reduce the stakes of any alignment problem by not caring about the outcome.
In theory, the main benefit is that if it works, the same thing could be set loose in your half of the universe. (Which brings up a new kind of treacherous turn.)
Less than half the universe (just the moon, or Pluto, or maybe an asteroid) is arguably a region which starts out with low achievable utility. So if the AI might be really good at optimizing for your utility, then in expectation it might still be a reasonably large gain in utility.
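A sketch of that expected-value argument, again assuming the linear decomposition above (my notation): let $c_{\text{region}}$ be the weight on the sacrificed region and $u_0$ its baseline utility,

$$\mathbb{E}[\Delta U] = c_{\text{region}}\big(\mathbb{E}[u_{\text{AI}}] - u_0\big).$$

For an asteroid, $u_0 \approx 0$ and $c_{\text{region}}$ is tiny, so the downside is bounded near zero, while a strong enough optimizer could still make $\mathbb{E}[u_{\text{AI}}]$ large enough for a nontrivial net gain.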
Still not on board with the value of this. Why would you expect an AGI that does no harm (as far as you know) in an unpopulated and unobserved portion of the universe to also do no harm on Earth, where you keep your stuff and get the vast majority of your utility, but which is also a radically different context (nearly unrelated terms in your utility function)?
The hypothetical included actually knowing, not just “as far as you know.” So the approach I proposed exploited that.
The main benefit is that the AI is not at risk of killing you. In the left half of the universe, it is at risk of killing you.
I don’t follow why you disagree. It’s higher-stakes to operate something which can easily kill me than to operate something which can’t.
Sure, but the stakes are embedded in U. The whole post would be simpler if it just said “an AGI that can’t impact my utility is low stakes”. The partitioning of the universe only “works” if you don’t attempt to claim that U(right) is significant. To the extent that you CARE about what happens in that part of the universe, your utility is impacted by the AGI’s alignment or lack thereof.
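One way to pin down “stakes” here (my formalization, not the post’s): measure the stakes of deploying the AGI as the spread in your expected utility across the policies it might end up with,

$$\text{stakes} = \sup_{\pi,\,\pi'} \big|\,\mathbb{E}[U \mid \pi] - \mathbb{E}[U \mid \pi']\,\big|.$$

Partitioning only makes this quantity small if the partitioned region contributes little to $U$ in the first place, i.e. $c_{\text{right}} \approx 0$, which is exactly the point above.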