Yeah, we are magically instantly influencing an AGI which will thereafter be outside of our light cone. This is not a proposal, or something which I’m claiming is possible in our universe. Just take for granted that such a thing is possible in this contrived example environment.
My conception of utility is that it’s a synthetic calculation from observations about the state of the universe, not that it’s a thing on its own which can carry information.
Well, maybe here’s a better way of communicating what I’m after:
Suppose that you have beliefs about the initial state of the right (AGI) half, and you know how it’s going to evolve. This gives you a distribution over right-half universe histories: from your beliefs about the AGI’s initial state, you can compute how the right half of the universe will end up.
In this way, you can take expected utility over the joint universe history, without being able to observe what’s actually happening on the AGI’s end. This is similar to how I prefer “start a universe which grows to be filled with human flourishing” over “start a universe which fills itself with suffering”, even though I may not observe the fruits of either decision.
Is this clearer?
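(To make the expected-utility calculation concrete, here is a minimal toy sketch in Python. Everything in it is invented for illustration: the three possible AGI initial states, the deterministic dynamics, and all of the utility numbers are stand-ins, not part of the thought experiment.)

```python
# Toy model: expected utility over joint universe histories when the
# right (AGI) half can be modeled but never observed.
# All states, dynamics, and numbers below are made up for illustration.

# Prior beliefs over the AGI's initial state at the moment you influence it.
prior = {"aligned": 0.6, "wireheads": 0.3, "hostile": 0.1}

# Assumed-known (here, deterministic) dynamics: initial state -> right-half history.
def evolve_right(initial_state):
    return {
        "aligned": "right half fills with flourishing",
        "wireheads": "right half sits inert while the AGI wireheads",
        "hostile": "right half fills with suffering",
    }[initial_state]

# Utility is a function of the joint history, even though only the left half
# is ever observed.
def utility(left_history, right_history):
    right_scores = {
        "right half fills with flourishing": 10.0,
        "right half sits inert while the AGI wireheads": 0.0,
        "right half fills with suffering": -10.0,
    }
    return 1.0 + right_scores[right_history]  # 1.0 stands in for u(left)

left_history = "ordinary left-half history"
expected_utility = sum(
    p * utility(left_history, evolve_right(state)) for state, p in prior.items()
)
print(expected_utility)  # 0.6*11 + 0.3*1 + 0.1*(-9) = 6.0
```

The only point of the sketch is that the expectation is taken using the prior and the assumed dynamics alone; nothing on the AGI’s side ever needs to be observed.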
Thanks, so in this thought experiment, you influence an AI such that you justify (to yourself) imagining a nicer future for unknowable parts of the universe?
I suspect for most of us, it’s cleaner to model it as the utility of your perceived/expected state of the universe(s) than as a joint utility over multiple universes, but I think I understand what you’re saying, at least.
I’m torn about whether this seems lower-stakes or not. There’s a whole lot in my light cone which I theoretically could watch, but actually don’t. My utility-from-imagination is the same for those areas as for the truly inaccessible ones. Thus, errors in specification or implementation of the AGI seem to have as big an impact in your imagined part of the universe as they would in your part of reality. It may be lower-stakes if your imagination is flawed and you never believe there was an error, but then it seems wireheading would be easier.
I’m torn about whether this seems lower-stakes or not.
I think it is lower-stakes in a fairly straightforward way: An unaligned AGI on the right side of the universe won’t be able to kill you and your civilization.
Furthermore, most misspecifications will just end up with a worthless right half of the universe—you’d have to be quite good at alignment in order to motivate the AGI to actually create and harm humans, as opposed to the AGI wireheading forever off of a sensory reward signal.
If “you and your civilization” has an extreme privilege in your utility function, that means the AI is safer when it’s partitioned away from you, but it also means it doesn’t generate much utility in the first place. That’s not changing the problem, it’s just setting the coefficient of u(right) very low.
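(A toy version of that last point, with numbers I am making up: if the coefficient on u(right) is tiny, the AGI’s alignment barely moves total utility, but only because almost no utility was at stake there to begin with.)

```python
# Illustrative numbers only; none of these come from the discussion above.
c_right = 0.01   # how much weight the right (AGI) half gets in your utility
u_left = 100.0   # utility from your own, observable half

u_right_aligned = 50.0
u_right_unaligned = -50.0

total_aligned = u_left + c_right * u_right_aligned      # 100.5
total_unaligned = u_left + c_right * u_right_unaligned  # 99.5

# The gap between the aligned and unaligned outcomes is small relative to
# u_left, which is exactly what setting c_right very low buys you.
print(total_aligned - total_unaligned)  # 1.0
```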
I imagine you can reduce the stakes of any alignment problem by not caring about the outcome.
In theory, the main benefit is that, if it works, the same thing could be set loose in your half of the universe. (Which brings up a new kind of treacherous turn.)
That’s not changing the problem, it’s just setting the coefficient of u(right) very low.
Less than half the universe (just the Moon, or Pluto, or maybe an asteroid) is arguably a region which starts out with low achievable utility, so... if the AI might be really good at optimizing for your utility, then in expectation there might be a reasonably large gain in utility.
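(A toy expected-value version of that argument, again with invented numbers: a region that starts out nearly worthless has a bounded downside, so even a modest chance that the AI is genuinely good at optimizing your utility can make releasing it there positive in expectation.)

```python
# Illustrative numbers only.
p_good = 0.3          # chance the AI really is good at optimizing your utility
gain_if_good = 10.0   # value created in the region (the Moon, an asteroid, ...)
loss_if_bad = -0.5    # worst case: a nearly worthless region stays worthless

expected_gain = p_good * gain_if_good + (1 - p_good) * loss_if_bad
print(expected_gain)  # 0.3*10 + 0.7*(-0.5) = 2.65 > 0
```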
Still not on board with the value of this. Why would you expect an AGI that does no harm (as far as you know) in an unpopulated and unobserved portion of the universe to also do no harm on Earth (where you keep your stuff, and get the vast majority of your utility, but also with a radically different context: nearly unrelated terms in your utility function)?
The hypothetical included knowing how the right half will turn out, and the approach I proposed exploited this.
The main benefit is that the AI is not at risk of killing you. In the left half of the universe, it is at risk of killing you.
I don’t follow why you disagree. It’s higher-stakes to operate something which can easily kill me than to operate something which can’t.
Sure, but the stakes are embedded in U. The whole post would be simpler to just say “an AGI that can’t impact my utility is low stakes”. The partitioning of the universe only “works” if you don’t attempt to claim that U(right) is significant. To the extent that you CARE about what happens in that part of the universe, your utility is impacted by the AGI’s alignment or lack thereof.