Seems less likely to break if we can get every step in the chain to actually care about the stuff that we care about so that it’s really trying to help us think of good constraints for the next step up as much as possible, rather than just staying within its constraints and not lying to us.
Yeah, probably, but what do you mean by that? Do you mean ‘the maximum of its utility function / long-term goals is the same as what would have happened by default if it hadn’t accumulated power over us / hadn’t taken over / etc.?’ Or do you mean ‘like us, it wants people to not die, be happy, be rich, …?’ If it’s just a list of things like that, but the list is not exactly the same as our list, well, it can get more of what it wants by breaking the chain.