Think, for instance, of Omohundro’s classic AI drives. Would they arise the same way if the AGI 1) had a self-model, and 2) had the same layered, different-length structure of selves that we do?
You may be arguing that it would not have a self-like structure at all, which is indeed possible. It may be that its utility function is sufficiently different from ours that this kind of problem doesn’t arise at all. The one worrisome thing is that if you ask that function to consult people’s functions, to extract some form of extrapolation, then you don’t have the problem at the meta level, but you still have it at the level of what the AGI thinks people are (say, because it scrutinized only their short-term selves).
I’m also fine with this not applying for other reasons, in which case take the text to be only about humans, and ignore the last paragraph.
The one worrisome thing is that if you ask that function to consult people’s functions, to extract some form of extrapolation, then you don’t have the problem at the meta level, but you still have it at the level of what the AGI thinks people are (say, because it scrutinized only their short-term selves).
That’s a good point. One could imagine a method of extracting utility functions from human values that, perhaps due to improper specification, drew some parts from short-term desires and other parts from long-term desires, maybe even inconsistently. Though that still wouldn’t result in the AI acting like a human; it would do weirder things.
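To make that failure mode concrete, here is a minimal toy sketch, not anyone’s actual proposal: all the names, outcomes, and numbers below are hypothetical. It shows an extraction procedure that consults a person’s short-term self for some outcomes and their long-term self for others, so the composite utility function it returns matches neither self.

```python
import random

# Hypothetical preference tables for one person over a few outcomes.
# The short-term self and the long-term self disagree on each of them.
SHORT_TERM = {"eat_cake": 1.0, "exercise": -0.5, "save_money": -0.2}
LONG_TERM = {"eat_cake": -0.8, "exercise": 0.9, "save_money": 0.7}

def extract_utility(outcomes, seed=0):
    """Improperly specified extractor: for each outcome it consults
    either the short-term or the long-term self, chosen arbitrarily,
    so the resulting function is consistent with neither self."""
    rng = random.Random(seed)
    return {
        outcome: (SHORT_TERM if rng.random() < 0.5 else LONG_TERM)[outcome]
        for outcome in outcomes
    }

utility = extract_utility(["eat_cake", "exercise", "save_money"])
print(utility)
# The composite can value cake like the short-term self but money like
# the long-term self: an agent optimizing it would behave in ways that
# are weirder than either coherent human self, as the comment suggests.
```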