So when a superintelligence arises that, despite being Friendly and having the correct goals, does the AGI equivalent of scrolling 9gag, eating Pringles, and drinking booze all day long, tell the programmers that the concept of Self, Personal Identity, Agent, or Me-ness was not sufficiently well described, and that vit cares too much about vits short-term selves.
This is improper anthropomorphising of something we could presumably write down the math for.
Think, for instance, of Omohundro’s classic AI drives. Would they arise in the same way if the AGI 1) had a self-model and 2) had the same layered structure of selves of different lengths that we do?
You may be arguing that vit would not have a self-like structure at all. That is indeed possible. It may be that its utility function is sufficiently different from ours that this kind of problem doesn’t arise at all. The one thing that is worrisome is that if you ask that function to consult people’s functions, to extract some form of extrapolation, then you don’t have the problem at the meta level, but you still have it at the level of what the AGI thinks people are (say, because it scrutinized their short-term selves).
I’m also fine with this not applying for different reasons, in which case take the text to be only about humans, and ignore the last paragraph.
The one thing that is worrisome is that if you ask that function to consult people’s functions, to extract some form of extrapolation, then you don’t have the problem at the meta level, but you still have it at the level of what the AGI thinks people are (say, because it scrutinized their short-term selves).
That’s a good point. One could imagine a method of getting utility functions from human values that, maybe due to improper specification, returned some parts from short-term desires and some other parts from long-term desires, maybe even inconsistently. Though that still wouldn’t result in the AI acting like a human—it would do weirder things.
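To make the "inconsistently mixed" failure mode concrete, here is a minimal toy sketch (entirely hypothetical, not any real extraction method): an extractor that naively unions pairwise preferences from a person's short-term ranking and their long-term ranking can produce a cyclic combined preference, which no utility function can represent. The outcome names and both preference sets are invented for illustration.

```python
# Toy sketch (hypothetical): mixing short-term and long-term preference
# orderings can yield an inconsistent (cyclic) combined preference, so no
# single utility function over these outcomes represents the result.

# Pairwise preferences over three outcomes, as (preferred, dispreferred):
short_term = {("cake", "gym"), ("tv", "cake")}   # impulsive ranking
long_term  = {("gym", "tv")}                     # reflective ranking

# An improperly specified extractor that just unions the two sources:
combined = short_term | long_term

def has_cycle(prefs):
    """Detect a preference cycle by depth-first search over the
    'preferred over' graph; a cycle means the preferences are
    inconsistent and admit no utility-function representation."""
    graph = {}
    for a, b in prefs:
        graph.setdefault(a, set()).add(b)

    def dfs(node, path):
        if node in path:
            return True
        return any(dfs(nxt, path | {node}) for nxt in graph.get(node, ()))

    return any(dfs(n, frozenset()) for n in graph)

print(has_cycle(short_term))  # False: the short-term ranking alone is consistent
print(has_cycle(combined))    # True: tv > cake > gym > tv is a cycle
```

Each source ranking is consistent on its own; the weirdness only appears in the naive combination, which matches the point that the resulting agent would not act like either the short-term or the long-term human.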