Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences? Conditional on its possibility, do you think we should attempt to create such an AI?
I share your trepidation about humans and their values, but I see that as implying that we have to be meta enough that even if humans are wrong, our AI will still do what is right. It seems to me that this is still a real possibility. For an example of an FAI architecture that moves more in this direction, see CFAI.
Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences?
No. I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility cannot even be grounded, and therefore cannot be defined. I don’t think that there exists anything like “human preferences”, and therefore human utility functions, apart from purely theoretical, highly complex, and therefore computationally intractable approximations. I don’t think that there is anything like a “self” that can be used to define what constitutes a human being, not practically anyway. I don’t believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence.
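(As a back-of-the-envelope illustration of the intractability point above: the toy binary-feature world model and the numbers below are assumptions added for illustration, not anything from the original comment. The point is only that an explicit utility table over world states cannot even be written down.)

```python
# Rough illustration (assumed, illustrative numbers only): why an explicit
# utility table over "world states" is computationally intractable. If a world
# state is described by n independent binary features, there are 2**n distinct
# states that would each need a utility assignment.

n_features = 300                         # a very crude world model
n_states = 2 ** n_features               # number of distinct world states

atoms_in_observable_universe = 10 ** 80  # commonly cited order-of-magnitude estimate

print(f"world states to assign utility to: ~10^{len(str(n_states)) - 1}")
print("atoms in the observable universe:  ~10^80")
print("explicit utility table is infeasible:", n_states > atoms_in_observable_universe)
```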
I believe that stable goals are impossible and that any attempt at extrapolating the volition of people will alter it.
Besides, I believe that we won’t be able to figure out any of the following in time:
The nature of consciousness and its moral significance.
The relationship between suffering, pain, fun, and happiness, and their moral significance.
I further believe that the following problems are either impossible to solve or constitute a reductio ad absurdum of certain ideas:
Utility monsters
Pascal’s Mugging (a rough numerical sketch follows this list)
The Lifespan Dilemma
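(To make the Pascal’s Mugging item above concrete, here is a minimal sketch with made-up numbers rather than anything from the original thread: an agent that multiplies a tiny probability by a sufficiently astronomical promised payoff will conclude it should pay the mugger, which is why the scenario is often read as a reductio of naive expected-utility maximization.)

```python
# Minimal Pascal's Mugging sketch (illustrative, made-up numbers).
# A naive expected-utility maximizer compares two options:
#   pay the mugger:  certain small cost, plus a tiny chance of an astronomical payoff
#   refuse:          status quo (utility 0)

cost_of_paying = 5.0        # utility lost by handing over the $5
p_mugger_truthful = 1e-50   # tiny credence that the promise is real
promised_payoff = 1e100     # stand-in for "3^^^^3 utils" (the original figure is far larger)

eu_pay = p_mugger_truthful * promised_payoff - cost_of_paying   # ~1e50
eu_refuse = 0.0

print("EU(pay)    =", eu_pay)
print("EU(refuse) =", eu_refuse)
print("naive expected-utility maximizer pays:", eu_pay > eu_refuse)
```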
I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility cannot even be grounded, and therefore cannot be defined. I don’t think that there exists anything like “human preferences”, and therefore human utility functions, apart from purely theoretical, highly complex, and therefore computationally intractable approximations. I don’t think that there is anything like a “self” that can be used to define what constitutes a human being, not practically anyway. I don’t believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence.
Strange stuff.
Surely “right” and “wrong” make the most sense in the context of a specified moral system.
If you are using those terms outside such a context, it usually implies some kind of moral realism—in which case, one wonders what sort of moral realism you have in mind.