So it would be sloppy to code what you want an AI to do the way you code propositions/beliefs. That is, you don’t want to fit the bulk of the goal architecture inside its belief networks. Nor, certainly, should you expect the AI to learn moral truths by looking at the world. Once you tell it to care about what people want, it can look at people to find that out, but it can’t learn to care about what people want just by observing the world. Those kinds of moral facts don’t exist. So certainly knowing things about meta-ethics will help create an FAI.
But that’s an argument for smart people to spend time thinking about meta-ethics. It’s not an argument for a descriptive program that finds folk-metaethics to form the goal architecture of an AI. For one thing, most humans seem to have really confused meta-ethical beliefs.
On reflection, I think by ‘metaethical thought’ Mitchell probably meant the normative theory that describes human ethics. I don’t think there is one of those either, but it’s not obviously wrong and certainly makes more sense.
I meant: innate cognitive architecture which plays a role in metaethical thought.
You might be familiar with the idea that, according to CEV, you figure out the full complexity of human value using neuroscience (rather than relying on people’s opinions about what they value), and then you “extrapolate” or “renormalize” that using “reflective decision theory” (which does not yet exist). The idea here is that the method of extrapolation should also be extracted from the details of human cognitive architecture, rather than just figured out through intuition or pure reason.
Suppose we have a person—or an intelligent agent—with a particular “value system” or “private decision theory”. Note that we are talking about its actual decision theory, as embodied in its causal structure and decision-making dispositions, and not just its introspective opinions about how it decides. Given this actual value system, RDT is supposed to tell us what would happen to that value system if it were changed according to its own implicit ideals. All I’m saying is that there’s a meta-ethical relativism for RDT, for large classes of decision architecture. Different theories about how to normatively self-modify a decision architecture ought to be possible, and the selection of which RDT is used should also be derived from the agent’s own cognitive architecture.
Of course you can go meta again and say that maybe the RDT extraction procedure can also take different forms, and so on. It’s one of the tasks of the FAI/CEV/RDT research program to figure out when and how the ethical metalevels stop.