I’m not convinced that researching the Friendly AI concept is a cost-effective way of reducing existential risk.
Researching Friendly AI is a way to reduce the risk of Unfriendly AI.
Personally, I find it very easy to outline a recipe for FAI.
You use cognitive neuroscience to figure out human values and human metaethical thought.
You automate human metaethical thought, apply that to the values determined at step 1, and thereby arrive at a human-relative moral/behavioral ideal.
You design an open-ended AGI architecture, and use the ideal from step 2 to supply it with a goal.
To me, that defines a research program that’s just full of highly concrete tasks waiting to be carried out. Some of them may be very difficult or abstract, but nothing leaves me feeling helpless, with no idea where to begin. However, you need to know something about ordinary AI (try this), and you need to have some idea of what a computational model of human decision making might look like (simple examples).
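The three-step recipe above can be sketched as a pipeline. Every function and value below is a hypothetical placeholder standing in for a large research program, not an existing method:

```python
# Toy sketch of the three-step FAI recipe above. All names and values
# here are invented placeholders for illustration only.

def extract_values_and_metaethics(neuro_data):
    """Step 1: infer human values and metaethical machinery from
    cognitive-neuroscience data (placeholder)."""
    values = {"wellbeing": 1.0, "fairness": 0.8}  # illustrative only
    metaethics = lambda vals: {k: round(v, 1) for k, v in vals.items()}
    return values, metaethics

def derive_ideal(values, metaethics):
    """Step 2: automate metaethical thought and apply it to the
    extracted values, yielding a human-relative ideal."""
    return metaethics(values)

def build_agi_with_goal(ideal):
    """Step 3: supply an open-ended AGI architecture with the
    ideal from step 2 as its goal (placeholder)."""
    return {"goal": ideal}

values, metaethics = extract_values_and_metaethics(neuro_data=None)
ideal = derive_ideal(values, metaethics)
agent = build_agi_with_goal(ideal)
print(agent["goal"])
```

The point of the sketch is only the shape of the program: each step consumes the previous step's output, so each is a concrete (if very hard) task rather than an open-ended mystery.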
You use cognitive neuroscience to figure out human values and human metaethical thought.
Why do we need to know about meta-ethics? Why would we even think there is such a thing as human meta-ethics?
Meta-ethics may be relevant for crafting ways to approach the problem. But I don’t see how it is a descriptive project answerable with the techniques of cognitive neuroscience the way you describe. I also don’t see why an AI would need to know whether we were cognitivists or non-cognitivists to know how we want it to act.
I can think of a way it could matter. If we’re non-cognitivists, we think our moral discourse consists of speech acts, so when we argue over morality we’re actually issuing imperatives, or declaring that universal imperatives exist. If we’re cognitivists, we’re talking about what is the case. There could very well be grounds for an FAI to treat an imperative and a proposition differently. That doesn’t mean the distinction actually matters, but it’s not obviously irrelevant.
So it would be sloppy to code what you want an AI to do the way you code propositions/beliefs. That is, you don’t want to fit the bulk of the goal architecture inside its belief networks. Nor, certainly, should you expect the AI to learn moral truths by looking at the world. Once you tell it to care about what people want, then it can look at people to find that out, but it can’t learn to care about what people want just by observing the world. Those kinds of moral facts don’t exist. So certainly knowing things about meta-ethics will help create an FAI.
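The separation argued for here can be made concrete with a toy agent. This is a hypothetical sketch, not anyone's proposed architecture:

```python
# Toy agent illustrating the separation above: beliefs are descriptive
# and updated from observation, while the goal is normative, installed
# by design, and never learned from the world. The goal only *refers*
# to facts the agent can observe (what people want).

class Agent:
    def __init__(self):
        self.beliefs = {}                  # learned from the world
        self.goal = "satisfy_preferences"  # installed, never observed

    def observe(self, fact, value):
        # Observation updates beliefs, including beliefs about what
        # people want -- but it never creates or changes the goal.
        self.beliefs[fact] = value

    def act(self):
        # The goal determines which beliefs are action-relevant.
        if self.goal == "satisfy_preferences":
            return self.beliefs.get("people_want", "gather more data")

agent = Agent()
agent.observe("people_want", "clean water")
print(agent.act())
```

Observation supplied the content ("clean water"), but only because the fixed goal directed the agent to consult that belief; no amount of observing would have produced the goal itself.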
But that’s an argument for smart people to spend time thinking about meta-ethics. It’s not an argument for a descriptive program that extracts folk metaethics to form the goal architecture of an AI. For one thing, most humans seem to have really confused meta-ethical beliefs.
On reflection, I think by ‘metaethical thought’ Mitchell probably meant the normative theory that describes human ethics. I don’t think there is one of those either, but it’s not obviously wrong and certainly makes more sense.
I meant: innate cognitive architecture which plays a role in metaethical thought.
You might be familiar with the idea that, according to CEV, you figure out the full complexity of human value using neuroscience (rather than relying on people’s opinions about what they value), and then you “extrapolate” or “renormalize” that using “reflective decision theory” (which does not yet exist). The idea here is that the method of extrapolation should also be extracted from the details of human cognitive architecture, rather than just figured out through intuition or pure reason.
Suppose we have a person—or an intelligent agent—with a particular “value system” or “private decision theory”. Note that we are talking about its actual decision theory, as embodied in its causal structure and decision-making dispositions, and not just its introspective opinions about how it decides. Given this actual value system, RDT is supposed to tell us what would happen to that value system if it were changed according to its own implicit ideals. All I’m saying is that there’s a meta-ethical relativism for RDT, for large classes of decision architecture. Different theories about how to normatively self-modify a decision architecture ought to be possible, and the selection of which RDT is used should also be derived from the agent’s own cognitive architecture.
Of course you can go meta again and say, maybe the RDT extraction procedure can also take different forms—etc. It’s one of the tasks of the FAI/CEV/RDT research program to figure out when and how the ethical metalevels stop.
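One way to picture the metalevels "stopping" is as a fixed point: the agent's own reflection operator is applied to its value system until further self-modification changes nothing. A toy numeric sketch, where the reflection rule is invented purely for illustration (no real RDT exists yet):

```python
# Toy illustration of "extrapolating a value system by its own
# implicit ideals" as fixed-point iteration. The reflection operator
# is a made-up stand-in for whatever RDT would actually extract.

def reflect(values):
    """A value system's own (hypothetical) ideal for revising itself:
    here, move each weight halfway toward the mean of all weights."""
    mean = sum(values.values()) / len(values)
    return {k: round((v + mean) / 2, 6) for k, v in values.items()}

def extrapolate(values, max_steps=100):
    """Apply the system's own reflection operator until it stabilizes
    (a self-endorsing fixed point), or give up after max_steps."""
    for _ in range(max_steps):
        revised = reflect(values)
        if revised == values:  # further self-modification changes nothing
            return revised
        values = revised
    return values

initial = {"honesty": 0.9, "loyalty": 0.3}
print(extrapolate(initial))  # both weights converge toward their mean, 0.6
```

The relativism point then shows up as a choice of `reflect`: different reflection operators yield different fixed points from the same initial values, and going meta again means asking which operator the agent's own architecture endorses.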
To me, that defines a research program that’s just full of highly concrete tasks waiting to be carried out. Some of them may be very difficult or abstract, but nothing leaves me feeling helpless, with no idea where to begin.
I imagine that writing up a description of many such concrete tasks would help clarify the situation; I’d encourage you to do so if you have some time.
However, you need to know something about ordinary AI (try this), and you need to have some idea of what a computational model of human decision making might look like (simple examples).
Yes, I don’t have subject matter knowledge. Thanks for the links.
Thanks for your interesting comment.