Lots of good stuff here, thanks. I think most of this is right.
Agreed about powerful AI being prone to unpredictable rules-lawyering behavior. I touch on this a little in the post, but I think it’s really important that it’s not just the statements of the rules that determine how a deontological agent acts, but also how the relevant (moral and non-moral) concepts are operationalized, how different shapes and sizes of rule violation are weighted against each other, how risk and probability are taken into account, and so on. With all those parameters in play, we should have a high prior on getting weird and unforeseen behavior.
Also agreed that you can mitigate many of these risks if you’ve got a weak deontological agent with only a few behavior-guiding parameters and a limited palette of available actions.
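To make the ‘many parameters in play’ point concrete, here’s a minimal toy sketch in Python. It’s entirely hypothetical (the rules, string-match classifiers, severities, and threshold are all invented for illustration, and nothing here reflects a real alignment system), but even this stripped-down filter has free parameters for concept operationalization, violation severity, risk tolerance, and how violations aggregate:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    # Probability that a candidate action violates this rule, as judged by
    # some (imperfect) concept classifier: the "operationalization".
    violates: Callable[[str], float]
    severity: float  # how bad a violation of this rule is, relative to others

def permissible(action: str, rules: list[Rule],
                risk_threshold: float = 0.05) -> bool:
    """Toy deontological filter: block an action if its expected
    rule-violation badness crosses a threshold.

    Everything here is a free parameter: the violation classifiers, the
    per-rule severities, the threshold, and even the choice to aggregate
    by sum rather than by max. Different settings give different agents.
    """
    expected_badness = sum(r.violates(action) * r.severity for r in rules)
    return expected_badness < risk_threshold

# A "weak" agent: two rules, crude string-match operationalizations,
# and a small fixed palette of actions.
rules = [
    Rule("no deception", lambda a: 0.9 if "mislead" in a else 0.0, severity=1.0),
    Rule("no harm",      lambda a: 0.7 if "harm" in a else 0.0,    severity=2.0),
]

for action in ["answer honestly", "mislead the user",
               "file a harm-avoidance report"]:
    verdict = "allowed" if permissible(action, rules) else "blocked"
    print(f"{action}: {verdict}")
# The crude operationalization already misfires: the harmless
# "file a harm-avoidance report" gets blocked because "harm" appears in it.
```

Even in this toy, nudging the threshold or swapping the sum for a max changes which actions pass, and the misfire in the last line is exactly the kind of weird, unforeseen behavior I mean.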
My impression of the AI value alignment literature is that it’s actually quite diverse. There are some people looking at deontological approaches using top-down rules, and some who take moral uncertainty or pluralism seriously and think we should at least include deontology in our collection of potential moral alignment targets. (Some of @Dan H ’s work falls into that second category, e.g. this paper and this one.) In general, I think the default to utilitarianism probably isn’t as automatic among AI safety and ethics researchers as it is in LW/EA circles.
My exposure to the AI safety and ethics community’s thinking has come primarily via LW/EA and papers, so it’s entirely possible that I have a biased sample.
I had another thought on this. Existing deontological rules are intended for humans. Humans are optimizing agents, and they all have roughly the same cognitive capacity (members of a species that, judging by the history of stone tool development, seems to have been sapient for maybe a quarter million years, so possibly only just over the threshold for sapience). So there is another way in which deontological rules reduce cognitive load: each of us generally thinks about our own benefit and that of close family and friends. It’s ‘not our responsibility’ to benefit everyone in society, because everyone else is already doing that, looking out for themselves. That might well explain why standard deontological rules concentrate on avoiding harm to others rather than on doing good to others.
AGI, on the other hand, differs in two ways. First, it may well be smarter than any human, possibly far smarter, and so may have the capacity to do for humans things they can’t do for themselves, possibly for a great many humans at once. Second, its ethical role is not to help itself and its friends but to help humans: all humans. It ought to be acting selflessly. So its duty to humans isn’t just to avoid harming them and let them go about their business, but to actively help them. I therefore think deontological rules for an AI, if you tried to construct them, should be quite different in this respect from deontological rules for a human, and should probably focus just as much on helping as on not harming.
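To illustrate that contrast, here’s a deliberately crude sketch of a positive duty as a parameterized constraint, in the same toy style as above (the function and its duty_fraction parameter are invented for illustration, not drawn from any real proposal):

```python
# Hypothetical extension of the earlier toy: positive duties (obligations
# to help) alongside negative duties (prohibitions on harm). Nothing here
# reflects any real system; duty_fraction is an invented parameter.

def fulfils_positive_duty(benefit_provided: float,
                          benefit_available: float,
                          duty_fraction: float = 0.5) -> bool:
    """Require the agent to deliver at least some fraction of the benefit
    it could feasibly provide. A human-facing deontology would set
    duty_fraction near zero (no general duty to rescue); an AI-facing
    one, on the argument above, plausibly should not."""
    return benefit_provided >= duty_fraction * benefit_available

print(fulfils_positive_duty(2.0, 10.0))  # False: the agent helped too little
print(fulfils_positive_duty(7.0, 10.0))  # True
```

The point of the sketch is just that ‘help as much as you reasonably can’ can be stated as a rule with its own tunable parameter, and where that parameter should sit is one of the respects in which AI-facing deontology would diverge from the human version.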