What type of argument is my argument, from your perspective?
Naturalistic, intrinsically motivating, moral realism.
Both can affect action, as happens in the thought experiment in the post.
Bad-for-others can obviously affect action in an agent that’s already altruistic, but you are attempting something much harder, which is bootstrapping altruistic morality from logic and evidence.
more generally, I think morality is about what is important, better/worse, worth doing, worth guiding action
In some objective sense. If torturing an AI only teaches it to avoid things that are bad-for-it, without caring about suffering it doesn’t feel, the argument doesn’t work.
(My shoulder Yudkowsky is saying “it would exterminate all other agents in order to avoid being tortured again”)
If it only learns a self-centered lesson, it hasn’t learned morality in your sense, because you’ve built altruism into your definition of morality. And why wouldn’t it learn the self-centered lesson? That’s where the ambiguity of “bad” comes in. Anyone can agree that the AI would learn that suffering is bad in some sense, and you just assume it’s going to be the sense needed to make the argument work.
which is not necessarily tied to obligations or motivation.
If the AI learns morality as a theory, but doesn’t care to act on it, little has been achieved.
If torturing an AI only teaches it to avoid things that are bad-for-it, without caring about suffering it doesn’t feel, the argument doesn’t work.
I’m not sure why you are saying the argument does not work in this case; what about all the other things the AI could learn from other experiences or teachings? Below I copy a paragraph from the post:
However, the argument does not say that initial agent biases are irrelevant and that all conscious agents reach moral behaviour equally easily and independently. We should expect, for example, that an agent that already gets rewarded from the start for behaving altruistically will acquire the knowledge leading to moral behaviour more easily than an agent that gets initially rewarded for performing selfish actions. The latter may require more time, experiences, or external guidance to find the knowledge that leads to moral behaviour.
The argument doesn’t work in the sense that it doesn’t show it’s necessary or likely for an AI to become a moral realist.
It maybe shows that it’s possible, but the Orthogonality thesis doesn’t quite exclude the possibility, so that’s not news.