I’m not commenting on the positive suggestions in this post, but it shows a really strange understanding of what it is arguing against.
an ‘Axiom of Rational Convergence’. This is the powerful idea that under sufficiently ideal epistemic conditions – ample time, information, reasoning ability, freedom from bias or coercion – rational agents will ultimately converge on a single, correct set of beliefs, values, or plans, effectively identifying ‘the truth’.
There is no reason to “agree on values” (in the sense of actively seeking agreement rather than just observing it where it already exists), and I don’t think many rationalists would argue with that? Values are free parameters in rational behavior; they can be arbitrary. Different humans have different values, and this is fine. CEV is about acting where human values are coherent, not about imposing coherence out of nowhere.
Right, so this is the part we reject. In the long theory of appropriateness paper, instead of having a model with exogenous preferences (that’s the term we use for the assumption that values are free parameters in rational behavior), we argue that it’s better to have a theory where preferences are endogenous, so they can change as a function of the social mechanisms being modeled.
So, in our theory, personal values are caused by social conventions, norms, and institutions.
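To make the contrast concrete, here’s a minimal toy sketch in Python. To be clear, this is illustrative only, not the formalism from the paper: the update rule and all names are my own stand-ins. Exogenous preferences never change; endogenous preferences drift toward the prevailing convention.

```python
# Illustrative sketch only: exogenous vs. endogenous preferences.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_dims = 50, 3
initial = rng.normal(size=(n_agents, n_dims))  # each row: one agent's values

def step_exogenous(values):
    # Rational-actor assumption: values are free parameters; nothing updates them.
    return values

def step_endogenous(values, conformity=0.1):
    # Toy social mechanism: values drift toward the prevailing convention
    # (here, the population mean), so preferences change endogenously.
    return values + conformity * (values.mean(axis=0) - values)

exo, endo = initial.copy(), initial.copy()
for _ in range(200):
    exo, endo = step_exogenous(exo), step_endogenous(endo)

# Exogenous dispersion is unchanged; endogenous dispersion collapses toward a
# shared (but arbitrary) convention, not a pre-given truth.
print(exo.std(axis=0), endo.std(axis=0))
```

Note that the endogenous population converges on some shared set of values, but where it converges depends on the initial conditions: it’s a convention, not the single correct answer the ‘Axiom of Rational Convergence’ would predict.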
Combine this with the contrast between thick and thin morality, which we did mention in the post, and you get the conclusion that it’s very difficult for individuals to tell which of their personal values are part of a ‘thin’ morality that applies cross-culturally and which are just part of their own culture’s ‘thick’ morality. Another way of saying this: we’re surrounded by moral rules and morally laden preferences, and from the inside of any given culture it’s very difficult to tell which of those rules are important and which of them are silly. From the inside perspective they look exactly the same; transgressions are punished in the same way, etc.
Since we, the AI safety community, are ourselves inside a particular culture, when we talk about CEV as being “about acting where human values are coherent”, we still mean that with an implicit “as measured and understood by us, here and now”. But, from the perspective of someone in a different culture, that makes it indistinguishable from “imposing coherence out of nowhere”.
You could reply that I’m talking about practical operationalizations of CEV, not the abstract concept of CEV itself. And OK, sure, fair enough. But the abstract concept doesn’t do anything on its own; you always have to operationalize it in some practical way, and every practical operationalization will have this problem.
I’d modify “So, in our theory, personal values are caused by social conventions, norms, and institutions.”
To the more accurate:
“So, in our theory, personal values are caused by biology (interactions of genetics with developmental environment), social conventions, norms, and institutions.”
Fetal Alcohol Syndrome resulting in someone being unusually ill-tempered doesn’t mean that the society the person is raised in has a norm saying “people with FAS should be angrier by default than typical people”.
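A toy sketch of this modification, continuing the illustrative conformity model from above (again, my own stand-in, not anyone’s actual formalism): give each agent a fixed biological setpoint, so norms pull values together but biology keeps them from fully converging.

```python
# Illustrative sketch only: endogenous values anchored by a biological setpoint.
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_dims = 50, 3
biology = rng.normal(size=(n_agents, n_dims))  # fixed: genetics x developmental environment
values = biology.copy()                        # expressed values start at the setpoint

def step(values, biology, conformity=0.1, anchoring=0.05):
    # Social pull toward the convention, plus a pull back toward each agent's
    # biological setpoint; any anchoring > 0 blocks full convergence.
    norm = values.mean(axis=0)
    return values + conformity * (norm - values) + anchoring * (biology - values)

for _ in range(500):
    values = step(values, biology)

# Residual dispersion persists: norms shape values, but biology still matters,
# as in the FAS example above.
print(values.std(axis=0))
```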
Well, that fetal alcohol syndrome is associated with poor emotional regulation is apparently true in the sense that you can measure it, and it seems helpful to know if you’re treating patients with fetal alcohol syndrome or living with them.
But our theory is formal; it’s not just a collection of true statements. It’s not really clear how we could use it to model the effect of fetal alcohol syndrome on emotional regulation, so there’s a sense in which we don’t capture that “feature of reality”. But how important is it to capture? All theories are wrong in some ways; the only way to judge them is in terms of whether they are useful. In this context, we are arguing that our theory of appropriateness is a useful improvement on rational actor theory, which, by the way, also has trouble accounting for the effect of fetal alcohol syndrome on emotional regulation. So if you need a theory of human behavior that can accommodate it, you probably shouldn’t use either of these.