I’m glad you mentioned the orthogonality hypothesis (that goals are orthogonal to intelligence). Part of our argument can be seen as reconceptualizing orthogonality, and rejecting a strong version of it.
So it’s interesting to me that you see orthogonality as leading folks to conflict theory over mistake theory. I am not sure it has to lead either way.
In our language, we say that it’s better to have an “endogenous preferences” theory than an “exogenous preferences” theory, meaning that it’s better to be able to construct models where preferences can change as a function of the mechanisms and processes being modeled. This is not true for most rational actor and RL models; they typically assume preferences at the start (i.e. “exogenously”).
Here’s a paragraph that makes this more concrete. I pulled it from a shorter summary of https://arxiv.org/abs/2412.19010 that I’ve been working on.
“Consider an individual’s preference for certain types of music, say, rock versus classical. We need not assume they begin with a scalar utility or “taste” for one over the other. Instead, in our theory, their preferences are shaped by their experiences: perhaps they grow up hearing rock music frequently at home, attend rock concerts with friends, and rarely encounter classical music. These experiences are encoded as memories. When prompted by the question “What kind of person am I?” and summarizing these memories, the individual’s pattern completion network p might generate the assembly “I am the kind of person who likes rock music”. This self-attribution, influenced by past behavior and exposure, becomes part of the global workspace context. Subsequently, when faced with a choice like “What music should I listen to?”, p, conditioned on the self-description “I like rock music”, is more likely to complete the pattern with “Listen to rock music”. Through this process, the individual’s behavior (listening to rock) is influenced by their inferred identity, which itself arose from past behavior and exposure. This example illustrates how preferences are constructed endogenously through the aggregation and consolidation of experience, rather than being static, external inputs to the model.”
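To make the loop in that paragraph easier to picture, here is a toy sketch in Python. It is only an illustration under strong assumptions, not the model from the paper: a simple frequency count stands in for the pattern completion network p, and the names `summarize_identity` and `choose_music` are hypothetical. The point is just that no utility for rock or classical is supplied up front; the preference falls out of the memories.

```python
from collections import Counter


def summarize_identity(memories):
    """Toy stand-in for completing "What kind of person am I?".

    Assumption: the self-description is built from the most frequently
    remembered genre. A real pattern completion network would be a learned
    model, not a counter.
    """
    genre, _ = Counter(memories).most_common(1)[0]
    return f"I am the kind of person who likes {genre} music"


def choose_music(self_description, options):
    """Toy stand-in for completing "What music should I listen to?".

    The choice is conditioned only on the self-description, not on any
    stored utility value for the options.
    """
    for option in options:
        if option in self_description:
            return option
    return options[0]  # no match: fall back to the first option


# Experiences encoded as memories: mostly rock, a little classical.
memories = ["rock"] * 8 + ["classical"] * 2

identity = summarize_identity(memories)
choice = choose_music(identity, ["classical", "rock"])

# The behavior becomes a new memory, which reinforces the inferred identity.
memories.append(choice)

print(identity)
print(choice)
```

Feeding the same two functions a history dominated by classical music yields the opposite choice, which is the sense in which the preference here is endogenous rather than an input to the model.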
Yes, when it comes to endogenous vs exogenous preferences, you are probably right that most of the field assumes exogenous preferences (although I’m not sure, since I’m not very familiar with the field either). Rohin Shah once talked about ambiguous value learning, but my guess is that most people aren’t focusing on that direction.
It’s possible that people who believe in exogenous preferences will feel confused if they are described as following Mistake Theory, since exogenous preferences sounds like Conflict Theory, and doesn’t sound like “pursuing an objective morality.”
My personal opinion (and my opinion isn’t that important here) is that we humans follow a combination of endogenous and exogenous preferences. Our morals are very strongly shaped by what everyone around us believes, often more than we realize.
But at the same time, our hardcoded biology determines how our morals get shaped. If we hunt and kill animals for food or sport, and observe those animals following various norms in their own society, we will not adopt their norms through mere exposure; we will remain indifferent to them. This is because our hardcoded biology did not encode any tendency to care about those animals or to respect their norms.
I agree that studying endogenous preferences, and how to make them go right, is valuable!