A descriptive model of moral change, virtue signaling, and cause allocation that I thought of in part as a response to Paul Christiano’s Moral public goods. (It was previously posted deep in a subthread and I’m reposting it here to get more attention and feedback before possibly writing it up as a top-level post.)
1. People are socially rewarded for exhibiting above-average virtue/morality (for certain kinds of virtue/morality that are highly regarded in their local community) and punished for exhibiting below-average virtue/morality.
2. As a result, we evolved two internal mechanisms: preference alteration (my own phrase) and preference falsification. Preference alteration is where someone’s preferences actually change according to the social reward gradient, and preference falsification is acting in public according to the reward gradient but not doing so in private. The amounts of preference alteration and preference falsification can vary between individuals. (We have preference alteration because preference falsification is cognitively costly, and we have preference falsification because preference alteration is costly in terms of physical resources.)
3. Preference alteration changes one’s internal “moral factions” to better match what is being rewarded. That is, the factions representing virtues/morals being socially rewarded get a higher budget.
4. For example, there is a faction representing “family values” (being altruistic to one’s family), one for local altruism, one for national altruism, one for global altruism, one for longtermism, etc., and #3 is mainly how one ends up allocating resources between these factions, including how much to donate to various causes and how to weigh considerations for voting.
5. In particular, public goods and other economic considerations do not really apply (at least not directly) to relative spending across different moral factions, because resources are allocated through budgets rather than weights in a utility function. For example if global anti-poverty suddenly becomes much more cost effective, one doesn’t vote or donate to spend more on global poverty, because the budget allocated to that faction hasn’t changed. Similarly, larger countries do not have higher ODA as the public goods model predicts. Instead the allocation is mostly determined by how much global altruism is rewarded by one’s local community, which differs across communities due to historical contingencies, what kinds of people make up the community, etc. (A toy sketch of the budget-vs-weights contrast follows this list.)
6. On top of this, preference falsification makes one act in public (through, e.g., public advocacy, publicly visible donations, conspicuously punishing those who fall short of the local norms) more like someone who fully subscribes to the virtues/morals being socially rewarded, even if one’s preference alteration falls short of that. [Added: This is probably responsible for purity spirals / runaway virtue signaling. E.g., people overcompensate for private deviations from moral norms by putting lots of effort into public signaling including punishing norm violators and non-punishers, causing even more preference alteration and falsification by others.]
7. This system probably evolved to “solve” local problems like local public goods and fairness within the local community, but has been co-opted by larger-scale moral memeplexes.
8. “Rhetoric about doing your part” is how communities communicate what the local norms are, in order to trigger preference alteration. “Feelings of guilt” is what preference alteration feels like from the inside. [Added: This is referring to some things Paul said earlier in that subthread.]
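Here is the toy sketch referenced in point 5, as a minimal illustration only; the faction names, budgets, weights, and effectiveness numbers are all invented for the example and are not claims of the model:

```python
# Toy contrast for point 5: budget-based vs. utility-weight-based allocation.
# The faction names, budgets, weights, and effectiveness numbers are all hypothetical.

def budget_allocation(budgets, effectiveness):
    """Each faction spends its fixed budget, regardless of cost-effectiveness."""
    return dict(budgets)

def utility_weight_allocation(weights, effectiveness, total=100.0):
    """Crude weighted-utility allocator: put everything on the cause with the highest
    weight-times-effectiveness (a stand-in for maximizing a weighted sum)."""
    best = max(weights, key=lambda cause: weights[cause] * effectiveness[cause])
    return {cause: (total if cause == best else 0.0) for cause in weights}

budgets = {"family": 60.0, "local": 30.0, "global_poverty": 10.0}
weights = {"family": 0.6, "local": 0.3, "global_poverty": 0.1}

before = {"family": 1.0, "local": 1.0, "global_poverty": 1.0}
after = {"family": 1.0, "local": 1.0, "global_poverty": 10.0}  # anti-poverty becomes 10x more effective

print(budget_allocation(budgets, before))          # {'family': 60.0, 'local': 30.0, 'global_poverty': 10.0}
print(budget_allocation(budgets, after))           # unchanged: the budget split ignores effectiveness
print(utility_weight_allocation(weights, before))  # everything goes to 'family'
print(utility_weight_allocation(weights, after))   # everything shifts to 'global_poverty'
```

The only point of the sketch is that the budget split is insensitive to the change in cost-effectiveness, while a weighted-utility allocator reallocates toward whatever became cheaper.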
tl;dr: seems like you need some story for what values a group highly regards / rewards. If those are just the values that serve the group, this doesn’t sound very distinct from “groups try to enforce norms which benefit the group, e.g. public goods provision” + “those norms are partially successful, though people additionally misrepresent the extent to which they e.g. contribute to public goods.”
Similarly, larger countries do not have higher ODA as the public goods model predicts
Calling this the “public goods model” still seems backwards. “Larger countries have higher ODA” is a prediction of “the point of ODA is to satisfy the donor’s consequentialist altruistic preferences.”
The “public goods model” is an attempt to model the kind of moral norms / rhetoric / pressures / etc. that seem non-consequentialist. It suggests that such norms function in part to coordinate the provision of public goods, rather than as a direct expression of individual altruistic preferences. (Individual altruistic preferences will sometimes be why something is a public good.)
This system probably evolved to “solve” local problems like local public goods and fairness within the local community, but has been co-opted by larger-scale moral memeplexes.
I agree that there are likely to be failures of this system (viewed teleologically as a mechanism for public goods provision or conflict resolution) and that “moral norms are reliably oriented towards providing public goods” is less good than “moral norms are vaguely oriented towards providing public goods.” Overall the situation seems similar to a teleological view of humans.
For example if global anti-poverty suddenly becomes much more cost effective, one doesn’t vote or donate to spend more on global poverty, because the budget allocated to that faction hasn’t changed.
I agree with this, but it seems orthogonal to the “public goods model”; this is just about how people or groups aggregate across different values. I think it’s pretty obvious in the case of imperfectly-coordinated groups (who can’t make commitments to have their resource shares change as beliefs about relative efficacy change), and I think it also seems right in the case of imperfectly-internally-coordinated people.
(We have preference alteration because preference falsification is cognitively costly, and we have preference falsification because preference alteration is costly in terms of physical resources.)
Relevant links: if we can’t lie to others, we will lie to ourselves, the monkey and the machine.
E.g., people overcompensate for private deviations from moral norms by putting lots of effort into public signaling including punishing norm violators and non-punishers, causing even more preference alteration and falsification by others.
I don’t immediately see why this would be “compensation”; it seems like public signaling of virtue would always be a good idea regardless of your private behavior. Indeed, it probably becomes a better idea as your private behavior is more virtuous (in economics you’d only call the behavior “signaling” to the extent that this is true).
As a general point, I think calling this “signaling” is kind of misleading. For example, when I follow the law, in part I’m “signaling” that I’m law-abiding, but to a significant extent I’m also just responding to incentives to follow the law which are imposed because other people want me to follow the law. That kind of thing is not normally called signaling. I think many of the places you are currently saying “virtue signaling” have significant non-signaling components.
I don’t immediately see why this would be “compensation”; it seems like public signaling of virtue would always be a good idea regardless of your private behavior.
I didn’t have a clear model in mind when I wrote that, and just wrote down “overcompensate” by intuition. Thinking more about it, I think a model that makes sense here is to assume that your private actions can be audited by others at some cost (think of Red Guards going into people’s homes to look for books, diaries, assets, etc., to root out “counter-revolutionaries”), so if you have something to hide you’d want to avoid getting audited by avoiding suspicion, and one way to do that is to put extra effort into public displays of virtue. People whose private actions are virtuous would not have this extra incentive.
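To check that this story hangs together, here is a minimal expected-cost sketch; the penalty, audit probabilities, and display costs are made-up numbers, and the linear “suspicion” effect is just an assumption for illustration:

```python
# Toy expected-cost comparison for the audit story above.
# All numbers (costs, probabilities, the linear "suspicion" effect) are hypothetical.

def audit_probability(display_effort):
    """Assume more visible displays of virtue lower the chance of being audited."""
    return max(0.0, 0.5 - 0.1 * display_effort)

def expected_cost(display_effort, hiding_something,
                  penalty_if_caught=100.0, cost_per_unit_display=1.0):
    expected_penalty = audit_probability(display_effort) * penalty_if_caught if hiding_something else 0.0
    return display_effort * cost_per_unit_display + expected_penalty

for hiding in (True, False):
    best = min(range(6), key=lambda e: expected_cost(e, hiding))
    print(f"hiding_something={hiding}: optimal display effort = {best}")
# With these numbers the agent with something to hide puts in maximal display effort (5),
# while the agent with nothing to hide puts in none (0): the extra incentive described above.
```

Under these made-up numbers, only the agent with something to hide gains anything from extra display effort, which is the sense in which the extra public signaling is “compensation.”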
As a general point, I think calling this “signaling” is kind of misleading.
I guess I’ve been using “virtue signaling” because it’s an established term that seems to be referring to the same kind of behavior that I’m talking about. But I acknowledge that the way I’m modeling it doesn’t really match the concept of “signaling” from economics, and I’m open to suggestions for a better term. (I’ll also just think about how to reword my text to avoid this confusion.)
If those are just the values that serve the group, this doesn’t sound very distinct from “groups try to enforce norms which benefit the group, e.g. public goods provision” + “those norms are partially successful, though people additionally misrepresent the extent to which they e.g. contribute to public goods.”
It’s entirely possible that I misunderstood or missed some of the points of your Moral public goods post and then reinvented the same ideas you were trying to convey. By “public goods model” I meant something like “where we see low levels of redistribution and not much coordination over redistribution, that is best explained by people preferring a world with a higher level of redistribution but failing to coordinate, instead of by people just not caring about others.” I was getting this by generalizing from your opening example:
The nobles are altruistic enough that they prefer it if everyone gives to the peasants, but it’s still not worth it for any given noble to contribute anything to the collective project.
Your sections 1 and 2 also seemed to be talking about this. So this is what my “alternative model” was in reaction to. The “alternative model” says that where we see low levels of redistribution (to some target class), it’s because people don’t care much about the target class of redistribution and assign the relevant internal moral faction a small budget, and this is mostly because caring about the target class is not socially rewarded.
Your section 3 may be saying something similar to what I’m saying, but I have to admit I don’t really understand it (perhaps I should have tried to get clarification earlier but I thought I understood what the rest of the post was saying and could just respond to that). Do you think you were trying to make any points that have not been reinvented/incorporated into my model? If so please explain what they were, or perhaps do a more detailed breakdown of your preferred model, in a way that would be easier to compare with my “alternative model”?
seems like you need some story for what values a group highly regards / rewards
I think it depends on a lot of things, so it’s hard to give a full story, but if we consider, for example, the question of “why is concern about ‘social justice’ across identity groups currently so much more highly regarded/rewarded than concern about ‘social justice’ across social classes,” the answer seems to be that a certain moral memeplex happened to be popular in some part of academia and then spread from there due to being “at the right place at the right time” to take over from other decaying moral memeplexes like religion, communism, and liberalism. (ETA: This isn’t necessarily the right explanation; my point is just that it seems necessary to give an explanation that is highly historically contingent.)
(I’ll probably respond to the rest of your comment after I get clarification on the above.)
I don’t think that it’s just social justice across identity groups being at the right place at the right time. As a meme it has the advantage that it allows people who are already powerful enough to affect social structures to argue why they should have more power. That’s a lot harder for social justice across social classes.
We have preference alteration because preference falsification is cognitively costly
This seems incomplete; if I hold money in different currencies, it seems right for me to adopt ‘market rates’ for conversion between them, which seems like preference alteration. But the root cause isn’t that it’d be cognitively costly for me to keep a private ledger of how I want to exchange between pounds and yen and dollars and a separate public ledger, it’s that I was only ever using pounds and yen and dollars as an investment vehicle.
It seems quite possible that similar things are true for preferences / time use / whatever; someone who follows TV shows so that they have something to talk about with their coworkers is going to just follow whatever shows their coworkers are interested in, because they’re just using it as an investment vehicle instead of something to be pursued in its own right.
Preference alteration changes one’s internal “moral factions” to better match what is being rewarded. That is, the factions representing virtues/morals being socially rewarded get a higher budget.
It also seems like the factions changing directions is quite important here; you might not change the total budget spent on global altruism at all while taking totally different actions (i.e. donating to different charities).
Sorry for the delayed reply, but I was confused by your comment and have been trying to figure out how to respond. Still not sure I understand but I’m going to take a shot.
someone who follows TV shows so that they have something to talk about with their coworkers is going to just follow whatever shows their coworkers are interested in, because they’re just using it as an investment vehicle instead of something to be pursued in its own right.
Watching a TV show in order to talk about it with coworkers is an instance of instrumental preferences (which I didn’t talk about specifically in my model but was implicitly assuming as a background concept). When I wrote “preference alteration” I was referring to terminal preferences/values. So if you switch what show you watch in order to match your coworkers’ interests (and would stop as soon as that instrumental value went away), that’s not covered by either “preference alteration” or “preference falsification”, but just standard instrumental preferences. However if you’re also claiming to like the show when you don’t, in order to fit in, then that would be covered under “preference falsification”.
Does this indicate a correct understanding of your comment, and does it address your point? If so, it doesn’t seem like the model is missing anything (“incomplete”), except I could perhaps add an explicit explanation of instrumental preferences and clarify that “preference alteration” is talking about terminal preferences. Do you agree?
It also seems like the factions changing directions is quite important here; you might not change the total budget spent on global altruism at all while taking totally different actions (i.e. donating to different charities).
Sure, this is totally compatible with my model and I didn’t intend to suggest otherwise.
Does this indicate a correct understanding of your comment, and does it address your point?
I think the core thing going on with my comment is that I think for humans most mentally accessible preferences are instrumental, and the right analogy for them is something like ‘value functions’ instead of ‘reward’ (as in RL).
Under this view, preference alteration is part of normal operation, and so should probably be cast as a special case of the general thing, instead of existing only in this context. When someone who initially dislikes the smell of coffee grows to like it, I don’t think it’s directly because it’s cognitively costly to keep two books, and instead it’s because they have some anticipation-generating machinery that goes from anticipating bad things about coffee to anticipating good things about coffee.
[It is indirectly about cognitive costs, in that if it were free you might store all your judgments ever, but from a functional perspective downweighting obsolete beliefs isn’t that different from forgetting them.]
And so it seems like there are three cases worth considering: given a norm that people should root for the sports team where they grew up, I can either 1) privately prefer Other team while publicly rooting for Local team, 2) genuinely prefer Local team in order to not have to lie to myself, or 3) genuinely prefer Local team for some other reason. (Maybe I trust that the thing that generated the norm is wiser than I am, or whatever.)
Maybe another way to think about this is how the agent relates to the social reward gradient; if it’s just a fact of the environment, then it makes sense to learn about it the way you would learn about coffee, whereas if it’s another agent influencing you as you influence it, then it makes sense to keep separate books, and only not do so when the expected costs outweigh the expected rewards.
I think for humans most mentally accessible preferences are instrumental, and the right analogy for them is something like ‘value functions’ instead of ‘reward’ (as in RL).
I agree. As far as I can tell, people seem to be predicting their on-policy Q function when considering different choices. See also attainable utility theory and the gears of impact.
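As a toy illustration of both points (preference change as ordinary value learning, and choices driven by a learned on-policy value estimate), here is a minimal sketch; the two options, the reward numbers, the phase change, and the epsilon-greedy policy are all assumptions invented for the example:

```python
import random

# Minimal sketch: a learned value estimate ("how much do I like this?") tracks reward.
# The options, reward numbers, learning rate, and policy are invented for illustration.
random.seed(0)

values = {"coffee": 0.0, "tea": 0.0}   # the agent's learned value estimates
alpha = 0.1                            # learning rate

def reward(option, phase):
    # Phase 1: coffee is unpleasant; phase 2: coffee now yields positive reward
    # (acclimation, or a social environment that rewards drinking it).
    base = {"coffee": -1.0 if phase == 1 else 1.0, "tea": 0.2}[option]
    return base + random.gauss(0, 0.1)

def choose(eps=0.2):
    # Epsilon-greedy: mostly pick the option currently valued highest.
    if random.random() < eps:
        return random.choice(list(values))
    return max(values, key=values.get)

for step in range(2000):
    phase = 1 if step < 1000 else 2
    option = choose()
    values[option] += alpha * (reward(option, phase) - values[option])

print(values)  # coffee's learned value ends up positive; the "preference" followed the reward
```

When the reward associated with coffee flips sign (whether through acclimation or a social gradient), the learned estimate, which on this view is what a “preference” looks like from the inside, drifts to match it without any separate bookkeeping.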
[The following is a musing that might or might not be adding anything.]
As a result, we evolved two internal mechanisms: preference alteration (my own phrase) and preference falsification. Preference alteration is where someone’s preferences actually change according to the social reward gradient, and preference falsification is acting in public according to the reward gradient but not doing so in private. The amounts of preference alteration and preference falsification can vary between individuals. (We have preference alteration because preference falsification is cognitively costly, and we have preference falsification because preference alteration is costly in terms of physical resources.)
One thing that comes to mind here is framing myself as a mesa-optimizer in a (social) training process. Insofar as the training process worked, and I was successfully aligned, my values are the values of the social gradient. Or I might be an unaligned optimizer intending to execute a treacherous turn (though in this context, the “treacherous turn” is not a discrete moment when I change my actions, but rather a continual back-and-forth between serving selfish interests and serving the social morality, depending on the circumstances).
“Feelings of guilt” is what preference alteration feels like from the inside.
I’m not sure that that is always what it feels like. I can feel pride at my moral execution.