Thanks for writing this series.
I can see how Approval Reward explains norm-following behavior. If people approve of honesty, then being honest will make people approve of me.
But I’m not totally convinced that Approval Reward is enough to explain norm-enforcement behavior on its own.
For some action or norm X, it doesn’t seem obvious to me that “doing X” and “punishing someone who does not-X” are equivalent in terms of earning human approval, unless you already know that humans punish others who do things they don’t like.
If you know that humans punish others who act contrary to norms that the humans value, then you can punish dishonest people to show that you value honesty, and then you’ll get an Approval Reward from other humans who value honesty.
But suppose that nobody already knew the pattern that humans punish others who act contrary to norms that the humans value. Then when you see someone being dishonest (acting contrary to honesty), you don’t know that “punishing this person for dishonesty will make others see that I value honesty”, and so you wouldn’t expect to get an Approval Reward. Therefore you wouldn’t be motivated to punish them (if Approval Reward were your only motivation). And if everyone thinks the same way, then nobody will do any punishments for approval’s sake, and so you won’t see any examples from which to learn the pattern.
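To make the bootstrapping worry concrete, here is a toy sketch in Python. It is only an illustration of the argument above, not a model from the series. The function name `simulate`, the parameter `seeded`, and the rule that a single witnessed punishment teaches every agent the pattern are all my own simplifying assumptions.

```python
def simulate(n_agents, seeded, rounds):
    """Toy model: count how many agents punish a norm violation each round.

    Assumptions (hypothetical, for illustration only):
    - Agents are motivated by approval alone, so an agent punishes only
      if it already expects punishing to earn approval.
    - That expectation is learned only by witnessing a punishment.
    - `seeded` agents punish in round 0 for some non-approval reason.
    """
    expects_approval = [False] * n_agents
    history = []
    for t in range(rounds):
        # Approval-motivated punishers: those who have learned the pattern.
        punishers = sum(expects_approval)
        if t == 0:
            # Punishment driven by some other motivation, first round only.
            punishers += seeded
        history.append(punishers)
        if punishers > 0:
            # Everyone witnesses a punishment and learns the pattern.
            expects_approval = [True] * n_agents
    return history


print(simulate(5, 0, 4))  # no seed: punishment never starts
print(simulate(5, 2, 4))  # seeded once: approval alone sustains it
```

With no seed the count stays at zero in every round, and with a one-time seed every agent punishes from the next round on. The sketch has no forgetting or cost, so it cannot say how long approval alone would keep the behavior going.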
So it seems to me that although Approval Reward can take norm-enforcement behavior that already exists and “keep it going” for a while, it must have taken some other motivation to “get it started”. In the case of harmful norm violations, the enforcement could have been caused by Sympathy Reward plus means-end reasoning (as you mentioned in another context). But I think humans sometimes punish people for even harmless norm violations (e.g. fashion crimes), so either that was caused by misgeneralization from harmful violations, or there’s some third motivation involved.
I’m not sure about this, though.
Yeah, that model sounds plausible to me (pending elaboration on how the friend-or-enemy parameter is updated). Thanks.