Major stages in my own moral development...
Preschool: learning “if I threaten to hit people, they can refuse to play with me, which sucks, so I guess I won’t do that”. Shamefully, learning this via experience.
Probably early elementary school: learning “if I lie about things, then people won’t believe me, so I guess I won’t do that.” Again via shameful experience. Eventually, I developed this into a practically holy commandment; not sure what the external factors were.
Some kind of scientific ethic? Feynman with the “the easiest person to fool is yourself; to maintain scientific integrity, you have to bend over backwards, naming all the potential reasons you might be wrong” and stuff.
A developing notion that lying was evil, that it could mess things up really badly, that good people who tried lying quickly regretted it (probably mostly fictional examples here), and that the only sensible solution was a complete prohibition.
Middle school: took a game theory class at a summer camp; learned about the Prisoner’s Dilemma and tragedy of the commons; threats and promises; and the hawk-dove game with evolutionarily stable strategies. This profoundly affected me:
The threats-and-promises thing showed that it was sometimes rational to (visibly) put yourself into a state (perhaps with explicit contracts, perhaps with emotions) where you would do something “irrational”, because that could then change someone else’s behavior.
With the one-shot Prisoner’s Dilemma, it seemed clear that, to get the best outcome for everyone, it was necessary for everyone to have an “irrational” module in their brain that led them to cooperate (see the payoff matrix after this list). To a decent extent one can solve real-world situations with external mechanisms that make it no longer a one-shot Prisoner’s Dilemma—reputation, private ownership rights—but it’s not a complete solution.
In the hawk-dove game, two birds meet over a resource, and each has the option to fight for it; we figure each bird follows a certain strategy, dictated by its genes, and winning strategies will increase in prevalence.[1] Lessons I took: there are multiple different equilibria, some better than others, some more abusive than others; and if the population has a high enough fraction of people who will fight back against abuse beyond the point of “rationality”, this will prevent abusers from dominating, so fighting back should be considered a public service.
At some point I encountered the “non-aggression principle”, and decided that it, coupled with common notions of what counts as aggression against person and property (and therefore what counts as property, which is perhaps contentious), was an excellent Schelling point for the fundament of an ethical system. (I will reluctantly admit that “submit to the strongest man” is also a Schelling point people sometimes go for.)
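To make the one-shot Prisoner’s Dilemma point concrete, here is the standard textbook payoff matrix, with rows for your move and entries (you, them); the specific numbers are the conventional ones, not anything from that class:

$$\begin{array}{c|cc} & \text{Cooperate} & \text{Defect} \\ \hline \text{Cooperate} & (3,3) & (0,5) \\ \text{Defect} & (5,0) & (1,1) \end{array}$$

Whatever the other player does, defecting pays you more (5 > 3 against a cooperator, 1 > 0 against a defector), so two “rational” players land on (1, 1) even though (3, 3) was sitting right there. That gap is exactly what the cooperate-anyway module patches.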
On the subject of the post:
The “positive, expanding” angle on things like “don’t kill”—more generally, “people have rights that shouldn’t be violated”—that comes to mind is: As you learn to do more cool things with your toys, imagine if someone else could come and take your toys away, or injure you. That would be bad, wouldn’t it? And then you need a belief connecting the rules other people follow with the rules you follow. Some components:
Have a theory of mind.
Having true peers might be important—a baby (human or AI) who interacts only with the adults who control everything and are very different from them might find it harder to see, or believe in, the universality of rules.
Another angle—similar to the above, but maybe different?—is to think positively about the rights you have to your toys, and think about the things you’re thereby guaranteed to be allowed to do.
I guess this would be relevant after you’d had the experience of trying to play with someone else’s toys and been told a firm “no, that’s not yours and you don’t have permission”. For an AI, training on that seems doable.
[1] If two hawks meet, they fight, one gets injured, and the damage from the injury exceeds the value of the resource, so the expected value is negative to both birds (the basic scenario can be considered a game of chicken); if a hawk meets a dove, the dove runs away and gets zero, and the hawk gets the resource; if two doves meet, they waste some time on symbolic combat, one of them wins, and the expected value is positive. The evolutionarily stable strategy is a mix: some fraction of hawks, some fraction of doves.
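To put numbers on that (my notation, not necessarily the course’s): let V be the value of the resource, C the cost of injury (with C > V), and T the time cost of symbolic combat (with T < V/2, so the dove-vs-dove payoff stays positive). The stable hawk fraction p is the one at which hawks and doves fare equally well:

$$p \cdot \frac{V-C}{2} + (1-p)\,V \;=\; p \cdot 0 + (1-p)\left(\frac{V}{2} - T\right) \quad\Longrightarrow\quad p = \frac{V + 2T}{C + 2T},$$

which reduces to the textbook p = V/C when the display cost T is ignored. The worse injuries are relative to the prize, the fewer hawks the population supports.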
Then there are other variations. “Bullies” show fight and scare away doves, but will run away from a real fighter (i.e., a hawk), and we figure that when two bullies meet, one gets scared and runs away first. The bully strategy dominates the dove strategy; the equilibrium is a hawk-bully composite.
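A quick check of that domination claim, using the same notation: in a population of just hawks, doves, and bullies, a bully matches or beats a dove against every opponent type,

$$\begin{array}{c|ccc} & \text{vs. hawk} & \text{vs. dove} & \text{vs. bully} \\ \hline \text{dove} & 0 & V/2 - T & 0 \\ \text{bully} & 0 & V & V/2 \end{array}$$

so doves can only dwindle once bullies show up. (This breaks as soon as retaliators enter the picture: against a retaliator a dove gets V/2 − T while a bully gets 0.)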
Then we introduce the “retaliator” to defeat bullies. Retaliators act like doves, but if the other bird shows fight, they fight back. Against hawks and bullies they act like hawks; against doves or other retaliators they act like doves. “Pure retaliator”, or “mostly retaliator, with up to some fraction of doves”, is an evolutionarily stable strategy—and so is hawk-bully. Which one you end up with depends on your starting population.
Further variations can be considered and explored. For example, the best result would be one in which all the birds understood some system by which each resource belonged to one bird and not the other, so that when they met, one would act like a hawk and the other like a dove, resolving the conflict instantly. Different systems are possible: “the biggest bird”, “the bird whose territory it is”, “the bird who got there first”, “the bird who wins on some arbitrary visible characteristic (like tail length) not necessarily related to combat” (apparently this is a thing), and so on. If there are multiple competing systems, then the majority will tend to push out the minority. An evolutionarily stable (or metastable) equilibrium is “followers of one dominant system, plus up to some percentage of bullies”.
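Finally, for anyone who wants to poke at these variations themselves, here is a minimal replicator-dynamics sketch of the four-strategy game. All the numbers (V = 4 for the resource, C = 10 for injury, T = 1 for display time) are toy assumptions of mine, chosen only to match the verbal setup above. One caveat: in this pure-strategy version a retaliator earns exactly a hawk’s payoff against hawks and bullies, which makes the hawk-bully equilibrium far more fragile than in the richer probabilistic models, so treat this as a sandbox rather than a verdict.

```python
# Toy replicator dynamics for the hawk/dove/bully/retaliator game described
# in this footnote. All payoff numbers are illustrative assumptions.
import numpy as np

V, C, T = 4.0, 10.0, 1.0  # resource value, injury cost (C > V), display cost

# payoff[i, j] = expected payoff to a bird playing strategy i against strategy j.
# Index order: 0 = hawk, 1 = dove, 2 = bully, 3 = retaliator.
payoff = np.array([
    [(V - C) / 2, V,         V,     (V - C) / 2],  # hawk: always fights
    [0.0,         V / 2 - T, 0.0,   V / 2 - T],    # dove: flees any show of fight
    [0.0,         V,         V / 2, 0.0],          # bully: bluffs, flees real fighters
    [(V - C) / 2, V / 2 - T, V,     V / 2 - T],    # retaliator: dove unless attacked
])

def evolve(x, steps=5000, dt=0.02):
    """Discrete replicator dynamics: above-average strategies grow in frequency."""
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        f = payoff @ x                     # expected fitness of each strategy
        x = x * (1.0 + dt * (f - x @ f))   # grow/shrink relative to population average
        x = np.clip(x, 0.0, None)
        x /= x.sum()                       # renormalize to frequencies
    return x

names = ["hawk", "dove", "bully", "retaliator"]
for start in [(0.05, 0.05, 0.05, 0.85),   # retaliator-heavy start
              (0.45, 0.05, 0.45, 0.05)]:  # hawk/bully-heavy start
    end = evolve(start)
    print(start, "->", {n: round(float(v), 3) for n, v in zip(names, end)})
```

A natural next experiment, for the convention idea in the previous paragraph, is to add an owner/intruder asymmetry: each bird plays hawk over resources the convention assigns to it and dove otherwise; since conflicts then resolve instantly, convention-followers skip both the injury and the display costs.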