Human Values ≠ Goodness
There is a temptation to simply define Goodness as Human Values, or vice versa.
Alas, we do not get to choose the definitions of commonly used words; our attempted definitions will simply be wrong. Unless we stick to mathematics, we will end up sneaking in intuitions which do not follow from our so-called definitions, and thereby mislead ourselves. People who claim that they use some standard word or phrase according to their own definition are, in nearly all cases outside of mathematics, wrong about their own usage patterns.[1]
If we want to know what words mean, we need to look at e.g. how they’re used and where the concepts come from and what mental pictures they summon. And when we look at those things for Goodness and Human Values… they don’t match. And I don’t mean that we shouldn’t pursue Human Values; I mean that the stuff people usually refer to as Goodness is a coherent thing which does not match the actual values of actual humans all that well.
The Yumminess You Feel When Imagining Things Measures Your Values
There’s this mental picture where a mind has some sort of goals inside it, stuff it wants, stuff it values, stuff which from-the-inside feels worth doing things for. In old-school AI we’d usually represent that stuff as a utility function, but we wanted some terminology for a more general kind of “values” which doesn’t commit so hard to the mathematical framework (and often-confused conceptual baggage outside the math) of utility functions. The phrase “human values” caught on.
We don’t really know what human values are, or what shape they are, or even whether they’re A Thing at all. We don’t have trivial introspective access to our own values; sometimes we think we value a thing a lot, but realize in hindsight that we value it only a little. But insofar as the mental picture is pointing to a real thing at all, it does tell us how to go look for our values within our own minds.
How do we go look for our own values?
Well, we’re looking for some sort of goals, stuff which our minds want or value, stuff which drives us, etc. What does that feel like from the inside? Think of the stuff that, when you imagine it, feels really yummy. It induces yearning and longing. It feels like you’d be more complete with it. That’s the feeling of stuff that you value a lot. Lesser versions of the same feeling come when imagining things you value less (but still positively).
Personally… I get that feeling of yumminess and yearning when I imagine having a principled mathematical framework for understanding the internal structures of minds, which actually works on e.g. image generators.[2] I also get that feeling of yumminess and yearning when I imagine a really great night of dancing, or particularly great sex, or physically fighting with friends, or my favorite immersive theater shows, or some of my favorite foods at specific restaurants. Sometimes I get a weaker version of the yumminess and yearning feeling when I imagine hanging out around a fire with friends, or just sitting out on my balcony alone at night and watching the city, or dealing with the sort of emergency which is important enough that I drop everything else from my mind and just focus.
Those are my values. That’s what human values look like, and how to probe for yours.
“Goodness” Is A Memetic Egregore
I did not first learn about goodness by imagining things and checking how yummy they felt. I first learned about Goodness by my parents and teachers and religious figures and books and movies and so forth telling me that it’s Good to not steal things, Good to do unto others what I’d have them do unto me, Good to follow rules and authority figures, Good to clean up after myself, Good to share things with other kids, Good to not pick my nose, etc, etc.
In other words, I learned about Goodness mostly memetically, absorbing messages from others about what’s Good.
Some of those messages systematically follow from some general principles. Things like “don’t steal” are social rules which help build a high-trust society, making it easier for everyone to get what they want insofar as everyone else follows the rules. We want other people to follow those rules, so we teach other people the rules. Other aspects of Goodness, especially about cleanliness, seem to mostly follow humans’ purity instincts, and are memetically spread mainly by people with relatively-strong purity instincts in an attempt to get people with relatively-weaker purity instincts to be less gross (think nose picking). Still other aspects of Goodness seem rather suspiciously optimized for getting kids to be easier for their parents and teachers to manage—think following rules or respecting one’s elders. Then there are aspects of Goodness which seem to be largely political, driven by the usual political memetic forces.
The main unifying theme here is that Goodness is a memetic egregore; in practice, our shared concept of Goodness is comprised of whatever messages people spread about what other people should value.
… which sure is a different thing from what people do value, when they introspect on what feels yummy.
Aside: Loving Connection
One thing to flag at this point: you know the feeling of deep loving connection, like a parent-child bond or spousal bond or the feeling you get (to some degree) when deeply empathizing with someone or the feeling of loving connection to God or the universe which people sometimes get from religious experiences? I.e. oxytocin?
For many (most?) people, that feeling is a REALLY big chunk of their Values. It is the thing which feels yummiest, often by such a large margin that it overwhelms everything else. If that’s you, then it’s probably worth stopping to notice that there are other things you value. It is quite possible to hyperoptimize for that one particular yumminess, then burn out and later realize that one values other things too—as many a parent learns when the midlife crisis hits.
That feeling of deep loving connection is also a major component of the memetic egregore Goodness, to such an extent that people often say that Goodness just is that kind of love. Think of the songs or hippies or whoever saying that all the world’s problems would be solved if only we had more love. As with values, it is worth stopping to notice that loving connection is not the entirety of Goodness, as the term is typically used. The people saying that Goodness just is loving connection (or something along those lines) are making the same move as someone trying to define a word; in most cases their usage probably doesn’t even match their own definition on closer inspection.
It is true that deep loving connection is both an especially large chunk of Human Values and an especially large chunk of Goodness, and within that overlap Human Values and Goodness do match. But that’s not the entirety of either Human Values or Goodness, and losing track of the rest is a good way to shoot oneself in the foot eventually.
We Don’t Get To Choose Our Own Values (Mostly)
To summarize so far:
Our Values are (roughly) the yumminess or yearning we feel when imagining something.
Goodness is (roughly) whatever stuff the memes say one should value.
Looking at that first one, the second might seem kind of silly. After all, we mostly don’t get to choose what triggers yumminess or yearning. There are some loopholes—e.g. sometimes we can learn to like things, or intentionally build new associations—but mostly the yumminess is not within conscious control. So it’s kind of silly for the memetic egregore to tell us what we should find yummy.
A central example: gay men mostly don’t seem to have much control over their attraction to men; that yumminess is not under their control. In many times and places the memetic egregore Goodness said that men shouldn’t be sexually attracted to men (those darn purity instincts!), which… usually isn’t all that effective at changing the underlying yumminess or yearning.
What does often happen, when the memetic egregore Goodness dictates something in conflict with actual Humans’ actual Values, is that the humans “tie themselves in knots” internally. The gay man’s attraction to men is still there, but maybe that attraction also triggers a feeling of shame or social anxiety or something. Or maybe the guy just hides his feelings, and then feels alone and stressed because he doesn’t feel safe being open with other people.
Sex and especially BDSM is a ripe area for this sort of thing. An awful lot of people, probably a majority of the population, sure do feel deep yearning to either inflict or receive pain, to take total control over another or give total control to another, to take or be taken by force, to abandon propriety and just be a total slut, to give or receive humiliation, etc. And man, the memetic egregore Goodness sure does not generally approve of those things. And then people tie themselves in knots, with the things that turn them on most also triggering anxiety or insecurity.
So What Do?
I’d like to say here “screw memetic egregores, follow the actual values of actual humans”, but then many people will be complete fucking idiots about it. So first let’s go over what not to do.
There’s a certain type of person… let’s call him Albert. Albert realizes that Goodness is a memetic egregore, and that the memetic egregore is not particularly well aligned with Albert’s own values. And so Albert throws out all that Goodness crap, and just queries his own feelings of yumminess in-the-moment when making decisions.
This goes badly in a few different ways:
Sometimes Albert has relatively low innate empathy, and throws out all the Goodness stuff about following the rules and spirit of high-trust communities. Albert just generally hits the “defect” button whenever it’s convenient. Then Albert goes all pikachu surprise face when he’s excluded from high trust communities.
Other times Albert is just bad at thinking far into the future, and jumps on whatever feels yummy in-the-moment without really thinking ahead. A few years down the line Albert is broke.
Or maybe Albert rejects memetic Goodness, ignores authority a little too much, and winds up unemployed or in prison. Or ignores purity instincts a little too much and winds up very sick.
Point is: there’s a Chesterton’s fence here. Don’t be an idiot. Goodness is not very well aligned with actual Humans’ actual Values, but it has been memetically selected for a long time and you probably shouldn’t just jettison the whole thing without checking the pieces for usefulness. In particular, a nontrivial chunk of the memetic egregore Goodness needs to be complied with in order to satisfy your actual Values long term (which usually involves other people), even when it conflicts with your Values short term. Think about the consequences, what will actually happen down the line and how well your Values will actually be satisfied long-term, not just about what feels yummy in the moment.
… and then jettison the memetic egregore and pay attention to your and others’ actual Values. Don’t make the opposite mistake of motivatedly looking for clever reasons to not jettison the egregore just because it’s scary.
[1] You can quick-check this in individual cases by replacing the defined word with some made-up word wherever the person uses it—e.g. replace “Goodness” with “Bixness”.
[2] … actually when I first try to imagine that I get a mild “ugh” because I’ve tried and failed to make such a thing before. But when I set that aside and actually imagine the end product, then I get the yummy feeling.
(I say this all the time, but I think that [the thing you call “values”] is a closer match to the everyday usage of the word “desires” than the word “values”.)
I think we should distinguish three things: (A) societal norms that you have internalized, (B) societal norms that you have not internalized, (C) desires that you hold independent of [or even despite] societal norms.
For example:
a 12-year-old girl might feel very strongly that some style of dress is cool, and some other style is cringe. She internalized this from people she thinks of as good and important—older teens, her favorite celebrities, the kids she looks up to, etc. This is (A).
Meanwhile, her lame annoying parents tell her that kindness is a virtue, and she rolls her eyes. This is (B).
She has a certain way that she likes to arrange her pillows in bed at night before falling asleep. Very cozy. She has never told anyone about this, and has no idea how anyone else arranges their pillows. This is (C).
Anyway, the OP says: “our shared concept of Goodness is comprised of whatever messages people spread about what other people should value. … which sure is a different thing from what people do value, when they introspect on what feels yummy.”
I think that’s kinda treating the dichotomy as (B) versus (C), while denying the existence of (A).
If that 12yo girl “introspects on what feels yummy”, her introspection will say “myself wearing a crop-top with giant sweatpants feels yummy”. This obviously has memetic origins but the girl is very deeply enthusiastic about it, and will be insulted if you tell her she only likes that because she’s copying memes.
By the way, this is unrelated to “feeling of deep loving connection”. The 12yo girl does not have a “feeling of deep loving connection” to the tiktok influencers, high schoolers, etc., who have planted the idea in her head that crop-tops and giant sweatpants look super chic and awesome. I think you’re wayyy overstating the importance of “feeling of deep loving connection” for the average person’s “values”, and correspondingly wayyy understating the importance of this kind of norm-following thing. I have a draft post with much more about the norm-following thing, should be out soon :) UPDATE: this post.
Seconded. The word ‘value’ is heavily overloaded, and I think you’re conflating two meanings. ‘What do you value?’ and ‘What are your values?’ are very different questions for that reason. The first means roughly desirability, whereas the second means something like ‘ethical principles’. I read you as pointing mostly to the former, whereas ‘value’ in philosophy nearly always refers to the latter. Trying to redefine ‘value’ locally to have the other meaning seems likely to result in more confusion than clarity.
Concrete example: I hold ‘help sick friends’ as a considered and endorsed value (or subvalue, or instance of a larger value, whatever). But when I think about going grocery shopping for a sick friend and driving over to drop it at their doorstep, there is zero yumminess or yearning; it mostly feels annoying. You could argue that that means it isn’t really one of my values, but at that point you’re using ‘value’ in a fairly nonstandard way.
Good points as usual! On a meta note, I thought when writing this “Steve will probably say something like what he usually says, and I still haven’t fully incorporated it into my models; hopefully I’ll absorb some more this time”.
Anyway, I don’t think I want to deny the existence of (A). I want to say that “style X is cool” is a true part of the girl’s values insofar as style X summons up yummy/yearning/completeness/etc feelings on its own, and is not a true part of her values insofar as the feelings involved are mostly social anxiety or a yearning to be liked. (The desire to be liked would then be a part of her values, insofar as the prospect of being liked is what actually triggers the yearning.)
I do want to say that stuff is a true part of one’s values once it triggers those feelings, regardless of whether memes were involved in installing the values along the way. I want to distinguish that from the case where people “tie themselves in knots”, trying to act like they value something or telling themselves that they value something when the feelings are not in fact there, because they’ve been told (or logically convinced themselves) they “should” value the thing.
This touches on an interesting issue for looking at human values. It seems we have systems inside of ourselves that are training what we feel as positive affect aka yumminess. It is classical conditioning. Pavlov’s dog probably feels positive affect in response to the bell in expectation of food. (I think this was confirmed with studies of monkeys with bananas in boxes that beep.) So then, which do we want to identify as our values? The things that currently feel yummy? Or the things the system within us is trying to train us to obtain?
Do you explore that idea in your linked post?
How does this carry into the future, when we’ll be able to modify our brains/minds?
Are our Values the real-world things that trigger our feelings, or the feelings themselves? (If the latter, we’ll be able to artificially trigger them at negligible cost and with no negative side effects, unlike today.)
“We Don’t Get To Choose Our Own Values” will be false, so that part will be irrelevant. How does this affect your arguments/conclusions?
Even today, Goodness-as-memetic-egregore can (and does) heavily influence our Values, through the kind of mechanism described in Morality is Scary. (Think of the Communists who yearned for communism so much that they were willing to endure extreme hardship and even torture for it.) This seems like a crucial part of the picture that you didn’t mention, and which complicates any effort to draw conclusions from it.
My own perspective is that what you call Human Values and Goodness are both potential sources (along with others) of “My Real Values”, which I’ll only be able to really figure out after doing or learning a lot more philosophy (e.g., to figure out which ones I really want to, or should, keep or discard, or how to answer questions like the above). In the meantime, my main goals are to preserve/optimize my option values and ability to eventually do/learn such philosophy, and don’t do anything that might turn out to be really bad according to “My Real Values” (like deny some strong short-term desire, or commit a potential moral atrocity), using something like Bostrom and Ord’s Moral Parliament model for handling moral uncertainty.
Main answer: this post is aimed at a lower level than you are at, and I intentionally did not unpack some of the more advanced questions, because that would have involved long sections which lower-level readers would find either hard to follow or unmotivated.
That said, the way I’d think about your points is in Values Are Real Like Harry Potter and We Don’t Know Our Own Values.
I’ve now read your linked posts, but can’t derive from them how you would answer my questions. Do you want to take a direct shot at answering them? And also the following question/counter-argument?
Suppose I’m a sadist who derives a lot of pleasure/reward from torturing animals, but also my parents and everyone else in society taught me that torturing animals is wrong. According to your posts, this implies that my Values = “torturing animals has high value”, and Goodness = “don’t torture animals”, and I shouldn’t follow Goodness unless it actually lets me satisfy my values better long-term, in other words allows me to torture more animals in the long run. Am I understanding your ideas correctly?
(Edit: It looks like @Johannes C. Mayer made a similar point under one of your previous posts.)
Assuming I am understanding you correctly, this would be a controversial position to say the least, and counter to many people’s intuitions or metaethical beliefs. I think metaethics is a hard problem, and I probably can’t easily convince you that you’re wrong. But maybe I can at least convince you that you shouldn’t be as confident in these ideas as you appear to be, nor present them to “lower-level readers” without indicating how controversial / counterintuitive-to-many the implications of your ideas are.
From the top:
Not quite either of those, but if we’re speaking loosely then the real-world things that trigger our feelings. Definitely not the feelings themselves.
It’s already false today for things like e.g. heroin; drugs already make it possible to overwrite our values if we so choose. I would reason about future opportunities to overwrite our values in much the same way I reason about heroin today (and in much the same way which I think most people reason about heroin today).
Yup, I totally buy that that happens, including in more ordinary day-to-day ways. At the point where a meme has integrated itself into the feeling-triggers directly, I’m willing to say “ok this meme has become a part of this person’s actual values”. As with heroin, this is a thing which one typically wants to avoid under one’s current values, but once it’s happened there’s no particular reason to undo it (at least from the first-person perspective; obviously people try to overwrite others’ values all the time).
At some point, somewhere in this process, one needs to figure out what counts as evidence about value, i.e. what crosses the is-ought gap. And I would be real damn paranoid about giving a memetic egregore de-facto write access to the “ought” side of the is-ought gap.
I’d flag that there’s still instrumental considerations, i.e. other people assign (a lot of) negative value to animals being tortured and I probably want to still be friends with those people so I might want to avoid the torture for practical reasons.
That said, steelmanning: in a world where basically all humans enjoyed torturing animals, yes, those alternate-humans should-according-to-their-own-values torture lots of animals. Obviously that is controversial, but also-obviously it’s one of those things that’s controversial mostly for stupid reasons (i.e. people really want to find some reason why their own values are the One True Universal Good), not for good reasons.
I don’t know if this is johnswentworth’s intended meaning, but I read this more as “instructions to be effective”, or “a discussion of how things are”, not “approval of hypothetical alternate values”.
It is true that for a person to most effectively seek their own values they need to seek their own values rather than the values suggested by goodness. I don’t think agreeing with or discussing that sentiment should imply an approval of alternate values other people might have.
If someone did value torturing animals, I would want them to seek pleasure in the simulated torture of animals and for them to be prevented from torturing real animals because that is part of my values which are the ones I am trying to seek regardless of the values suggested by goodness or animal torturer’s values.
I think “people having freedom and capability to seek their own values” is also part of my values. It is a part that makes me want people to understand the relationship between their values and the values suggested by goodness, and that really does create a contradiction in my values. But I don’t believe discussing the relationship between, or inequality of, people’s values and the values suggested by goodness should imply my values are permissive towards animal torturer’s values.
Still, I think the implication you have pointed out is a good one to clarify. Does my clarification make sense? I prefer it to johnswentworth’s steelmanning in his reply to your comment. Although, I agree with his sentiment that humans should be trying to understand our own values and negotiating and coordinating between people with different values, rather than seeking to find some objectively true values that I don’t believe exist.
I wish there was some kind of disclaimer or hint near the beginning of the text that this is the case, so I would know to read it with this in mind (or skip it altogether as not written for me).
What would you want such a disclaimer or hint to look like?
(I am concerned that if a post says something like “this post is aimed at low-level people who don’t yet have a coherent foundational understanding of goodness and values” then the set of people who actually continue reading will not be very well correlated with the set of people we’d like to have continue reading.)
Maybe something like “This post presents a simplified version of my ideas, intended as an introduction. For more details and advanced considerations, please see such and such posts.”
I think the confusion here is that “Goodness” means different things depending on whether you’re a moral realist or anti-realist.
If you’re a moral realist, Goodness is an objective quality that doesn’t depend on your feelings/mental state. What is Good may or may not overlap with what you like/prefer/find yummy, but it doesn’t have to.
If you’re a moral anti-realist, either:
“Goodness” is meaningless.
“Goodness” is a shorthand for something like:
“My fundamental, least changeable preferences/likes/wants”
“The subset of my preferences/likes/wants that many other people share”
“The subset of my preferences/likes/wants that it’s socially acceptable to talk a lot about/encourage others to adopt”
“The subset of my preferences/likes/wants that I want others to adopt”
I think “Human Values” is a very poor phrase because:
If you’re a moral realist, you can just say “Goodness” instead of “Human Values”.
If you’re a moral anti-realist, you can just talk about your preferences, or a particular subset of your preferences (e.g. any of the options listed above).
Instead, people referring to “Human Values” obscure whether they are moral realists or anti-realists, which causes a lot of confusion when determining the implications and logical consistency of their views.
I notice I am confused. If “Goodness is an objective quality that doesn’t depend on your feelings/mental state”, then why would the things humans actually value necessarily be the same as Goodness?
A common use of “Human Values” is in sentences like “we should align AI with Human Values” or “it would be good to maximize Human Values upon reflection”, i.e. normative claims about how Human Values are good and should be achieved. However, if you’re not a moral realist, there’s no (or very little) reason to believe that humans, even if they reflect for a long time etc., will arrive at the same values. Most of the time if someone says “Human Values” they don’t mean to include the values of Hitler or a serial killer. This makes the term confusing, because it can be used both descriptively and normatively, and the normative use is common enough to make it confusing when used as a purely descriptive term.
I agree that if you’re a moral realist, it’s useful to have a term for “preferences shared amongst most humans” as distinct from Goodness, but Human Values is a bad choice because:
It implies preferences are more consistent amongst humans than they really are
The use of “Human Values” has been too polluted by others using it in a normative sense
I agree that the distinction is important. However, my view is that a lot of what you call “goodness” is part of society’s mechanism to ensure cooperate/cooperate. It helps other people get yummy stuff, not just you.
You can of course free yourself from that mechanism, and explicitly strategize how to get the most “yumminess” for yourself without ending up broke/addicted/imprisoned/etc. If the rest of society still follows “goodness”, that leads to defect/cooperate, and indeed you end up better off. But there’s a flaw in this plan.
Part of the point I intended to convey with the post is that society pushing for cooperate/cooperate is one way that Goodness-claims can go memetic, but there are multiple other ways memeticity can be achieved which are not so well aligned with the Values of Humans (either one’s own values or others’). Thus this part:
The message is definitely not to go hammering the defect button all the time, that’s stupid. Yet somehow every time someone suggests that Goodness is maybe not all it’s cracked up to be, lots of onlookers immediately round this to “you should go around hammering the defect button all the time!” (some with positive affect, some with negative) and man I really wish people could stop rounding that off and absorb the actual point.
Hmm. In all your examples, Albert goes against “goodness” and ends up with less “yumminess” as a result. But my point was about a different kind of situation: some hypothetical Albert goes against “goodness” and actually ends up with more “yumminess”, but someone else ends up with less. What do you think about such situations?
I would ask Albert: do you generally find it yummy when other people get more yumminess? Do you usually feel like shit when you screw over someone else? For most people, the answers to these are “yes”. Most people do not actually like screwing over other people, most of the time (though there are of course exceptions).
Insofar as Albert is a sociopath, or is in one of those moods where he really does want to screw over someone else… I would usually say “Look man, I want you to pursue your best life and fulfill your values, so I wish you luck. But also I’m going to try to stop you, because I want the same for other people too, and I want higher-order nice things like high-trust communities.” One does not argue against the utility function, as the saying goes.
I think this is very culturally dependent. For example, wars of conquest were considered glorious in most places and times, and that’s pretty much the ultimate form of screwing over other people. Or for another example, the first orphanages were built by early Christians, before that the orphans were usually disposed of. Or recall how common slavery and serfdom have been throughout history.
Basically my view is that human nature without indoctrination into “goodness” is quite nasty by default. Empathy is indeed a feeling we have, and we can feel it deeply (...sometimes). But we ended up with this feeling mainly due to indoctrination into “goodness” over generations. We wouldn’t have nearly as much empathy if that indoctrination hadn’t happened, and it probably wouldn’t stay long term if that indoctrination went away.
I do want to say that stuff is a true part of one’s values once it triggers the feelings of yumminess/yearning/etc, regardless of whether memes were involved in installing the values along the way. I want to distinguish that from the case where people “tie themselves in knots”, trying to act like they value something or telling themselves that they value something when the feelings are not in fact there, because they’ve been told they “should” value the thing.
So yeah, some of our actual values are installed culturally/memetically, and that doesn’t automatically make them bad or fake values. I’m on board with that, so long as the underlying feelings of yumminess/yearning/etc actually show up.
We can throw out the other junk of memetic egregore Goodness, without abandoning the stuff people actually feel good about.
But why do you think that people’s feelings of “yumminess” track the reality of whether an action is cooperate/cooperate? I’ve explained that it hasn’t been true throughout most of history: people have been able to feel “yummy” about very defecting actions. Maybe today the two coincide unusually well, but then that demands an explanation.
I think it’s just not true. There are too many ways to defect and end up better off, and people are too good at rationalizing why it’s ok for them specifically to take one of those ways. That’s why we need an evolving mechanism of social indoctrination, “goodness”, to make people choose the cooperative action even when it doesn’t feel “yummy” to them in the moment.
I don’t think that’s the right question here?
Let me turn it around: you say “That’s why we need an evolving mechanism of social indoctrination, “goodness”, to make people choose the cooperative action even when it doesn’t feel “yummy” to them in the moment.”. But, like, the memetic egregore “Goodness” clearly does not track that in a robust generalizable way, any more than people’s feelings of yumminess do. The egregore is under lots of different selection pressures besides just “get people to not defect”, and the egregore has indoctrinated people in different things over time. So why are you attached to the whole egregore, rather than wanting to jettison the bulk of the egregore and focus directly on getting people to not defect? Why do you think that the memetic egregore Goodness tracks the reality of whether an action is cooperate/cooperate?
I feel you’re overstating the “any more” part, or at least it doesn’t match my experience. My feelings of “goodness” often track what would be good for other people, while my feelings of “yumminess” mostly track what would be good for me. Though of course there are exceptions to both.
This can be understood two ways. 1) A moral argument: “We shouldn’t have so much extra stuff in the morality we’re blasting in everyone’s ears, it should focus more on the golden rule / unselfishness”. That’s fine, everyone can propose changes to morality, go for it. 2) “Everyone should stop listening to morality radio and follow their feels instead”. Ok, but if nobody listens to the radio, by what mechanism do you get other people to not defect? Plenty of people are happy to defect by feels, I feel I’ve proved that sufficiently. Do you use police? Money? The radio was pretty useful for that actually, so I’m not with you on this.
This seems incoherent to me? I’d like it if all the sociopaths were duped by society into not pursuing their values; that’s great for my values, and because they’re evil I’d rather they not pursue their best life. However, I still support distinguishing between goodness and human values, for the same general-purpose reasons why often, even if it’s possible in principle to use some piece of information for evil, it’s still often better to spread & talk about that information than not.
More generally I think people are too quick to use the phrase “One does not argue against the utility function, as the saying goes.” Yes, you can’t argue against the utility function, but if someone has a bad utility function and is unaware what that utility function is, I’m not going to dissuade them from that (unless I think they’ll be happy to cooperate with me on bettering both our goals if I do, but sociopaths are not known for such behavior). That’s part of stopping them.
I’m quite confident my preferences are coherent here, it’s one of the parts of my values I’m most familiar with.
There’s both an instrumentalish and a terminalish component. The terminalish component is roughly a really strong preference to not try to mislead people about their own values; that in particular is just incredibly deeply wrong for me to do according to my own values. The instrumentalish component is… very similar to the thing where people are like “well we need to be a little hyperbolic or misleading or conceal our true intent in order to spread our political message successfully” and then over and over again that type of reasoning leads people to metaphorically smack themselves in the face, it’s a massive own goal, it just does not work.
Indeed, you could make a very reasonable argument that the entire reason AI might be dangerous is because once it’s able to automate away the entire economy, as an example, defection no longer has any cost and has massive benefits (at least conditional on no alignment in values).
The basic reason why you can’t defect easily and gain massive amounts of utility from social systems is a combo of humans not being able to evade enforcement reliably, due to logistics issues, combined with people being able to reliably detect defection in small groups due to reputation/honor systems, and combined with the fact that humans as individuals are far, far less powerful even selfishly as individuals than as cooperators.
This of course breaks once AGI/ASI is invented, but John Wentworth’s post doesn’t need to apply to post-AGI/ASI worlds.
I think that could probably also stand to be a short post with a 5-word title encapsulating it.
Directionally correct advice for confused rationalists, but many of the specific claims are so imprecise or confused as to make many people more confused than enlightened.
Goodness is not an egregore. A more sensible pointer would be something like Memetic values. Actually, different egregores push for different values, often contradictory ones.
What happens on a more mechanistic level:
- when memes want people to do stuff, they can do two somewhat different things: 1) try to manipulate some existing part of the implicit reward function, or 2) manipulate the world model
- often the path via 2) is easier; sometimes the hijack/rewrite is so blunt it’s almost funny: for example there is a certain set of memes claiming you will get to mate with large number of virgin females with beautiful eyes if you serve the memeplex (caveat is you get this impressive boost to reproductive fitness only in the afterlife)
-- notice in this case basically no concept of goodness is needed / invoked, the structure rests on innate genetic evolutionary values, and change in world model
- another thing which the memes can try to do is to replace some S1 model / feeling with a meme-based S2 version, such as the yumminess-predictor box with some explicit verbal model (you like helping people? give to GiveWell recommended charities)
-- this is often something done by rationalists and EAs
-- S2 Goodness is part of this, but non-central
Memetic values actually are an important part of human values—at least of my reflectively endorsed values. A large part of memetic values is human-aligned at the level of groups of humans (i.e. makes groups of humans function better, cooperate, trust each other, …) or at the level of weird deals across time (i.e. your example that other aspects of Goodness seem rather suspiciously optimized for getting kids to be easier for their parents and teachers to manage—think following rules or respecting one’s elders—could be a bargain: if the kid is hard and expensive to manage and does not respect the parent, and all of that would be known to the prospective parent, the parent could also decide to not bring the kid into existence).
Also, The Yumminess You Feel is often of cultural-evolutionary origin, i.e. influenced by memetics. Humans are basically domesticated by cultural evolution; if you wonder whether selective evolutionary pressure can change something like values or a sense of yumminess, look at dogs. We are more domesticated than dogs. The selection pressures over many generations are different than current culture, but if, after reading the text, someone starts listening to their yumminess feel and believes they are now driven by Actual, Non-memetic Human values, they are deeply confused.
I do not think this matches my usage of the words “Human Values” or (especially) “Goodness” (nor of the usage of the rare intelligent people whose ethical judgement I trust). The concept of yumminess/yearning is relevant; the concept of popular assertions of what one oughts to yearn for is relevant. But I object to both of these rough definitions on the grounds that they miss many central aspects.
Concretely: consider a heroin addict, in a memetic environment that strongly disapproves of heroin usage. Because of their addiction, by far the greatest yumminess they feel when imagining things is more heroin (and things which may have brought their past-self feelings of yumminess no longer have that feeling, because it cannot compete). In your framework, getting more heroin is part of their Values, but not part of their culture’s Goodness.
So far so good — but now compare to your example of a gay man in a memetic environment that strongly disapproves of gay romance and sex. As far as I can tell, your analytic framework treats these cases exactly identically: it’s a conflict between Values and Goodness, maybe with the man repeatedly tying himself up in knots to try and fail to crush his Values in the name of Goodness. But I claim this is wrong: an accurate account of Values and Goodness should be able to distinguish these two scenarios. (Lest you think I’m letting my own biases slip in: replace “gay romance and sex” with one of the sexual fetishes I personally disapprove of and think should be socially stigmatized. The distinction I’m getting at here is different.)
I challenge you to articulate the relevant difference between those two scenarios in your analytical framework. I claim any framework which can’t is flinching away from a hard part of describing the type signatures and natures of Values and Goodness. This is the sense in which I meant that your rough definitions miss central concepts.
(Unless you assert that the two cases aren’t different, in which case we might just have a more object-level disagreement, as opposed to you being wrong about your word usage.)
As for what central concepts your framework is missing — this deserves a longer response, but in lieu of that I will briefly gesture at one concept. There is the curious but well-known phenomenon whereby there is a difference between what a human wants (in the sense of revealed preference) and what he or she wants to want (in a particular complicated sense I’m only gesturing at). As you understand well, a man can have false beliefs about what he wants. For the same reason, he can have false beliefs about what he wants-to-want. (In particular, verbal descriptions of what one wants-to-want are not identical to what one actually wants-to-want.)
I claim the self-hating socially-stigmatized heroin-addict has correct beliefs about what he wants-to-want, whereas the self-hating socially-stigmatized sexual-deviant has false beliefs thereof. This distinction is not one of yumminess-upon-imagining (each feels yummy upon imagining using heroin and having deviant sex), and it is not one of memetic pressure (each’s behavior is disapproved of by society, and by me personally). But the distinction is central to understanding Human Values and Goodness.
I would strongly agree with this critique: the characterization of goodness as memetics is severely undertheorized; memeticity is a superficial aspect, and there is a deeper structure worth considering.
I mostly agree with this, the part which feels off is
Humans already follow their actual Values[1], and always will, because their Values are the reason they do anything at all. They also construct narratives about themselves that involve Goodness, and sometimes deny the distinction between Goodness and Values altogether. This act of (self-)deception is in itself motivated by the Values, at least instrumentally.
I do have a version of the “screw memetic egregores” attitude, which is: stop self-deceiving. Because deception distorts epistemics, and we cannot afford distorted epistemics right now. It’s not necessarily correct advice for everyone, but I believe it’s correct advice for everyone who is seriously trying to save the world, at least.
Another nuance is that, in addition to empathy and naive tit-for-tat, there is also acausal tit-for-tat. This further pushes the Value-recommended strategy in the direction of something Goodness-like (in certain respects), even though ofc it doesn’t coincide with the Goodness of any particular culture in any particular historical period.
[1] As Steven Byrnes wrote, “values” might not be the best term, but I will keep it here.
What people do is a compromise between what society wants them to do, and what they would otherwise do. There’s a sense in which that’s doing what they value, since they disvalue societal punishments. So there is a sense in which they are always doing what they already want, but it misses an important point… it’s a misleading bit of cleverness.
Without getting into moral realism, following all your own values is likely to get you into trouble with society.
I agree, except that I don’t think it’s especially misleading. If I live on the 10th floor and someone is dangling a tasty cake two meters outside of my window (and suppose for the sake of the argument that it’s offered free of charge), I won’t just walk out of the window and fall to my death. This doesn’t mean I’m not following my values, it just means I’m actually thinking through the consequences rather than reacting impulsively to every value-laden thing.
This post was one of several examples of “rolling your own metaethics” that I had in mind when writing Please, Don’t Roll Your Own Metaethics, because it’s not just proposing or researching a new metaethical idea, but deploying it, in the sense of trying to spread it among people who the author does not expect to reflect carefully about the idea.
I don’t get why this was curated, am I missing something? The piece basically says that what you want to do & what society expects you to do are 2 separate things (a topic which has been explored since time immemorial). Then it says that you should evaluate what you really want to do based on rational thinking & long-term planning (also something incredibly obvious). Is there anything more to it?
I thought the only novel bit was the passage about oxytocin, which is barely 10% of the article.
I think this is a reasonable question. (1) it prompted an interesting thought for me in terms of “people often feel the need to be Good, which is often or usually a social drive more than a moral one”, (2) sometimes I like a new clear explainer on old topics.
This is rather tangential to the main thrust of the post, but a couple of people used a react to request a citation for this claim.
One noteworthy source is Aella’s surveys on fetish popularity and tabooness. Here is an older one that gives the % of people reporting interest, and here is a newer one showing the average amount of reported interest on a scale from 0 (none) to 5 (extreme), both with tens of thousands of respondents.
Very approximate numbers that I’m informally reading off the graphs:
Giving pain: 30% of people interested (first graph), 2⁄5 average interest (second graph)
Receiving pain: 35% and 2⁄5
Being dominant: 30% and 3⁄5
Being submissive: 40% and 3⁄5
Rapeplay: >10% giving, 20% receiving, the second graph combines these at 2⁄5
Slut Humiliation (first graph): 25%
Humiliation (second graph): 2⁄5
Note that a 3⁄5 average interest could mean either that 60% of people are extremely into it or that nearly everyone is moderately into it (or anything in between). Which seems to imply the survey used in the more recent graph has significantly kinkier answers overall, unless I’m misunderstanding something. (I’m fairly certain that people with zero interest ARE being included in the average, because several other fetishes have average interest below 1, which should be impossible if not.)
If we believe this data, it seems pretty safe to guess that a majority of people are into at least one of these things (unless there is near-total overlap between them). The claim that a majority “feel a deep yearning” is not strongly supported but seems plausible.
(I was previously aware that BDSM interest was pretty common for an extremely silly reason: I saw some people arguing about whether or not Eliezer Yudkowsky was secretly the author of The Erogamer, one of them cited the presence of BDSM in the story as evidence in favor, and I wanted to know the base rate to determine how to weigh that evidence.
I made an off-the-cuff guess of “between 1% and 10%” and then did a Google search with only mild hope that this statistic would be available. I wasn’t able today to re-find the pages I found then, but according to my recollection, my first search result was a page describing a survey of ~1k people claiming a ~75% rate of interest in BDSM, and my second search result was a page describing a survey of ~10k people claiming ~40% had participated in some form of BDSM and an additional ~40% were interested in trying it. I was also surprised to read (on the second page) that submission was more popular than dominance, masochism was more popular than sadism, and masochism remained more popular than sadism even if you only looked at males. Also, bisexuality was reportedly something like 5x higher within the BDSM-interested group than outside of it.)
Curated. While in my personal language, I would have treated Goodness as a synonym for Human Values[1], the distinction John is making here is correct, plus his advice on how to approach it. A very important point I have noticed is that when people ask (or anguish), “am I a good person?” this is asking according to the social egregore sense of good – am I good in the way that will be approved by others? Social, despite seeming like a morality thing. By extension, I wonder how much scrupulosity, as an anxiety disorder, is a social anxiety disorder.
I’d guess that the social egregore of Goodness also gets muddled in how it mixes “here are things you do to be a good member of society” and “here are things that are good because they’re personally prudent and/or make you attractive to affiliate with for others”, e.g. it’s good to exercise and save money.
[1] And specifically my values, because it’s an open question to me how broadly my values are shared, cf. The Psychological Unity of Humankind
Object level disagreement with the post explained here.
In my view this is the worst curated post decision in 2025 so far:
- Please, Don’t Roll Your Own Metaethics
- Not sure if you noticed, but the ethical stance suggested in the post is approximately the same as what many new-age gurus will tell you: “Stop being in your head! Listen to your heart! Follow the sense of yumminess! Free yourself from the expectations of your parents, school, friends and society!”. Tbh this is actually directionally sensible for some types of confused rationalists or people with extreme amounts of scrupulosity, but is not generally good advice.
- The only reason why this is somewhat viable in some contexts is because every adult around has internalized a lot of “Memetic values” (at which point John suggests following them). It’s a bit like a commune of people living ‘free from modern civilization’, which means living 20m from the nearby town and growing their own vegetables, relying on modern civilization only for security, healthcare, tools, industrial production, education, culture, etc. etc.
- If you read the comments, it seems John also agrees that the target audience is specific (in my read, people who have never thought much about human values and err on the side of following S2, culturally spread values)
You write:
The post writes:
Yes, I can see a crude resemblance to that kind of advice, but there’s a whole big section about not interpreting it in a dumb way. I’m also confused what the complaint is... there could be a hypothetical audience, different from the actual audience, who would take this the wrong way and do dumb things, and therefore it’s a bad post even if it makes a correct point? Granted, it seems you think the point is correct.
I am more interested in the question of whether the post’s model is correct; it seems like we maybe disagree there based on your comment. I’m not convinced. (Among other things I might say that egregores can be composed of sub-egregores and that’s fine, doesn’t mean there isn’t one here.) A bit it feels like details, and the core point is something like: your actual values (that are quite hard to determine!) are not the same thing as the societal sense of “Good”. This doesn’t preclude interaction between the two and them shaping each other; that doesn’t feel like it undermines the picture here.
No, the model in the post is mostly not correct. I’m discussing object level disagreements with the post elsewhere, but the ontology of the model is bad, and recommendations are also problematic.
A less wrong model in less confused terminology:
top-level category is human values; these can have many different origins, including body regulatory systems, memes, philosophical reflection, incentives, …; these can be represented in different ways, including bodily sensations, not really legible S1 boxes producing ‘feelings’, S2 verbal beliefs. There is some correlation between the type of representation and origin of the value, but it’s not too strong.
The main thing the post is doing is positing a dichotomy between “not really legible S1 boxes representing values” and “memetic values”.
- This is not a natural way to carve up the space, because one category is based on type of representation, and the other on origin. (It’s a bit like if you divided computers into “Unix-based” and “laptops”.)
- The second weird move is to claim that the natural name for the top-level category should apply just to the “not really legible S1 boxes representing values”
The “memetic values” box is treated quite weirdly. It is identified with just one example of a value—“Goodness”—and it is claimed that this value is an egregore. An egregore is the phenotype of a memeplex—the relation to the memeplex is similar to the relation of an animal to its genome. Not all memeplexes build egregores, but some develop sufficient coordination technology that it becomes useful to model them through the intentional stance—as having goals, beliefs, and some form of agency. An egregore is usually a distributed agent running across multiple minds. Think of how an ideology can seem to “want” things and “act” through its adherents. In my view goodness is mostly a verbal handle people use to point to values. It can point to almost any kind of value, including the S1 values. What egregores often try to do is hijack the pointer and make it point to some verbal model spread by the memeplex. For example: Social Justice is an egregore (while justice is not). What the SJ egregore often does is rewrite the content of concepts like justice and fairness and point them to some specific verbal models, often in tension with S1 boxes, often serving the egregore. A more useful model of goodness is as a particularly valuable pointer, due to its extreme generality. As a result many egregores fight over what it should point to—e.g. rationalism would want ‘updating on evidence’ to be/feel good, and ‘making up fake evidence to win a debate’ to be bad. But this is a small minority of the pathways by which cultural evolution changes your values.
One true claim about memetic values is that they are subject to complex selection pressures, sometimes serve egregores, sometimes the collective, … If you meet claims like “the best thing you can do is sacrifice your life to spread this idea”, it’s clearly suspicious.
Overall, the not-carving-reality-at-its-joints means the model in the post is not straightforwardly applicable. The first-order read, “kick out memetic values, S1 boxes good”, is clearly bad advice (and also a large part of your S1 boxes is memetic values). Hence a whole section on “don’t actually try to follow this and instead … reflect”. My impression is there is some unacknowledged other type of values guiding the reflection in the direction of “don’t be an asshole”.
--
No, I don’t mean a hypothetical audience. I mean, for example, you. If—after reading the post—you believe there is this basic dichotomy between Human Values and Goodness, where Goodness is a memetic egregore while Human Values are authentically yours and you should follow them, but in non-dumb ways… my claim is this is not carving reality at its joints, and if you believe this you are confused. Probably confused in a different way than before (“Goodness as a synonym for Human Values”).
Stepping back for a moment, just want to clarify goal of this comment exchange. In drafting a reply, I realize I was mixing between:
1) determining whether the decision to curate was good or not
2) determining what is true (according to my own beliefs)
3) determining whether the post is “good” or not.
Of course 1) impacts 2) impacts 3).
I think I came in with the less-wrong model you describe, and the piece didn’t update me so much as it seemed like a straightforward explainer of a simple point (“what people say is Good isn’t the same as your Values”). I think you have a point that the post does something like set up one side of the dichotomy as S1 boxes, though it’s salient to me that it also has:
That feels appropriately non-committal.
I agree there’s complexity around egregores/memeplexes and how it gets carved up.
It’s definitely not the bar for curation that everything in the post seems correct to the curator. I do think it should leave people better off than if they’d not read it. After this discussion, I’m less sure about this post. “Values are just the S1 boxes” seems so ridiculous to me that I wouldn’t expect anyone to think it, I don’t know. The egregore stuff feels much higher resolution than what this post is going for, though I think there’s interesting stuff to figure out there. I kind of like this post for having sparked that conversation, though perhaps it is a rehash that is tiresome to others.
Really? Did you see this comment of mine? Do you endorse John’s reply to it (specifically the part about the sadist)?
I didn’t see your comment or the thread there, but yes. There is refinement and precision that could be added (the feelings vs. the generator, etc.), but the point that there’s something more inherent to you vs. something that lives outside of you and is more social is correct.
Regarding the sadists: yes, I think the values of the sadist might well be torture, and from their perspective, they should be optimizing for that. If my values are anti-sadism (and I think they are), then we are at odds and maybe we fight. I don’t think the structure of values prohibits people from having values different from my own. I strongly feel John’s “people object to this for dumb reasons” stance.
Have you also seen https://www.lesswrong.com/posts/KCSmZsQzwvBxYNNaT/please-don-t-roll-your-own-metaethics which was also partly in response to that thread? BTW why is my post still in “personal blog”?
Yes, though I am unsure how to apply it. Your thread with Raemon was a little helpful.
Posts are manually frontpaged and are typically done as a batch once a day. When I’m assigned, I typically process them around 10-11 PT.
One way you could apply it is by not endorsing so completely/confidently the kind of “rolling your own metaethics” that I argued against (that I see John as doing here), i.e., by saying “the distinction John is making here is correct, plus his advice on how to approach it.” (Of course you wrote that before I posted, but I’m hoping this is one of the takeaways people get from my post.)
Ok, there’s an argument I can see along the lines of “unlike other domains, ethics/meta-ethics lacks any empirical feedback loop on beliefs [at least that we’ve found], and this means all such claims should be made more lightly than anything more empirical/factual”. Given that, perhaps more hedging is warranted than “is correct”.
Now, even before any of this discussion, I’d have been extremely hesitant to lock in my meta-ethical views to ASI, but day to day, I feel like I need some kind of ethical framework to operate on. That’s where I’m not sure what to do other than figure out what makes sense to me, in the same way I do for other things.
I’d need to think longer/be convinced to switch to a more modest epistemology specifically for this domain, if that’s kind of the suggestion of “not rolling your own”. That feels like a big topic though.
But yeah, I can take away “be less confident” here.
To some extent, “goodness” is an ever-moving, negotiated set of norms for how one should behave.
I notice that when I use the word “good” (or invoke this concept using other words such as “should”), I don’t use it to point to the existing norms, but as a bid for what I think those norms should be. This sometimes overlaps with the existing norms and sometimes doesn’t.
E.g. I might say that it’s good to allow lots of different subcultures to co-exist. This is a vote for a norm where people who don’t share my subculture leave me and my friends alone, in exchange for us leaving them alone. This is not unrelated to me getting what is yummy to me, but it is at least one step removed.
“Good” is the set of norms we use to coordinate cooperation. If most people don’t like it when you pick your nose in public, then it’s good to make an effort not to do so, and similarly for a lot of other values. Even if you don’t care about the nose picking, you probably care about some of the other things “good” coordinates around. For most people it’s probably worth supporting the package deal. But I also think you “should” use your voice to help improve the notion of what is “good”.
I think the “your values” framing itself already sneaks in assumptions which are false for a lot of minds/brains. Notably: most minds are not perfectly monolithic/unified things well-modeled as a coherent “you/I/me”. And other minds are quite unified/coherent, but are in the unfortunate situation of running on a brain that also contains other (more or less adversarial) mind-like programs/wetware.
Example:
It is entirely possible to have strongly-held values such as “I reject so-and-so arbitrary/disgusting parts of the reward circuitry Evolution designed into my brain; I will not become a slave to the Blind Idiot God’s whims and attempts to control me”. In that case, the “I” that holds those values clearly excludes at least some parts of its host brain’s yumminess-circuitry.[1] (I.e., feelings of yumminess forced upon the mind are not signals about that mind’s values, but rather more like attempts by a semi-adversarial brain to hack that mind.)
Another example:
Alex has some shitty experiences in childhood, and strongly internalizes a schema S like “if I do X, I will be safe”, and thereafter has strong yumminess feelings about doing X. But later upon reflection, Alex realizes that yumminess feelings are coming from S, and that S’s implicit models of reality aren’t even remotely accurate now in adulthood. Alex would like to delete S from their brain, but can’t. So the strong yumminess-around-X persists. Is X one of Alex’s values?
So, I object to what I perceive to be an attempt to promote a narrative/frame about what constitutes “you/I/me” or “your values” for people in general. (Though I’m guessing there was no malice involved in that promotion.) Especially when it is a frame that seems to imply that many people (as they conceive of themselves) are not really/fully persons, and/or that they should let arbitrary brain-circuits corrupt their souls (if those brain-circuits happen to have the ability to produce feelings of yumminess).
Please be more careful about deploying/rolling your own metaethics.
Maybe that “I” could be described as a learned mesaoptimizer, something that arose “unintentionally” from the perspective of some imaginable/nonexistent Evolution-aligned mind-designer. But so what? Why privilege some imaginary Evolution fairy over an actually existing person/mind?
Okay, but yumminess is not values. If we use an ML analogy, yumminess is the reward signal, or some other training hyperparameter.
My personal operationalization of values is “the thing that helps you navigate trade-offs”. You can have yummy feelings about saving the life of your son or about saving the lives of ten strangers, but we can’t say what you value until you consider a situation where you need to choose between the two. And, conversely, if you have good feelings about parties and reading books, your values direct what you choose.
Choice in the case of real, value-laden trade-offs is usually determined by a significant amount of reflection about values, and the memetic ambience supplies known summaries of such reflection from the past.
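A minimal sketch of that operationalization, with made-up names and numbers of my own (nothing here is from the post or this comment): yumminess scores alone leave the ordering underdetermined; only a forced choice, stood in for here by hypothetical trade-off weights, reveals what is actually valued more.

```python
# Toy sketch: "yumminess" as raw positive signals, vs "values" as whatever
# resolves a forced trade-off between yummy options. All names/numbers are illustrative.
yumminess = {"save_my_son": 0.9, "save_ten_strangers": 0.8}

def choose(option_a: str, option_b: str, tradeoff_weights: dict) -> str:
    """Pick between two mutually exclusive options.

    The yumminess scores alone don't settle the choice; the (hypothetical)
    tradeoff_weights stand in for whatever reflection/values do the settling.
    """
    score = lambda o: yumminess[o] * tradeoff_weights.get(o, 1.0)
    return option_a if score(option_a) >= score(option_b) else option_b

# Both options feel yummy in isolation; only the forced choice reveals the value.
print(choose("save_my_son", "save_ten_strangers", {"save_ten_strangers": 1.5}))
```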
I mostly don’t seem to have anything new to say in response to this at the moment, but I figured mentioning my comment from a few weeks ago on hunches about origins of caring-for-others was in order, so there it is.
This post doesn’t seem to provide reasons to have one’s actions be determined by one’s feelings of yumminess/yearning, or reasons to think that what one should do is in some sense ultimately specified/defined by one’s feelings of yumminess/yearning, over e.g. what you call “Goodness”? I want to state an opposing position, admittedly also basically without argument: that it is right to have one’s actions be determined by a whole mess of things together importantly including e.g. linguistic goodness-reasoning, object-level ethical principles stated in language or not really stated in language, meta-principles stated in language or not really stated in language, various feelings, laws, commitments to various (grand and small, shared and individual) projects, assigned duties, debate, democracy, moral advice, various other processes involving (and in particular “running on”) other people, etc. These things in their present state are of course quite poor determiners of action compared to what is possible, and they will need to be critiqued and improved — but I think it is right to improve them from basically “the standpoint they themselves create”.[1]
The distinction you’re trying to make also strikes me as bizarre given that in almost all people, feelings of yumminess/yearning are determined largely by all these other (at least naively, but imo genuinely and duly) value-carrying things anyway. Are you advocating for a return to following some more primitively determined yumminess/yearning? (If I imagine doing this myself, I imagine ending up with some completely primitively retarded thing as “My Values”, and then I feel like saying “no I’m not going to be guided by this lmao — fuck these ‘My Values’”.) Or maybe you aren’t saying one should undo the yumminess/yearning-shaping done by all this other stuff in the past, but are still advising one to avoid any further shaping in the future? It’d surprise me if ≈any philosophically serious person would really agree to abstain from e.g. using goodness-talk in this role going forward.
The distinction also strikes me as bizarre given that in ordinary action-determination, feelings of yumminess/yearning are often not directly applied to some low-level givens, but e.g. to principles stated in language, and so only become fully operational in conjunction with, e.g., minimally something like internal partly-linguistic debate. So if one were to get rid of the role of goodness-talk in one’s action-determination, even one’s existing feelings of yumminess/yearning could no longer remotely be “fully themselves”.
If you ask me “but how does the meaning of ‘I should X’ ultimately get specified/defined”, then: I don’t particularly feel a need to ultimately reduce shoulds to some other thing at all, kinda along the lines of https://en.wikipedia.org/wiki/Tarski’s_undefinability_theorem and https://en.wikipedia.org/wiki/G._E._Moore#Open-question_argument .
You mention how yumminess is biased in favor of novel things. I think it is also biased against habitual things, in ways that make not being an idiot harder.
IE, a new relationship feels a lot yummier than the same relationship with the same person 10 years later—even though that relationship is much more valuable to me personally after those 10 years than at the beginning.
There’s a dynamic where we don’t feel yumminess for things we have and are confident that we will continue having, even when those things are very valuable to us.
Two thoughts.
This is a kind of strange gap in human cognition that seems prone to a lot of common and obvious failure modes.
I wonder if the societal concept of Goodness is an attempted patch, which would imply that rejecting the memetic egregore successfully is harder than just not being an idiot. (And that doing so often involves subcultures building their own memetic egregore to serve a similar purpose.)
IE, feeling right is really yummy … and it was ~ the memetic egregore of Rationality that got me to a point where being right is yummier than seeming / feeling right.
This feels related to generalized-hangriness—our values point in useful directions while on the object level being wrong.
IE, feeling right points to a desire to be respected and to understand the world, but is itself a bad object level way to achieve those goals.
So I’m not sure that “jettison the memetic egregore and pay attention to your and others’ actual Values” is good advice, and I worry that “actual” is doing a lot of non-obvious work that is somewhat at odds with defining Values as things that feel yummy.
I’d highlight the distinction there between “terminal-ish” values and “instrumental-ish” values. Part of “don’t be an idiot about it” is to not just myopically chase the terminal-ish yumminess feelings; rather, plan ahead to embrace more yumminess feelings in the long term by working on instrumental-ish values (which might not provide yummy feelings in their own right) in the shorter term.
I like the sharp distinction you draw between
and
but the post treats these as more separable than they actually are from the standpoint of how the brain acquires preferences.
You emphasize that
and that Goodness trying to overwrite that is “silly.” Yet a few paragraphs later you note that
before recommending to “jettison the memetic egregore” once the safety-function parts are removed.
But the brain’s value-learning machinery doesn’t respect this separation. “Yumminess/yearning” is not fixed hardware; it’s a constantly updated reward model trained by social feedback, imitation, and narrative framing. The very things you group under “Goodness” supply the majority of training data for what later becomes “actual Values.” The egregore is not only a coordination layer or a memetically selected structure on top; it is also the training signal.
Your own example shows this coupling. You say that
while also being a core part of Goodness. This dual function, as both a learned reward target and the memetic structure that teaches people to want it, is typical rather than exceptional.
So the key point isn’t “should you follow Goodness or your Values?” but “which training signals should you expose your value-learning architecture to?” Then the Albert failure mode looks less like “he ignored Goodness” and more like “he removed a large portion of what shapes his future reward landscape.”
And for societies, given that values are learned, the question becomes: which parts of Goodness should we deliberately keep because they stabilize or improve the learning process, not merely because they protect cooperation equilibria?
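A minimal sketch of the “the egregore is also the training signal” idea, using my own made-up update rule and numbers (nothing here is from the post or the parent comment): repeated social feedback trains in a yumminess weight that, once learned, looks just like an “actual Value.”

```python
# Toy sketch: the learned "reward model" for an action is nudged toward whatever
# social feedback it receives. Update rule and numbers are purely illustrative.

def update_yumminess(yumminess: dict, action: str, social_feedback: float, lr: float = 0.1) -> dict:
    """Move the yumminess weight for `action` a step toward the social feedback signal."""
    current = yumminess.get(action, 0.0)
    yumminess[action] = current + lr * (social_feedback - current)
    return yumminess

# Start with no intrinsic pull toward "sharing"; repeated praise (+1) trains one in.
yumminess = {"sharing": 0.0}
for _ in range(20):
    update_yumminess(yumminess, "sharing", social_feedback=1.0)

print(round(yumminess["sharing"], 2))  # ~0.88: what now "feels yummy" was largely socially trained
```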
I think there may be a fairly critical confusion here, but perhaps I have missed the key bit (or perhaps, by seeing this particular tree, have missed the forest the post is aiming at) that would address that. It seems that “human values” here are defined very much in terms of a specific human. However, “goodness” seems to be more about something larger—society, the culture, humanity as a class, or even living things in general.
I suspect a lot of the potential error in treating the terms as near to one another disappears if you think of goodness for a specific person, or think of human values in terms of humans as a group that holds common values. (Granted, in this latter case getting to specific values will be problematic, but in terms of pure logic or abstract reasoning I don’t think the issues are nearly as bad as implied in the OP.)
A lot of goodness is about what you should do rather than what you should feel yearning for. There’s less conflict there. Even if you can’t change what you feel yearning for, you can change what you do.
Thank you for this article, I find the subject interesting.
In this article, and also in the comments, I am rather surprised by the use of the word ‘value’, so I wondered if it was a language issue on my part.
However, the fact that the author wonders whether human values are good is something that fits in with my initial interpretation of the word value, which is as follows: value in the deepest sense, what is most important in life.
And my initial interpretation seems to be in line with that of the Stanford Encyclopedia of Philosophy, for example: https://plato.stanford.edu/entries/value-theory/
So I find it difficult to understand why “value” then takes on the meaning of “what we like,” which seems to me to have nothing (or very little) to do with it.
Nevertheless, despite this potential difference in concept, I find that certain reflections remain valid even when taken in a philosophical sense.
For example, this, with which I agree, even when taking the word in the philosophical sense of “value.”
I find this difficulty fascinating and believe it necessitates precise thought experiments on this subject in order to realize how poorly we model our own values (in the deepest sense, once again).
There is also the question of how to aggregate individuals’ (moral/deep) values into human (moral/deep) values, which does not seem at all obvious to me (neither the average, nor the sum, nor any other aggregation function seems to behave well a priori?).
One idea I am currently imagining is more like creating a new global model from a collection of thought experiments (and concrete decisions to be made, to avoid problems of abstraction) that is very refined in order to distinguish subtleties, and which would be iteratively refined by proposing more and more “twisted” cases to question the foundations, on which a large number of people would express their opinions after a certain (significant) period of internal deliberation.
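As a small illustration of why the aggregation question is non-obvious (a toy example of my own, with made-up valuations, not from this comment): standard aggregation rules can rank the same two options in opposite ways, so “just aggregate” is underspecified until you commit to a rule.

```python
# Toy sketch: different aggregation rules disagree about which policy is "better".
from statistics import median

# Hypothetical individual valuations of two policies by three people.
policy_a = [10, 1, 1]
policy_b = [3, 3, 3]

rules = {"sum": sum, "min": min, "median": median}

for name, rule in rules.items():
    preferred = "A" if rule(policy_a) > rule(policy_b) else "B"
    print(f"{name}: prefers policy {preferred}")
# sum prefers A; min and median prefer B.
```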
I believe this is also referred to as “positive affect”. I really like and use the term exactly because, as you mention later, many people like to fantasize about and explore things that are normally associated with negative affect, so you can’t point to any specific source of positive affect to refer to positive affect.
Pragmatically, I think people will know what you mean more often with “yumminess” than “positive affect”, but I think “positive affect” might be the technically correct term.
I basically agree with the thrust of this post, namely that we need a distinction between our values and goodness. Otherwise, we would not be able to ask the question whether we want what is good, for example. Or to put it differently, there is a conceptual distinction between what is desired and what is desirable, whatever determines the latter.
Furthermore, I agree that it is rather common to see what is desirable as some kind of function of what we in fact value. For example, in economics it is rather common to identify welfare with preference-satisfaction. However, even those who see such close relations between what we want and what would be good to want tend not to identify them, typically arguing that the latter is given by some kind of coherent extrapolated volition (e.g. philosophers like Bernard Williams or Richard Brandt).
With that said, I also think that the OP bakes in a few too many commitments about what it means for something to be valued or to be good, for that matter. On the value side, I agree with Steven Byrnes that this is best identified with something like a desire. However, it is worth noting that desires understood as such motivational pulls don’t necessarily come with a phenomenology of yumminess. For example, some of the things that I care about the most, I feel the least when thinking about, such as having a room to sleep in. And there are other things I might feel positively bad when thinking about, such as the realization that I have to give up something that I was really looking forward to in order to meet a commitment that I really value. (There are variations of these cases that might make them less effective, e.g. if I faced the prospect of losing my room, but I think the point stands that the relation between experienced yumminess and desire is contingent and empirical.)
On the goodness side, I am a bit worried that the OP conflates a few different things. To start, it seems that we want to distinguish between our representations of things as good (e.g. norms) and the good stuff itself. For example, I don’t think that we want to identify goodness with the norms that we have around what art is good as opposed to the art itself. Furthermore, it seems like the OP identifies goodness with something like moral goodness specifically. We probably want to separate that, and make it a subset of things that can be good. For example, we might think that things like healthy food and good conversation are good things that we should desire, but they are not obviously morally good.
Notably, I’m taking no stance on the question of what makes something good or bad here. It might be that things are good and bad because our norms say so, but that’s a stronger commitment than merely saying that goodness is separate from what we want, and one I don’t think we want to bake into the distinction itself. (So contrary to Nina Panickssery, I don’t think that the realist/antirealist distinction is as central here).
Finally, I think that this is probably a discussion where the conversation would benefit from some context in analytic philosophy, where many people have discussed this question at length, I believe quite fruitfully. Some classic papers I like on this include Railton (1986), ‘Moral Realism’ (on a naturalistic account of goodness and its relation to desire), and Quinn (1994), ‘Putting Rationality in Its Place’ (an argument against identifying goodness with what we want, or rational behavior with the mere satisfaction of desires for that matter).
Also worth taking into consideration: things that feel anti-yummy. Fear/disgust/hate/etc are also signals about your values.
So true, this reminds me of Jung’s emphasis on “the shadow“—it’s important to acknowledge (and not discount) “values” you hold that are selfish or otherwise not ostensibly pro-social.
This is also important to note. We are often torn between selfish wants and the wants and needs of others. This can be framed as selfishness = bad, concern for others = good. But I think it’s better interpreted as you say, that “goodness” is usually aligned with our own long-term interests which are often also aligned with the interests of others. So your values need not be a zero-sum contest between your interests and the interests of others.
A thought I get is that
“Goodness defines the boundaries within which we can optimise for our values/desires (and suitably adjusted for memetics; see the above discussion)”, and that “Goodness should evolve so as to allow the optimisation to occur as well as possible, for as many as possible”?
“We don’t really know what human values are”
But we might, or might begin to: I put the effort in over here :: Alignment ⑥ Values are an effort not a coin https://whyweshould.substack.com/p/alignment-values-are-an-effort-not
or in derived format: If all values are an effort, prices are a meeting of efforts https://whyweshould.substack.com/p/if-all-values-are-an-effort-prices
Even deontological positions are an effort; evolution cares about the effort, not the ideal forms.
One (over)optimistic hope I have is that something like a really good scale-free theory of intelligent agency would define a way to construct a notion of goodness that was actually aligned with the values of the members of a society to the best extent possible.
Is there a distinction to be made between different kinds of social imperatives?
e.g. I think a lot of people might feel the memetic egregore tells them they should try to look good more than it tells them to be humble, but they might still associate the latter with ‘goodness’ more, because when they are told to do it, it is in the context of morality or virtue.
I agree there is an important distinction, but I think the social/memetic aspect of “Goodness” is not central. The central distinction is that we have access to yumminess directly; it is the only thing we “truly care about” in some sense. But as bounded and not even perfectly coherent agents, we’re unable to roll our predictions forward over all possible action paths and maximize yumminess.
Instead we need to form a compact/abstracted representation of our values/yumminess to 1) make them legible to ourselves, 2) make plans to attain them, 3) communicate them, and 4) make them more coherent.
I update my moral values based on my ontology. I try to factor in epistemic uncertainty. I do not attribute goodness to human values, because I do not center my worldview around humans only. What an odd thing to do.
Ethics to me is an epistemic project. I read literature, poetry, the Upanishads, the Gita, the Gospels, Meditations, the sequences… More obscure things. I think and I update.
To me, the basic level of “goodness” is roughly “do no harm on offense”, or “do not go against others’ will on offense”. (This level of goodness is actually much needed for humans to survive as well.)
I think some of the central models/advice in this post [1] are in an uncanny valley of being substantially correct but also deficient, in ways that are liable to lead some users of the models/advice to harm themselves. (In ways distinct from the ones addressed in the post under admonishments to “not be an idiot”.)
In particular, I’m referring to the notion that
I agree that “yumminess” is an important signal about one’s values. And something like yumminess or built-in reward signals is what shapes one’s values to begin with. But there are some further important points to consider. Notably: some values are more abstract than others[2]; values differ a lot in terms of
How much abstract/S2 reasoning any visceral reward has to route through in order to reinforce that value.
How much abstract/S2 reasoning is required to determine how to satisfy that value, or to determine whether an imagined state-of-affairs satisfies (or violates) that value.
(Or, conversely:) How readily S1 detects the presence (or lack/violation) of that value in any given imagined state-of-affairs, for various ways of imagining that state-of-affairs.
Also, we are computationally limited meat-bags, sorely lacking in the logical omniscience department.
This has some consequences:
It is possible to imagine or even pursue goals that feel yummy but which in fact violate some less-obvious-to-S1 values, without ever realizing that any violation is happening.[3]
Pursuing more abstract values is likely to require more willpower, or even incur undue negative reinforcement, and end up getting done less.[4][5]
More abstract values V are liable to get less strongly reinforced by the brain’s RL than more obviously-to-S1-yummy values W, even if V in fact contributed more to receiving base/visceral reward signals.
Which in turn raises questions like
Should we be very careful about how we imagine possible goals to pursue? How do we ensure that we’re not failing to consider the implications of some abstract values, which, if considered, would imply that the imagined goal is in fact low-or-negative value?
Should we correct for our brains’ stupidity by intentionally seeking more reinforcement for more abstract values, or by avoiding reinforcing viscerally-yummy values too much?
Should we correct for our brain’s past stupidity (failures to appropriately reinforce more abstract values) by assigning higher priority to more abstract values despite their lower yumminess?[6]
Or does “might make right”? Should we just let whatever values/brain-circuits have the biggest yumminess-guns determine what we pursue and how our minds get reinforced/modified over time? (Degenerate into wireheaders in the limit?)
The endeavor of answering the above kinds of questions—determining how to resolve the “shoulds” in them—is itself value-laden, and also self-referential/recursive, since the answer depends on our meta-values, which themselves are values to which the questions apply.
Doing that properly can get pretty complicated pretty fast, not least because doing so may require Tabooing “I/me” and dissecting the various constituent parts of one’s own mind down to a level where introspective access (and/or understanding of how one’s own brain works) becomes a bottleneck.[7]
But in conclusion: I’m pretty sure that simply following the most straightforward interpretation of
would probably lead to doing some kind of violence to one’s own values, to gradually corrupting[8] oneself, possibly without ever realizing it or feeling bad at any point. The probable default being “might makes right” / letting the more obvious-to-S1 values eat up ever more of one’s soul, at the expense of one’s more abstract values.
Addendum: I’d maybe replace
with
or, the models/advice many readers might (more or less (in)correctly) construe from this post
Examples of abstract values: “being logically consistent”, “being open-minded/non-parochial”, “bite philosophical bullets”, “take ideas seriously”, “value minds independently of the substrate they’re running on”.
To give one example: Acting without adequately accounting for scope insensitivity.
Because S1 yumminess-detectors don’t grok the S2 reasoning required to understand that a goal scores highly according to the abstract value, so pursuing the goal feels unrewarding.
Example: wanting heroin, vs wanting to not want heroin.
Depends on (i.a.) the extent to which we value “being the kind of person I would be if my brain weren’t so computationally limited/stupid”, I guess.
IME. YMMV.
as judged by a more careful, reflective, and less computationally limited extrapolation of one’s current values
I think you’re talking about ethics here… and if so, why not call it that? Human Values (vs. Ethics) is an unnecessary rejection that I don’t really believe is moving things forward in working on AI, safety… and, drum roll… ethics.
What you’ve pointed out in this article is a central concern of metaethics. If you’re alluding to the fact that this stuff is hard, then… great. If it’s relevant to how this fits in with our technologies, then please specify how, so we can develop a proper critique.