I like the sharp distinction you draw between

“Our Values are (roughly) the yumminess or yearning…”
and
“Goodness is (roughly) whatever stuff the memes say one should value.”
but the post treats these as more separable than they actually are from the standpoint of how the brain acquires preferences.
You emphasize that
“we mostly don’t get to choose what triggers yumminess/yearning”
and that Goodness trying to overwrite that is “silly.” Yet a few paragraphs later you note that
“a nontrivial chunk of the memetic egregore Goodness needs to be complied with…”
before recommending that one “jettison the memetic egregore” once its safety-function parts are extracted.
But the brain’s value-learning machinery doesn’t respect this separation. “Yumminess/yearning” is not fixed hardware; it’s a constantly updated reward model trained by social feedback, imitation, and narrative framing. The very things you group under “Goodness” supply the majority of the training data for what later becomes “actual Values.” The egregore is not only a coordination layer or a memetically selected structure sitting on top; it is also the training signal.
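To make the claim concrete, here is a toy sketch (my own illustration, not anything from the post): treat "Values" as a learned reward model whose weights drift toward whatever training signal the agent is exposed to. The specific targets and numbers are invented for the example.

```python
# Toy model: "Values" as a reward model updated by exposure to
# training signals (social feedback, imitation, "Goodness" memes).

def update(values, signal, lr=0.1):
    """One learning step: nudge each valued target toward the observed signal."""
    return {k: v + lr * (signal.get(k, 0.0) - v) for k, v in values.items()}

# Innate starting point: weak, largely unshaped dispositions.
values = {"loving_connection": 0.1, "status": 0.1, "honesty": 0.1}

# Years of exposure to the memetic egregore's training signal.
goodness_signal = {"loving_connection": 1.0, "honesty": 0.8, "status": 0.2}
for _ in range(50):
    values = update(values, goodness_signal)

# After training, the "actual Values" closely track the memetic signal:
# values["loving_connection"] has converged near 1.0, "honesty" near 0.8.
```

The point of the sketch is only that the final "yumminess" profile is a function of the signal history, so asking "Values or Goodness?" conflates the trained model with its training data.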
Your own example shows this coupling. You say that
“Loving Connection… is a REALLY big chunk of their Values”
while also being a core part of Goodness. This dual role, as both a learned reward target and the memetic structure that teaches people to want it, is typical rather than exceptional.
So the key point isn’t “should you follow Goodness or your Values?” but “which training signals should you expose your value-learning architecture to?” Then the Albert failure mode looks less like “he ignored Goodness” and more like “he removed a large portion of what shapes his future reward landscape.”
And for societies, given that values are learned, the question becomes: which parts of Goodness should we deliberately keep because they stabilize or improve the learning process, not merely because they protect cooperation equilibria?