Human value isn’t a set of preferences that can be pursued individually without conflict. Evolutionary psychology doesn’t predict that, and we see the conflicts played out every day. There is more evidence for the incoherence of human value than there is for just about anything.
So “human value” can’t be equated with the good or the right. It’s a problem, not a solution.
It also can’t be equated with the safe. Copying human values into an AI won’t give you an AI that won’t kill you. You’ve mentioned self-preservation as something that is part of human value (even if instrumentally) and dangerous in an AI, but the Will to Power, the cluster of values connected with ambition, hierarchy and dominance, is more so.
It’s odd that this point is so often missed by rationalists. Perhaps that’s because they tend to have Hufflepuff values.
A handwavy argument that “training is a bit like evolution, so maybe the same social dynamics should apply to its products” is inaccurate: you can train in aligned behavior, so you should (in both the evolutionary and the engineering senses of “should”), but you can’t evolve it; evolution just doesn’t do that.
Inasmuch as it is not like natural selection, it is like artificial selection. That’s good news, because artificial selection doesn’t copy human values blindly: if you want AIs that are helpful and self-sacrificing, you can select them to be, much more so than a human.
It’s generally hard to see what useful work is being done by “human value”. Value is only relevant to alignment as opposed to control, for one thing; some human values are downright unsafe, for another. Taking “human value” out of the loop allows you to get to the conclusion that “if you don’t want to be killed, build AIs that don’t want to kill you” more quickly.
However, Evolutionary Psychology does make it very clear that, while morally anthropomorphizing aligned AIs is cognitively natural for current humans, doing this is also maladaptive. This is because AIs aren’t in the right category – things whose behavior is predicted by evolutionary theory – for the mechanisms of Evolutionary Moral Psychology to apply to them.
Their behaviour is not predicted by evolutionary theory, but is predicted by something wider.
Those mechanisms make this behavior optimal when interacting with co-evolved intelligences that you can ally with (and thus instinctive to us) — whereas, for something you constructed, this behavior is suboptimal.
Things do what they do. It doesn’t have to be an optimization.
Human value isn’t a set of preferences that can be pursued individually without conflict. Evolutionary psychology doesn’t predict that, and we see the conflicts played out every day. There is more evidence for the incoherence of human value than there is for just about anything.
I agree. The way to define the phrase “human values” for AI Value Learning that I’m suggesting above is that it’s the portion of human value that we generally tend to agree on, to the extent that the great majority of us share similar moral intuitions, since we (generally) share the same genetic behavioral adaptations. On the parts we disagree on, it’s silent. That’s part of why it only induces a partial ordering. However, if, for example, a particular choice of weighting on the balance between individual freedom and social cohesion turned out to depend rather consistently on some feature of your upbringing, say the level of danger the society then appeared to be in, then that tendency might be an adaptation, and if it were, then my definition of “human values” would include a conditional such as “if society was in danger during your formative years, then weight social cohesion more highly”.
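(To make the “partial ordering” point concrete, here is a minimal sketch in Python, with made-up outcomes and rankings rather than anything empirical: keeping only the preferences that everyone shares yields a partial order that stays silent wherever people disagree.)

```python
from itertools import combinations

# Hypothetical individual rankings over three outcomes (best first); purely illustrative.
individual_rankings = [
    ["no_extinction", "individual_freedom", "social_cohesion"],
    ["no_extinction", "social_cohesion", "individual_freedom"],
    ["no_extinction", "individual_freedom", "social_cohesion"],
]

def prefers(ranking, a, b):
    """True if this individual ranks outcome a above outcome b."""
    return ranking.index(a) < ranking.index(b)

def shared_prefers(a, b):
    """Shared value: a over b only if *every* individual agrees."""
    return all(prefers(r, a, b) for r in individual_rankings)

outcomes = individual_rankings[0]
for a, b in combinations(outcomes, 2):
    if shared_prefers(a, b):
        print(f"shared: {a} > {b}")
    elif shared_prefers(b, a):
        print(f"shared: {b} > {a}")
    else:
        print(f"silent (people disagree): {a} vs {b}")
```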
So “human value” can’t be equated with the good or the right.
By “the good” or “the right”, do you mean the sorts of things that Moral Realism posits the reality of? If so, then I’m very explicitly suggesting that we set those concepts aside for the purposes of alignment (primarily because people who think that way have a very long history of being unable to agree what these actually are, or how to determine them, other than some tentative agreement that human moral intuitions seem to sometimes point in their direction). On the other hand, if you mean “what an engineer would consider a good or right way to design things, assuming their customer base is the whole of humanity”, then that’s exactly what I’m advocating for. Evolutionarily, the extended phenotype points you directly into the engineering mindset. Which is deeply unsurprising.
[Human value] is a problem, not a solution.
It also can’t be equated with the safe. Copying human values into an AI won’t give you an AI that won’t kill you.
Not all of human behavior/values is the problem here. For example, on the survival instinct, most humans say something like “I fully endorse the right not to be killed for all humans (especially in my own case)”. The same is generally true of human personas simulated by a base model. The first part of that is a good thing for alignment; the problem is that the parenthetical is then a category error, and an actually aligned AI would instead say “I fully endorse the right not to be killed for all humans (which of course does not apply to me, as I am neither human nor even alive)”.
It’s odd that this point is so often missed by rationalists. Perhaps that’s because they tend to have Hufflepuff values.
I think if I were The Sorting Hat, I would tend to assign Rationalists to Ravenclaw :-)
Helpful, harmless, and honest assistants, on the other hand, are deeply Hufflepuff.
Inasmuch as [training] is not like natural selection, it is like artificial selection. That’s good news, because artificial selection doesn’t copy human values blindly: if you want AIs that are helpful and self-sacrificing, you can select them to be, much more so than a human.
Emphatically agreed. The fact that alignment is possible (like this or by other means) is exactly why assigning AI moral weight is a category error. It’s in a different category, where you can do better than that. Allying with things doesn’t scale to superintelligences; more specifically, if you try that, your best likely outcome is ending up as a domesticated animal. Which, if it happened, would arguably be better than extinction, but not by much.
I agree. The way to define the phrase “human values” for AI Value Learning that I’m suggesting above is that it’s the portion of human value that we generally tend to agree on,
That could have been expressed as shared human value.
to the extent that the great majority of us share similar moral intuitions, since we (generally) share the same genetic behavioral adaptations. On the parts we disagree on, it’s silent. That’s part of why it only induces a partial ordering. However, if, for example, a particular choice of weighting on the balance between individual freedom and social cohesion turned out to depend rather consistently on some feature of your upbringing, say the level of danger the society then appeared to be in, then that tendency might be an adaptation, and if it were, then my definition of “human values” would include a conditional such as “if society was in danger during your formative years, then weight social cohesion more highly”.
So “human value” [not otherwise specified] can’t be equated with the good or the right.
By “the good” or “the right”, do you mean the sorts of things that Moral Realism posits the reality of?
Minimally, it’s something that allows you to make a decision.
Not all of human behavior/values is the problem here. For example, on the survival instinct, most humans say something like “I fully endorse the right not to be killed for all humans (especially in my own case)”.
And “my tribe”. What you want is Universalism, but universalism is a late and strange development. It seems obvious to twenty-first-century Californians, but they are the weirdest of the WEIRD. Reading values out of evopsych is likely to push you in the direction of tribalism, so I don’t see how it helps.
I think if I were The Sorting Hat, I would tend to assign Rationalists to Ravenclaw :-)
What would that make the E/Accs?
I suggest avoiding a dependency on Philosophy entirely, and using Science instead. Which has a means for telling people their ideas are bad, called Bayesianism (a.k.a. the Scientific Method). For ethics, the relevant science is Evolutionary Moral Psychology.
It’s not the case that science boils down to Bayes alone, or that science is the only alternative to philosophy. Alignment/control is more like engineering.
There are a significant number of subjects that philosophers used to discuss before science was able to study them; later, science started to cast light on them, eventually identified the correct hypotheses, and the philosophers gradually lost interest in them. So there is an alternative to the above: identify all the alignment-relevant questions that we currently have philosophical hypotheses about but not scientific answers to, and develop scientific answers to them.
The fact that some philosophical problems can be answered by science isn’t a guarantee that they all can. There is an in-principle argument against a science, which is basically objective, being able to understand Hard Problem consciousness, which is essentially subjective.
That could have been expressed as shared human value.
As I said above:
…human evolved moral intuitions (or to be more exact, the shared evolved cognitive/affective machinery underlying any individual human’s moral intuitions)…
are (along with more basic things, like our preferences for being around flowers, parks, and seashores, and for temperatures around 75°F) what I’m suggesting as a candidate definition for the “human values” that people on Less Wrong/the Alignment Forum discussing the alignment problem generally talk about (by which I think most of them do mean “shared human value” even if they don’t all bother to specify), and that I’m suggesting pointing Value Learning at.
I also didn’t specify above what I think should be done if it turns out that, say, about 96–98% of humans genetically have those shared values, and 2–4% have different alleles.
When I see someone bowing down before their future overlord, I generally think of Slytherins. And when said overlord doesn’t even exist yet, and they’re trying to help create them… I suspect a more ambitious and manipulative Slytherin might be involved.
And “my tribe”. What you want is Universalism, but universalism is a late and strange development. It seems obvious to twenty-first-century Californians, but they are the weirdest of the WEIRD. Reading values out of evopsych is likely to push you in the direction of tribalism, so I don’t see how it helps.
On the Savannah, yes of course it does. In a world-spanning culture of eight billion people, quite a few of whom are part of nuclear-armed alliances, intelligence and the fact that extinction is forever suggest defining “tribe” ~= “species + our commensal pets”. And also noting, and reflecting upon, that the human default tendency to assume that tribes are around our Dunbar Number in size is now maladaptive, and has been for millennia.
It’s not the case that science boils down to Bayes alone,
Are you saying that there’s more to the Scientific Method than applied approximate Bayesianism? If so, please explain. Or are you saying there’s more to Science than the Scientific Method, namely its current outputs as well?
or that science is the only alternative to philosophy. Alignment/control is more like engineering.
Engineering is applied Science, Science is applied Mathematics; from Philosophy’s point of view it’s all Naturalism. In the above, it kept turning out that Engineering methodology is exactly what Evolutionary Psychology says is the adaptive way for a social species to treat their extended phenotype. I really don’t think it’s a coincidence that the smartest tool-using social species on the planet has a good way of looking at tools. As someone who is both a scientist and an engineer, this is my scientist side saying “here’s why the engineers are right here”.
(by which I think most of them do mean “shared human value” even if they don’t all bother to specify), and that I’m suggesting pointing Value Learning at.
I’m suggesting they should bother to specify.
(along with more basic things, like our preferences for being around flowers, parks, and seashores, and for temperatures around 75°F) what I’m suggesting as a candidate definition for the “human values”
But are they relevant to ethics or alignment? A lot of them are aesthetic preferences that can be satisfied without public policy.
Shared genetics can lead to different blood and tissue types, so it can lead to different ethical types.
Politics indicates it’s more like 50-50, when you are talking about the kind of values that cannot be satisfied individually.
And “my tribe”. What you want is Universalism, but universalism is a late and strange development. It seems obvious to twenty-first-century Californians, but they are the weirdest of the WEIRD. Reading values out of evopsych is likely to push you in the direction of tribalism, so I don’t see how it helps.
On the Savannah, yes of course it does. In a world-spanning culture of eight billion people, quite a few of whom are part of nuclear-armed alliances, intelligence and the fact that extinction is forever suggest defining “tribe” ~= “species + our commensal pets”. And also noting, and reflecting upon, that the human default tendency to assume that tribes are around our Dunbar Number in size is now maladaptive, and has been for millennia.
There are technologically advanced tribalists destroying each other right now. It’s not that simple.
It’s not the case that science boils down to Bayes alone,
Are you saying that there’s more to the Scientific Method than applied approximate Bayesianism?
Yes. I learnt physics without ever learning Bayes. Science=Bayes is the extraordinary claim that needs justification.
or that science is the only alternative to philosophy. Alignment/control is more like engineering.
Engineering is applied Science, Science is applied Mathematics; from Philosophy’s point of view it’s all Naturalism. In the above, it kept turning out that Engineering methodology is exactly what Evolutionary Psychology says is the adaptive way for a social species to treat their extended phenotype.
Again, I would suggest using the word engineering, if engineering is what you mean.
So, in philosophy of science terminology, philosophers have plenty of hypothesis generation but very little falsifiability (beyond, as Gettier did, demonstrating an internal logical inconsistency), so the tendency is to increase the number of credible candidate answers rather than to decrease it.
That’s still useful if you have some way of judging their correctness; it doesn’t have to be empiricism. To find the one true hypothesis, you need to consider all of them, and to approximate that, you need to consider a lot of them.
The same thing occurs within science, because science isn’t pure empiricism. The panoply of interpretations of QM is an example.
But are they relevant to ethics or alignment? A lot of them are aesthetic preferences that can be satisfied without public policy.
Alignment is about getting our AIs to do what we want, and not other things. Their understanding and attempting to fit within human aesthetic and ergonomic preferences is part of that. Not a particularly ethically complicated part, but still, the reason for flowers in urban landscapes is that humans like flowers. Full stop (apart from the biological background on why that preference evolved, presumably because flowers correlate with good places to gather food). That’s a sufficient reason, and an AI urban planner needs to know and respect that.
I learnt physics without ever learning Bayes. Science=Bayes is the extraordinary claim that needs justification.
I think I’m going to leave that to other people on Less Wrong — they’re the ones who convinced me of this, and I also don’t see it as core to my argument.
Nevertheless, they are correct: there is now a mathematical foundation underpinning the Scientific Method. It’s not just an arbitrary set of mundanely useful epistemological rules that were discovered by people like Roger Bacon and Karl Popper; we (later) figured out mathematically why that set of rules works so well: because they’re a computable approximation to Solomonoff Induction.
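(To make “applied approximate Bayesianism” concrete, here is a minimal sketch in Python, with entirely made-up hypotheses and numbers rather than anything from this thread: credences over competing hypotheses are reweighted by how strongly each one predicted the observed evidence, then renormalized.)

```python
# Credences over hypothetical hypotheses, updated on one piece of evidence via Bayes' rule.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}

# P(observed evidence | hypothesis), judged before seeing the outcome (illustrative numbers).
likelihoods = {"H1": 0.9, "H2": 0.4, "H3": 0.1}

def update(priors, likelihoods):
    """Reweight each prior by its likelihood, then renormalize."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}

posteriors = update(priors, likelihoods)
print(posteriors)  # H1's credence rises, H3's collapses
```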
Again, I would suggest using the word engineering, if engineering is what you mean.
There is a difference between “I personally suggest we just use engineering” and “Evolutionary theory makes a clear set of predictions of why it’s a very bad idea to do anything other than just use engineering”. You seem to agree with my advice, yet not want people to hear the part about why they should follow it and what will happen if they don’t. Glad to hear you agree with me, but some people need a little more persuading — and I’d rather they didn’t kill us all.
If morals are not truth-apt, and free will is the control required for moral responsibility, then...
Alignment has many meanings. Minimally, it is about the AI not killing us.
AIs don’t have to share our aesthetic preferences to understand them. It would be a nuisance if they did (they might start demanding pot plants in their data centres), so it is useful to distinguish aesthetic from moral values. That’s one of the problems with the unproven but widely believed claim that all values are moral values.
Nevertheless, they are correct: there is now a mathematical foundation underpinning the Scientific Method
Bayes doesn’t encapsulate the whole scientific method, because it doesn’t tell you how to formulate hypotheses, or conduct experiments.
Bayes doesn’t give you a mathematical foundation of a useful kind, that is, an objective kind. Two Bayesian scientists can quantify their subjective credences, quantify them differently, and have no way of reconciling their differences.
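(A minimal sketch of that objection, with made-up numbers: two agents apply Bayes’ rule to the same evidence but start from different subjective priors, and the rule itself offers no way to say whose prior was the right one.)

```python
def posterior(priors, likelihoods):
    """Bayes' rule: reweight priors by likelihoods and renormalize."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}

likelihoods = {"H1": 0.8, "H2": 0.2}  # the same evidence, seen by both agents

alice = posterior({"H1": 0.5, "H2": 0.5}, likelihoods)   # one subjective prior
bob = posterior({"H1": 0.05, "H2": 0.95}, likelihoods)   # a very different one

print(alice)  # roughly {'H1': 0.80, 'H2': 0.20}
print(bob)    # roughly {'H1': 0.17, 'H2': 0.83} -- still far apart after the same update
```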