I think you’ve misunderstood what I said? I agree that a human CEV would accord some moral status to animals, maybe even a lot of moral status. What I’m talking about is the “primary sources of values” for the CEV, i.e. which population the AI is implementing the Coherent Extrapolated Volition of. Normally we assume it’s humanity, but OP is essentially proposing that the CEV be for “all beings everywhere”, including animals/aliens/AIs/plants/whatever.
I think we are on the same page; I was trying to agree with what you said and add commentary on why I’m concerned about “CEV with humans as the primary source of values”. That said, I was only responding to your first paragraph, not your second. I think your second paragraph also raises fair concerns about what a “CEV for all sentient beings” looks like.
It seems likely enough to me (for a ton of reasons, most of them enunciated here) that “the CEV of an individual human” doesn’t really make sense as a concept, let alone “the CEV of humanity” or even more broadly “the CEV of all beings everywhere.”
More directly though, the Orthogonality Thesis alone (minds of arbitrary intelligence can pursue essentially arbitrary goals, so there’s no reason to expect the values of arbitrary beings to cohere) is sufficient to make “the CEV of all beings everywhere” a complete non-starter, unless there are so few other kinds of beings out there that “the CEV of humanity” would likely be a good enough approximation of it anyway (if it actually existed, which I think it doesn’t).
I admit:
Human preferences don’t fully cohere, especially when extrapolated
There are many ways in which “Humanity’s CEV” is fuzzy or potentially even impossible to fully specify
But I think the concept has staying power because it points to a practical idea of “the AI acts in a way such that most humans think it mostly shares their core values”.[1] LLMs already aren’t far from this bar in their day-to-day behavior, so it doesn’t seem obviously impossible.
To go back to agreeing with you: yes, adding new types of beings as primary sources of values for the CEV would introduce far more conflicting sets of preferences (predator vs. prey, parasites vs. hosts, species competing for the same niche, etc.), maybe to the point that trying to combine them would be totally incoherent. That’s a strong objection to the “all beings everywhere” idea. It’d certainly be simpler to enforce human preferences on animals.
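To make the “totally incoherent” worry concrete, here’s a toy sketch (mine, not anything from the OP; the beings, outcomes, and rankings are made up for illustration). Every individual being has a perfectly coherent ranking over outcomes, yet simple pairwise-majority aggregation of those rankings produces a cycle, so there is no coherent “group preference” at all.

```python
# Toy illustration (hypothetical): individually coherent rankings can aggregate
# into an incoherent (cyclic) group preference under pairwise majority voting.
from itertools import combinations

# Three kinds of beings, each with a transitive ranking over three world-states.
rankings = {
    "predators": ["hunting_allowed", "wilderness_preserved", "prey_protected"],
    "prey":      ["prey_protected", "hunting_allowed", "wilderness_preserved"],
    "humans":    ["wilderness_preserved", "prey_protected", "hunting_allowed"],
}

def prefers(ranking, a, b):
    """True if this being ranks outcome a above outcome b."""
    return ranking.index(a) < ranking.index(b)

def majority_prefers(a, b):
    """True if a strict majority of beings ranks a above b."""
    votes = sum(prefers(r, a, b) for r in rankings.values())
    return votes > len(rankings) / 2

outcomes = ["hunting_allowed", "prey_protected", "wilderness_preserved"]
for a, b in combinations(outcomes, 2):
    if majority_prefers(a, b):
        print(f"majority prefers {a} over {b}")
    elif majority_prefers(b, a):
        print(f"majority prefers {b} over {a}")
```

Running it prints a three-way cycle (each outcome beats another 2 to 1: hunting_allowed over wilderness_preserved, wilderness_preserved over prey_protected, prey_protected over hunting_allowed), and that’s with only three beings and three outcomes; the problem only gets worse as you add more kinds of beings.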
I think of this as meaning the AI isn’t enforcing niche values (“everyone now has to wear Mormon undergarments in order to save their eternal soul”), is not taking obviously horrible actions (“time to unleash the Terminators!”), and is taking some obviously good actions (“I will save the life of this 3-year-old with cancer”). Obviously it would have to be neutral on a lot of things, but there’s quite a lot most humans have in common.
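If it helps, here’s one way I could imagine cashing that out as a toy rule (again my own sketch, not the standard CEV proposal; the thresholds and approval numbers are invented placeholders): the AI acts where approval is near-universal, refuses where approval is near-zero, and stays neutral in between.

```python
# Toy formalization (hypothetical): treat "most humans think it mostly shares
# their core values" as a supermajority-approval rule over candidate actions.
APPROVE_THRESHOLD = 0.9   # "obviously good": nearly everyone endorses it
FORBID_THRESHOLD = 0.1    # niche values and obviously horrible actions both land here

candidate_actions = {
    "cure_childs_cancer": 0.99,
    "mandate_religious_garments": 0.02,
    "unleash_terminators": 0.001,
    "promote_one_national_cuisine": 0.55,  # contested, so the AI stays neutral
}

def classify(approval: float) -> str:
    """Map an approval fraction to the AI's stance on that action."""
    if approval >= APPROVE_THRESHOLD:
        return "do it"
    if approval <= FORBID_THRESHOLD:
        return "never do it"
    return "stay neutral"

for action, approval in candidate_actions.items():
    print(f"{action}: {classify(approval)}")
```

Real value elicitation is obviously nothing like this simple, but it shows what “neutral on a lot of things” cashes out to under this framing: the AI only acts, or refuses to act, where the overlap in human values is very wide.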