I strong upvoted this because I thought the discussion was an interesting direction and it had already fallen off the frontpage. I don’t know that I particularly agree with the reasoning. (I am generally liberal, though have updated a bit towards being more conservative-as-described-here on the margin in recent years. I found the general framing of liberal vs conservative as described here an interesting lens to look through.)
I do feel like this somewhat overstates the values-difference with Fun Theory, and feels like it’s missing the point of Coherent Extrapolated Volition.
We will argue that as usually presented, alignment by default leads to recursive preference engines that eliminate disagreement and conflict, creating modular, adaptable cultures where personal compromise is unnecessary. We worry that this comes at the cost of reducing status to cosmetics and eroding personal growth and human values. Therefore, we argue that it’s good that values inherently conflict, and these tensions give life meaning; AGI should support enduring human institutions by helping communities navigate disputes and maintain norms, channeling conflict rather than erasing it. This ideal, if embraced, means that AI Alignment is essentially a conservative movement.
I don’t think CEV at its core assumes this. I think that, while writing CEV, Eliezer makes a prediction that, if people knew more, thought longer, and grew up together more, a lot of disagreements would melt away and there would turn out to be a lot that humanity wants in common. But CEV is designed to do pretty well even in worlds where that is false (if it’s maximally false, the CEV just throws an error. But in worlds where things only partially cohere, well, the AI helps out with those parts as best it can in a way that everyone agrees is good).
There’s also nothing intrinsically anti-conservative about what it’ll end up with, unless you think people would be less conservative after thinking longer, learning more, and talking with each other more. (Do you think that?) Yeah, lots of LWers probably lean towards expecting it’ll be more liberal, but that’s just a prediction, not a normative claim CEV is making.
Somewhat relatedly, in Free to Optimize (which is about humans being able to go about steering their lives, not about an AI or anyone “hardcore optimizing”), Eliezer says:
If there is anything in the world that resembles a god, people will try to pray to it. It’s human nature to such an extent that people will pray even if there aren’t any gods—so you can imagine what would happen if there were! But people don’t pray to gravity to ignore their airplanes, because it is understood how gravity works, and it is understood that gravity doesn’t adapt itself to the needs of individuals. Instead they understand gravity and try to turn it to their own purposes.
So one possible way of helping—which may or may not be the best way of helping—would be the gift of a world that works on improved rules, where the rules are stable and understandable enough that people can manipulate them and optimize their own futures together. A nicer place to live, but free of meddling gods beyond that. I have yet to think of a form of help that is less poisonous to human beings—but I am only human.
This feels more like a vision of “what constraints to put in place” than “what to optimize for.”
(I agree that it has a vibe of pointing in a more individualistic direction, and it’s worth noticing that and not taking it for granted. But I think the point of Fun Theory is to get at something that really would also underlie any good vision for the future, not just one particular one. I think conservatives do actually want complex novelty. I don’t have an encyclopedic memory of the Fun Theory sequence, but I would bet against it saying anything explicit, and probably not even anything implicit, about “individual” complex novelty. It even specifically warns against turning our complex, meaningful multiplayer games into single-player experiences.)
CEV isn’t about eliminating conflict, it’s (kinda) about efficiently resolving conflict. But, insofar as the resolution of the conflict itself is meaningful, it doesn’t say anything about people not getting to resolve the conflict themselves.
(People seem to be hella downvoting this, and I am kinda confused as to why. I can see not finding it particularly persuasive or interesting. I’m guessing this is just sad tribalism but curious if people have a particular objection I’m missing)
There are some users around who strong-downvote anyone trying to make any arguments on the basis of CEV, and who seem very triggered by the concept. This is sad and has derailed a bunch of conversations in the past. My guess is the same is going on here.
Do you not have the power/tools to stop such behavior from taking effect? This sounds like the exact problem that killed LW 1.0, and which I was led to believe is now solved.
We have much better tools to detect downvoting of specific users, and unusual voting activity by a specific user, but if a topic only comes up occasionally and the users who vote on that topic also regularly vote on other things, I don’t know of any high-level statistics that would easily detect that, and I think it would have very substantial chilling effects if we were to start policing that kind of behavior.
There probably are technical solutions, but it’s a trickier kind of problem than what LW 1.0 faced, and we haven’t built them.
I’d be more interested in tools that detected downvotes cast before people started reading, on the basis of the title. I’d give even odds that more than half of the downvotes on this post came within a minute of opening it, reacting to the title or to the first paragraph, not to the discussion of CEV.
I was the one who downvoted, and my reasoning is that, at a fundamental level, I think a lot of their argument rests on fabricating options that only appear to work because they ignore the question of why value disagreement is less tolerable in an AI-controlled future than it is now.
I have a longer comment below, and @sunwillrise makes a similar point, but the argument that AI safety has an attitude of minimizing value conflict makes more sense than the post gives it credit for, and the mechanisms that keep value disagreements from blowing up into take-over attempts/mass violence rely on certain features of modern society that AGI will break (and there is no talk of how to actually make the vision sustainable):
https://www.lesswrong.com/posts/iJzDm6h5a2CK9etYZ/a-conservative-vision-for-ai-alignment#eBdRwtZeJqJkKt2hn
Thank you for noticing the raft of reflexive downvotes; it’s disappointing how much even LessWrong seems to react reflexively; even the commenters seem not to have read the piece, or at least not to have engaged with the arguments.
On your response—I agree that CEV as a process could arrive at the outcomes you’re describing, where ineliminable conflict gets it to throw an error—but I think that CEV as approximated, and as people assume it will work, is, as you note, making a prediction that disagreements will dissolve. Not only that, but it asserts that this will have an outcome that preserves what we value. If the tenets of agonism are correct, however, any solution geared towards “efficiently resolving conflict” is destructive of human values—because as we said, “conflict is central to the way society works, not something to overcome.” Still, I agree that Eliezer got parts of this right (a decade before almost anyone else even noticed the problem), and agree that keeping things as multiplayer games with complex novelty, where conflict still matters, is critical. The further point, which I think Eliezer’s fun theory, as written, kind of elides, is that we also need limits and pain for the conflict to matter. That is, again, it seems possible that part of what makes things meaningful is that we need to engage in the conflict ourselves, instead of having it “solved” via extrapolation of our values.
As a separate point, as I argued in a different post, we lack the conceptual understanding needed to deal with the question of whether there is some coherent extrapolated version of most agents that is anywhere “close” to their values. But at the very least, “the odds that an arbitrary complex system is pursuing some coherent outcome” approach zero, and that at least slightly implies almost all agents might not be “close” to a rational agent in the important senses we care about for CEV.
The further point, which I think Eliezer’s fun theory, as written, kind of elides, is that we also need limits and pain for the conflict to matter.
I think Eliezer’s writing says this sort of thing pretty explicitly? (Like, in Three Worlds Collide, the “bad” ending was the one where humans removed all conflict, romantic struggle, and similar types of pain that seem like the sort of thing you’re talking about here.)
If the tenets of agonism are correct, however, any solution geared towards “efficiently resolving conflict” is destructive of human values
I assume this will come up later in your sequence, but, as stated, this seems way too strong. (I can totally buy that there are qualities of conflict resolution that would be bad to abstract away, but, as stated, this is an argument against democracy, markets, mediation, norms for negotiation, etc. Do you actually believe those are destructive of human value and we should be, like, waging war instead of talking? Or do you mean something else here?)
I agree that Eliezer has made different points in different places, and I don’t think the Fun Theory series makes this clear, and CEV as described seems not to say it. (I can’t try to resolve all the internal tensions between the multiple bookshelves’ worth of content he’s produced, so I referred to “fun theory, as written.”)
And I certainly don’t think conflict as such is good! (I’ve written about the benefits of avoiding conflict at some length on my substack about cooperation.) My point here was subtly different, and more specific to CEV; I think that solutions for eliminating conflict which route around humans themselves solving the problems might be fundamentally destructive of our values.