Thank you for noticing the raft of reflexive downvotes; it’s disappointing how reflexively even LessWrong seems to react. Even the commenters seem not to have read the piece, or at least haven’t engaged with the arguments.
On your response—I agree that CEV as a process could arrive at the outcomes you’re describing, where ineliminable conflict gets it to throw an error—but I think that CEV as approximated, and as people assume it will work, is, as you note, making a prediction that disagreements will dissolve. Not only that, but it asserts that this will have an outcome that preserves what we value. If the tenets of agonism are correct, however, any solution geared towards “efficiently resolving conflict” is destructive of human values—because, as we said, “conflict is central to the way society works, not something to overcome.” Still, I agree that Eliezer got parts of this right (a decade before almost anyone else even noticed the problem), and agree that keeping things as multiplayer games with complex novelty, where conflict still matters, is critical. The further point, which I think Eliezer’s fun theory, as written, somewhat elides, is that we also need limits and pain for the conflict to matter. That is, again, it seems possible that part of what makes things meaningful is that we need to engage in the conflict ourselves, instead of having it “solved” via extrapolation of our values.
As a separate point, as I argued in a different post, we lack the conceptual understanding needed to deal with the question of whether there is some extrapolated version of most agents that is anywhere “close” to their values and is coherent. But at the very least, “the odds that an arbitrary complex system is pursuing some coherent outcome” approach zero, and that at least weakly implies that almost all agents might not be “close” to a rational agent in the important senses we care about for CEV.
The further point, which I think Eliezer’s fun theory, as written, kind of elides, is that we also need limits and pain for the conflict to matter.
I think Eliezer’s writing says this sort of thing pretty explicitly? (Like, in Three Worlds Collide, the “bad” ending was the one where humans removed all conflict, romantic struggle, and similar types of pain that seem like the sort of thing you’re talking about here.)
If the tenets of agonism are correct, however, any solution geared towards “efficiently resolving conflict” is destructive of human value
I assume this will come up later in your sequence, but, as stated, this seems way too strong. (I can totally buy that there are qualities of conflict resolution that would be bad to abstract away, but, as stated, this is an argument against democracy, markets, mediation, norms for negotiation, etc. Do you actually believe those are destructive of human value and we should be, like, waging war instead of talking? Or do you mean something else here?)
I agree that Eliezer has made different points in different places, but I don’t think the Fun Theory series makes this clear, and CEV as described seems not to say it. (I can’t try to resolve all the internal tensions between the multiple bookshelves’ worth of content he’s produced, so I referred to “fun theory, as written.”)
And I certainly don’t think conflict as such is good! (I’ve written about the benefits of avoiding conflict at some length on my Substack about cooperation.) My point here was subtly different, and more specific to CEV: I think that solutions for eliminating conflict which route around humans solving the problems themselves might be fundamentally destructive of our values.