sunwillrise comments on A Conservative Vision For AI Alignment

sunwillrise 21 Aug 2025 19:00 UTC
32 points
0
This is a strange post to me.
On the one hand, it employs oversimplified and incorrect models of political discourse to present an inaccurate picture of what liberalism and conservatism stand for. It also strongly focuses on an analogy for AGI as humanity’s children, an analogy that I think is inappropriate and obscures far more than it reveals.
On the other hand, it gets a lot of critical details exactly right, such as when it mentions how “proposals like Eliezer’s Coherent Extrapolated Volition, or Bostrom’s ideas about Deep Utopia assume or require value convergence, and see idealization as desirable and tractable.”
But beyond these (in a relative sense) minute matters… to put it in Zvi’s words, the “conservative”^[1] view of “keep[ing] social rules as they are” simply doesn’t Feel the ASI.^[2] There is no point in this post where the authors present a sliver of evidence for why it’s possible to maintain the “barriers” and norms that exist in current societies, when the fundamental phase change of the Singularity happens.
The default result of the Singularity is that existing norms and rules are thrown out the window. Not because people suddenly stop wanting to employ them^[3], not because those communities rebel against the rules^[4], but simply because those who do not adapt get economically outcompeted and lose resources to those who do. You adapt, or you die. It’s the Laws of Economics, not the Laws of Man, that lead to this outcome.^[5] Focusing exclusively on the latter, as the post does, on how we ought to relate to each other and what moral norms we should employ, blah blah blah, is missing the forest for a (single) tree. It’s a distraction.
There are ways one can believe this outcome can be avoided, of course. If strong AGI never appears anytime soon, for example. If takeoff is very slow and carefully regulated to ensure society always reaches an equilibrium first before any new qualitative improvement in AI capabilities happens. If a singleton takes absolute control of the entire world and dictates by fiat that conservatism shall be allowed to flourish wherever people want it to, forcefully preventing anyone else from breaking barriers and obtaining ever-increasing resources by doing so.
I doubt the authors believe in the former two possibilities.^[6] If they do, they should say so. And if they believe in the latter, well… building an eternally unbeatable norm-enforcing God on earth is probably not what “conservatives” have in mind when they say barriers should be maintained and Schelling fences should be maintained and genuine disagreement should be allowed to exist.
1. ^
  Again, this isn’t what conservatism stands for. But I’ll try to restrain myself from digressing into straight-up politics too much
2. ^
  I’d even go a lot further and say it doesn’t feel… regular AI progress? Or even just regular economic progress in general. The arguments I give below apply, with lesser force of course, even if we have “business as usual” in the world. Because “business as usual,” throughout all of human history and especially at an ever-increasing pace in the past 250 years, means fundamental changes in norms and barriers and human relations despite conservatives standing athwart history, yelling stop
3. ^
  Except this does happen, because the promise of prosperity and novelty is a siren’s call too alluring to resist en masse
4. ^
  Except this also happens, if only because of AIs fundamentally altering human cognition, as is already starting to happen and will by default be taken up to eleven sometime soon
5. ^
  See The benevolence of the butcher for further discussion.
6. ^
  Which doesn’t mean these possibilities are wrong, mind you.
- Davidmanheim 21 Aug 2025 19:41 UTC
  6 points
  1
  Parent
  I won’t try to speak for my co-author, but yes, we agree that this doesn’t try to capture the variety of views that exist, much less what your view of political discourse should mean by conservatism—this is a conservative vision, not the conservative vision. And given that, we find the analogy to be useful in motivating our thinking and illustrating an important point, despite that fact that all analogies are inexact.
  
  That said, yes, I don’t “feel the AGI” in the sense that if you presume that the singularity will happen in the typically imagined way, humanity as we know it doesn’t make it. And with it goes any ability to preserve our current values. I certainly do “feel the AGI” in thinking that the default trajectory is pointed in that direction, and accelerating, and it’s not happening in any sense in a fashion that preserves any values whatsoever, conservative or otherwise.
  But that’s exactly the point—we don’t think that the AGI which is being aimed for is a good thing, and we do think that the conversation about the possible futures humanity could be aiming for is (weirdly) narrowly constrained to be either a pretty bland techno-utopianism, or human extinction. We certainly don’t think that it’s necessary for there to be no AGI, and we don’t think that eternal stasis is a viable conservative vision either, contrary to what you assume we meant. But as we said, this is the first post in a series about our thinking on the topic, not a specific plan, much less final word on how things should happen.
  - sunwillrise 21 Aug 2025 19:55 UTC
    12 points
    1
    Parent
    But as we said, this is the first post in a series about our thinking on the topic, not a specific plan, much less final word on how things should happen.
    That’s all fine and good if you plan on addressing these kinds of problems in future posts of your series/sequence and explain how you think it’s at all plausible for your vision to take hold. I look forward to seeing them.
    I have a meta comment about this general pattern, however. It’s something that’s unfortunately quite recurrent on this site. Namely that an author posts on a topic, a commenter makes the most basic objection that jumps to mind first, and the author replies that the post isn’t meant to be the definitive word on the topic and the commenter’s objection will be addressed in future posts.^[1]
    I think this pattern is bad and undesirable.^[2] Despite my many disagreements with him and his writing, Eliezer did something very, very valuable in the Sequences and then in Highly Advanced Epistemology. He started out with all the logical dependencies, hammering down the basics first, and then built everything else on top, one inferential step at a time.^[3] As a result of this, users could verify the local validity of what he was saying, and when they disagreed with him, they knew the precise point where they jumped off the boat of his ideology.^[4] Instead of Eliezer giving his conclusions without further commentary, he gave the commentary, bit by bit, then the conclusions.
    ^
    In practice, it generally just isn’t. Or a far weaker or modified version of it is.
    ^
    Which doesn’t mean there’s a plausible alternative out there in practice. Perhaps trying to remove this pattern imposes too much of a constraint on authors and instead of them writing things better (from my pov), they instead don’t write anything at all. Which is a strictly worse outcome than the original.
    ^
    That’s not because his mind had everything cleanly organized in terms of axioms and deductions. It’s because he put in a lot of effort to translate what was in his head to what would be informative for and convincing to an audience.
    ^
    Which allows for productive back-and-forths because you don’t need to thread through thousands of words to figure out where people’s intuitions differ and how much they disagree with, etc.
    - Wei Dai 22 Aug 2025 1:02 UTC
      12 points
      4
      Parent
      I often have the opposite complaint, which is that when reading a sequence, I wish I knew what the authors’ bottom line is, so I can better understand how their arguments relate and which ones are actually important and worth paying attention to. If I find a flaw, does it actually affect their conclusions or is it just a nit? In this case, I wish I knew what the authors’ actual ideas are for aligning AI “conservatively”.
      
      One way to solve both of our complaints is if the authors posted the entire sequence at once, but I can think of some downsides to doing that (reducing reader motivation, lack of focus in discussion), so maybe still post to LW one at a time, but make the entire sequence available somewhere else for people to read ahead or reference if they want to?
      - Davidmanheim 22 Aug 2025 8:01 UTC
        6 points
        0
        Parent
        We’re very interested in seeing where people see flaws, and there’s a real chance that they could change our views. This is a forum post, not a book, and the format and our intent sharing it differs. That is, if we had completed the entire sequence before starting to get public feedback, the idea of sharing the full seuquence at the start would work—but we have not. We have ideas, partial drafts, and some thoughts on directions to pursue, but it’s not obvious that the problems we’re addressing are solvable, so we certainly don’t have final conclusions, nor do I think we will get there when we conclude the sequence.
      - sunwillrise 22 Aug 2025 4:09 UTC
        6 points
        2
        Parent
        One way to solve both of our complaints is if the authors posted the entire sequence at once, but I can think of some downsides to doing that (reducing reader motivation, lack of focus in discussion)
        Also the fact that you don’t get to use real-time feedback from readers on what their disagreements/confusions are, allowing you to change what’s in the sequence itself or to address these problems in future posts.
        Anyway, I don’t have a problem with authors making clear what their bottom line is.^[1] I have a problem with them arguing for their bottom line out of order, in ways that unintentionally but pathologically result in lingering confusions and disagreements and poor communication.
        ^
        If nothing else, reading that tells you as a reader whether it’s something you’re interested in hearing about or not, allowing you to not waste time needlessly if it’s the latter
    - Davidmanheim 21 Aug 2025 20:09 UTC
      1 point
      −1
      Parent
      I’m confused by this criticism. You jumped on the most the most basic objection that jumps to mind first based on what you thought we were saying—but you were wrong. We said, explicitly, that this is “our lens on parts of the conservative-liberal conceptual conflict” and then said “In the next post, we want to outline what we see as a more workable version of humanity’s relationship with AGI moving forward.”
      
      My reply wasn’t backing out of a claim, it was clarifying the scope by restating and elaborating slightly something we already said in the very first section of the post!
      - sunwillrise 21 Aug 2025 20:49 UTC
        4 points
        0
        Parent
        The objection isn’t the liberal/conservative lens. That’s relatively minor, as I said. The objection is the viability of this approach, which I explained afterwards (in the final 4 paragraphs of my comment) and remains unaddressed.
        Davidmanheim 22 Aug 2025 12:58 UTC
        4 points
        0
        Parent
        The viability of what approach, exactly? You again seem to be reading something different than what was written.
        
        You said “There is no point in this post where the authors present a sliver of evidence for why it’s possible to maintain the ‘barriers’ and norms that exist in current societies, when the fundamental phase change of the Singularity happens.”
        
        Did we make an argument that it was possible, somewhere, which I didn’t notice writing? Or can I present a conclusion to the piece that might be useful:
        
        ”...the question we should be asking now is where [this] view leads, and how it could be achieved.
        That is going to include working towards understanding what it means to align AI after embracing this conservative view, and seeing status and power as a feature, not a bug. But we don’t claim to have ‘the’ answer to the question, just thoughts in that direction—so we’d very much appreciate contributions, criticisms, and suggestions on what we should be thinking about, or what you think we are getting wrong.”