But as we said, this is the first post in a series about our thinking on the topic, not a specific plan, much less the final word on how things should happen.
That’s all fine and good if you plan on addressing these kinds of problems in future posts of your series/sequence and explaining how you think it’s at all plausible for your vision to take hold. I look forward to seeing them.
I have a meta comment about this general pattern, however. It’s something that’s unfortunately quite common on this site: an author posts on a topic, a commenter raises the most basic objection that jumps to mind first, and the author replies that the post isn’t meant to be the definitive word on the topic and that the commenter’s objection will be addressed in future posts.[1]
I think this pattern is bad and undesirable.[2] Despite my many disagreements with him and his writing, Eliezer did something very, very valuable in the Sequences and then in Highly Advanced Epistemology. He started out with all the logical dependencies, hammering down the basics first, and then built everything else on top, one inferential step at a time.[3] As a result, users could verify the local validity of what he was saying, and when they disagreed with him, they knew the precise point where they jumped off the boat of his ideology.[4] Instead of giving his conclusions without further commentary, he gave the commentary, bit by bit, and then the conclusions.
[1] In practice, it generally just isn’t, or only a far weaker or modified version of it is.
[2] Which doesn’t mean there’s a plausible alternative out there in practice. Perhaps trying to remove this pattern imposes too much of a constraint on authors, and instead of writing things better (from my POV), they don’t write anything at all, which is a strictly worse outcome than the original.
[3] That’s not because his mind had everything cleanly organized in terms of axioms and deductions. It’s because he put in a lot of effort to translate what was in his head into what would be informative for and convincing to an audience.
[4] Which allows for productive back-and-forths, because you don’t need to wade through thousands of words to figure out where people’s intuitions differ, how much they disagree, etc.
I often have the opposite complaint, which is that when reading a sequence, I wish I knew what the authors’ bottom line is, so I can better understand how their arguments relate and which ones are actually important and worth paying attention to. If I find a flaw, does it actually affect their conclusions or is it just a nit? In this case, I wish I knew what the authors’ actual ideas are for aligning AI “conservatively”.
One way to solve both of our complaints is if the authors posted the entire sequence at once, but I can think of some downsides to doing that (reducing reader motivation, lack of focus in discussion), so maybe still post to LW one at a time, but make the entire sequence available somewhere else for people to read ahead or reference if they want to?
We’re very interested in seeing where people see flaws, and there’s a real chance that they could change our views. This is a forum post, not a book, and the format and our intent in sharing it differ. That is, if we had completed the entire sequence before starting to get public feedback, sharing the full sequence at the start would work, but we have not. We have ideas, partial drafts, and some thoughts on directions to pursue, but it’s not obvious that the problems we’re addressing are solvable, so we certainly don’t have final conclusions, nor do I think we will have them when we conclude the sequence.
One way to solve both of our complaints is if the authors posted the entire sequence at once, but I can think of some downsides to doing that (reducing reader motivation, lack of focus in discussion)
There’s also the fact that you don’t get to use real-time feedback from readers on what their disagreements and confusions are, feedback that would let you change what’s in the sequence itself or address those problems in future posts.
Anyway, I don’t have a problem with authors making clear what their bottom line is.[1] I have a problem with them arguing for their bottom line out of order, in ways that unintentionally but pathologically result in lingering confusion, disagreement, and poor communication.
[1] If nothing else, reading that tells you as a reader whether it’s something you’re interested in hearing about or not, letting you avoid wasting time if it’s the latter.
I’m confused by this criticism. You jumped on the most basic objection that jumps to mind first based on what you thought we were saying, but you were wrong. We said, explicitly, that this is “our lens on parts of the conservative-liberal conceptual conflict” and then said “In the next post, we want to outline what we see as a more workable version of humanity’s relationship with AGI moving forward.”
My reply wasn’t backing out of a claim; it was clarifying the scope by restating, and slightly elaborating on, something we already said in the very first section of the post!
The objection isn’t the liberal/conservative lens. That’s relatively minor, as I said. The objection is the viability of this approach, which I explained afterwards (in the final four paragraphs of my comment) and which remains unaddressed.
The viability of what approach, exactly? You again seem to be reading something different from what was written.
You said “There is no point in this post where the authors present a sliver of evidence for why it’s possible to maintain the ‘barriers’ and norms that exist in current societies, when the fundamental phase change of the Singularity happens.”
Did we make an argument somewhere that it was possible, which I didn’t notice writing? Or can I present the conclusion of the piece, which might be useful:
“...the question we should be asking now is where [this] view leads, and how it could be achieved.
That is going to include working towards understanding what it means to align AI after embracing this conservative view, and seeing status and power as a feature, not a bug. But we don’t claim to have ‘the’ answer to the question, just thoughts in that direction—so we’d very much appreciate contributions, criticisms, and suggestions on what we should be thinking about, or what you think we are getting wrong.”