This comment has been heavily downvoted (at the time of writing, 2 overall karma across 19 votes). It shouldn't have been, and I personally believe this is a sign of people being attached to AI x-risk ideas, with those ideas forming part of their whole persona, rather than of strict disagreement. This is a concern I bring up in conversations about AI risk, since I believe folks will post-rationalize. The above comment is not low effort or low value.
If you disagree so strongly with the above comment, you should force yourself to outline your views and provide a rebuttal to the points made. I would personally value comments that attempted to do this in earnest, particularly because I don't want this post by Evan to become a signpost that lets folks justify their belief in AI risk with the unconscious internal thought, "oh thank goodness someone pointed out all the AI risk issues, so I don't have to do the work of reflecting on my career/beliefs and I can just defer to high-status individuals to provide the reasoning for me." I sometimes feel that some posts simply end further discussion because they impact one's identity.
That said, I'm very glad this post was put out so quickly, so that we can continue to dig into things and disentangle the current state of AI safety.
Note: I also think Adrià should have been acknowledged in the post for having inspired it.
I thought Adrià's comment was great and I'll try to respond to it in more detail later if I can find the time (edit: that response is here), but:
Adrià did not inspire this post; this is an adaptation of something I wrote internally at Anthropic about a month ago (I'll add a note to the top about that). If anyone inspired it, it would be Ethan Perez.
Ok, good to know! The title just made it seem like it was inspired by his recent post.
Great to hear you'll respond; I didn't expect that, so I mostly meant the suggestion for readers who agree with your post.
I'm honestly very curious what Ethan is up to now; both you and Thomas Kwa implied that he's not doing alignment anymore. I'll have to reach out...
I generally think it makes sense for people to have pretty complicated reasons why they think something should be downvoted. This applies even more to longer content, which would often require an enormous amount of effort to respond to explicitly.
I have some sympathy for being sad here if a comment ends up highly net-downvoted, but FWIW, I think 2 karma feels vaguely in the right vicinity for this comment. Maybe I would upvote it to +6, but I would indeed be sad to see it at +20 or whatever, since I do think it's doing something pretty tiring and hard to engage with. Directional downvoting is a totally fine use of downvoting, and if you think a comment is overrated but not bad, please downvote it until its karma reflects where you want it to end up!
(This doesn't mean it doesn't make sense to do sociological analysis of cultural trends on LW using downvoting patterns, but I do want to maintain the cultural locus where people can have complicated reasons for downvoting, and where statements like "if you disagree strongly with the above comment you should force yourself to outline your views" aren't frequently made. The whole point of the vote system is to get signal from people without forcing them to do huge amounts of explanatory labor. Please don't break that part.)
That's fair; it is tiring. I did want to make sure to respond to every particular point I disagreed with, to be thorough, but it is just sooo looong.
What would you have me do instead? My best guess, arrived at just after writing the comment, is that I should have proposed a list of double-crux candidates.
Do you have any other proposals, or is that good?
I generally agree that it makes sense for people to have complicated reasons for downvoting. In this case, though, I'm trying to call on safety folks to be frank with themselves and to actually figure out whether they really believe alignment is still hard or are just looking for reasons to believe it is. That might not be what is happening here, but I did want to encourage critical thinking, and articulating it, in case it is for some.
(Also, I did not mean for people to upvote it to the moon. I find that questionable too.)
Thank you for your defense, Jacques; it warms my heart :)
However, I think LessWrong has been extremely kind to me, and I continue to be impressed by this site's discourse norms. If I were to post such a critique in any other online forum, it would have heavily negative karma. Yet I continue to be upvoted, and the critiques are in good faith! I'm extremely pleased.
I agree that other forums would engage with even worse norms, but I'm personally happy to keep the bar high for these discussions, regardless of what others do elsewhere. My hope is that we never stop striving for better, especially since the stakes in alignment are far higher than in most other domains, so we need a higher standard of frankness.