Wei Dai comments on The problem of graceful deference

Wei Dai 12 Nov 2025 7:49 UTC
50 points
17
Yudkowsky, being the best strategic thinker on the topic of existential risk from AGI
This seems strange to say, given that he:
1. decided to aim for “technological victory”, without acknowledging or being sufficiently concerned that it would inspire others to do the same
2. decided it’s feasible to win the AI race with a small team and while burdened by Friendliness/alignment/x-safety concerns
3. overestimated likely pace of progress relative to difficulty of problems, even on narrow problems that he personally focused on like decision theory (still far from solved today, ~16 years later. Edit: see UDT shows that decision theory is more puzzling than ever)
4. had large responsibility for others being overly deferential to him by writing/talking in a highly confident style, and not explicitly pushing back on the over-deference
5. is still overly focused on one particular AI x-risk (takeover due to misalignment) and underemphasizing or ignoring many other disjunctive risks
These seemed like obvious mistakes even at the time (I wrote posts/comments arguing against them), so I feel like the over-deference to Eliezer is a completely different phenomenon from “But you can’t become a simultaneous expert on most of the questions that you care about.” or has very different causes. In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.
What links here?
- TsviBT 12 Nov 2025 8:31 UTC
  10 points
  6
  Parent
  These seemed like obvious mistakes even at the time (I wrote posts/comments arguing against them)
  
  Ok. If that’s true then yeah, you might a very good strategic thinker about AGI X-risk. Yudkowsky still probably wins, given the evidence I currently have. He’s been going really hard at it for >20 years. You can criticize the writing style of LW, and say how in general he could have been deferred-to more gracefully, and I’m very open to that and somewhat interested in that.
  
  But it seems strange to be counting down from “Yudkowsky-LW-sphere, but even better” rather than up from “no Yudkowsky-LW-sphere”. (Which isn’t to say “well his stuff is really popular so he’s a good strategic thinker”, but rather “actually the Sequences and CFAI and https://intelligence.org/files/AIPosNegFactor.pdf and https://intelligence.org/files/ComplexValues.pdf and https://intelligence.org/files/CognitiveBiases.pdf and https://files.givewell.org/files/labs/AI/IEM.pdf were a huge amount of strategic background; as a consequence of being good strategic background, they shifted many people to working on this”.)
  
  Maybe I’m misunderstanding what you’re saying though / not addressing it. If someone had been building out the conceptual foundations of AGI X-derisking via social victory for >20 years, they’d probably have a strong claim to being the best strategic thinker on AGI X-risk in my book.
  
  so I feel like the over-deference to Eliezer is a completely different phenomenon from “But you can’t become a simultaneous expert on most of the questions that you care about.” or has very different causes.
  
  I’m not saying it is! You may have misread. (Or maybe I misspoke—if so, sorry, I’m not rereading my post but I can if you think I did say this.) I’m saying that SOME deference is probably unavoidable, BUT there’s a lot of ACTUAL deference (such as the examples I cited involving Yudkowsky!) that is BAD, so we should try to NOT DO THE BAD ONES but in a way that doesn’t NECESSARILY involve “just don’t defer at all”.
  
  In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.
  
  No? They’re all really difficult questions. Even being an expert in one of these would be at least a career. I mean, maybe YOU can, but I can’t, and I definitely can’t do so when I’m just a kid starting to think about how to help with X-derisking.
  
  I mean I’m obviously not arguing “don’t seriously investigate the crucial questions in your field for yourself”, or even “don’t completely unwind all your deference about strategy, all the way to the top, using your full power of critique, and start figuring things out actually from scratch”. I’ve explicitly told dozens of relative newcomers (2016--2019, roughly) to AGI X-derisking that they should stop trying so hard to defer, that there are several key dangers of deference, that they should try to become experts in key questions even if that would take a lot of effort, that the only way to be a really serious X-derisker is to start your work on planting questions about key elements, etc. My point is that
  1. {people, groups, funders, organizations, fields} do in fact end up deferring, and
  2. probably quite a lot of this is unavoidable, or at least unavoidable for now / given what we know about how to do group rationality,
  3. but also deference has a ton of bad effects, so
  4. we should figure out how to have less of those bad effects—and not just via “defer less”.
  - Wei Dai 12 Nov 2025 8:51 UTC
    16 points
    8
    Parent
    
    a huge amount of strategic background; as a consequence of being good strategic background, they shifted many people to working on this”
    
    Maybe we should distinguish between being good at thinking about / explaining strategic background, versus being actually good at strategy per se, e.g. picking high-level directions or judging overall approaches? I think he’s good at the former, but people mistakenly deferred to him too much on the latter.
    
    It would make sense that one could be good at one of these and less good at the other, as they require somewhat different skills. In particular I think the former does not require one to be able to think of all of the crucial considerations, or have overall good judgment after taking them all into consideration.
    
    No? They’re all really difficult questions. Even being an expert in one of these would be at least a career. I mean, maybe YOU can, but I can’t, and I definitely can’t do so when I’m just a kid starting to think about how to help with X-derisking.
    
    So Eliezer could become experts in all of them starting from scratch, but you couldn’t even though you could build upon his writings and other people’s? What was/is your theory of why he is so much above you in this regard? (“Being a kid” seems a red herring since Eliezer was pretty young when he did much of his strategic thinking.)
    - TsviBT 12 Nov 2025 9:09 UTC
      7 points
      5
      Parent
      
      I think he’s good at the former, but people mistakenly deferred to him too much on the latter.
      
      I agree and I said as much, but this also seems like a non sequitur if you’re just trying to say he’s not the best strategic thinker. Someone can be the best and also be “overrated” (or rather, overly deferred-to). I’m saying he is both. The “thinking about / explaining strategic background” is strong evidence of actually being good at strategy. Separately, Yudkowsky is the biggest creator of our chances of social victory, via LW/X-derisking sphere! (I’m not super confident of that, but pretty confident? Any other candidates?) So it’s a bit hard to argue that he didn’t pick that strategic route as well as the technical route! You can’t grade Yudkowsky on his own special curve just for all his various attempts at X-derisking, and then separately grade everyone else.
      
      It would make sense that one could be good at one of these and less good at the other, as they require somewhat different skills. In particular I think the former does not require one to be able to think of all of the crucial considerations, or have overall good judgment after taking them all into consideration.
      
      Ok. I mean, true. I guess someone could suggest alternative candidates, though I’m noticing IDK why to care much about this question.
      
      (I continue to have a sense that you’re misunderstanding what I’m saying, as described earlier, and also not sure what’s interesting about this topic. My bid would be, if there’s something here that seems interesting or important to you, that you would say a bit about what that is and why, as a way of recentering. It seems like you’re trying to drill down into particulars, but you keep being like “So why do you think X?” and I’m like “I don’t think X.”.)
      - Wei Dai 12 Nov 2025 9:51 UTC
        8 points
        3
        Parent
        By saying that he was the best strategic thinker, it seems like you’re trying to justify deferring to him on strategy (why not do that if he is actually the best), while also trying to figure out how to defer “gracefully”, whereas I’m questioning whether it made sense to defer to him at all, when you could have taken into account his (and other people’s) writings about strategic background, and then looked for other important considerations and formed your own judgments.
        
        Another thing that interests me is that several of his high-level strategic judgments seemed wrong or questionable to me at the time (as listed in my OP, and I can look up my old posts/comments if that would help), and if it didn’t seem that way to others, I want to understand why. Was Eliezer actually right, given what we knew at the time? Did it require a rare strategic mind to notice his mistakes? Or was it a halo effect, or the effect of Eliezer writing too confidently, or something else, that caused others to have a cognitive blind spot about this?
        TsviBT 12 Nov 2025 11:13 UTC
        4 points
        8
        Parent
        
        By saying that he was the best strategic thinker, it seems like you’re trying to justify deferring to him on strategy (why not do that if he is actually the best)
        
        No. You’re totally hallucinating this and also not updating when I’m repeatedly telling you no. It’s also the opposite of the point hammered in by the OP. My entire post is complaining about problems with deferring, and it links a prior post I wrote laying out these problems in detail, and I linked that essay to you again, and I linked several other writings explaining more how I’m against deferring and tell people not to defer repeatedly and in different ways. I bring up Eliezer to say “Look, we deferred to the best strategic thinker, and even though he’s the best strategic thinker, deferring was STILL really bad.”. Since I’ve described how deferring is really bad in several other places, here in THIS post I’m asking, given that we’re going to defer despite its costs, and given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?
        
        And then you’re like “Ha. Why not just not defer?”.
        What links here?
        Zack_M_Davis's comment on The Charge of the Hobby Horse by TsviBT (14 Nov 2025 19:44 UTC; 16 points)
        TsviBT's comment on The Charge of the Hobby Horse by TsviBT (16 Nov 2025 7:21 UTC; 15 points)
        Wei Dai 12 Nov 2025 12:25 UTC
        13 points
        3
        Parent
        
        Since I’ve described how deferring is really bad in several other places, here in THIS post I’m asking, given that we’re going to defer despite its costs, and given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?
        
        Ok, it looks like part of my motivation for going down this line of thought was based on a misunderstanding. But to be fair, in this post after you asked “What should we have done instead?” with regard to deferring to Eliezer, you didn’t clearly say “we should have not deferred or deferred less”, but instead wrote “We don’t have to stop deferring, to avoid this correlated failure. We just have to say that we’re deferring.” Given that this is a case where many people could have and should have not deferred, this just seems like a bad example to illustrate “given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?”, leading to the kind of confusion I had.
        
        Also, another part of my motivation is still valid and I think it would be interesting to try to answer why didn’t you (and others) just not defer? Not in a rhetorical sense, but what actually caused this? Was it age as you hinted earlier? Was it just human nature to want to defer to someone? Was it that you were being paid by an organization that Eliezer founded and had very strong influence over? Etc.? And also why didn’t you (and others) notice Eliezer’s strategic mistakes, if that has a different or additional answer?
        What links here?
        Wei Dai's comment on The Charge of the Hobby Horse by TsviBT (14 Nov 2025 10:53 UTC; 16 points)
        TsviBT 12 Nov 2025 15:58 UTC
        5 points
        −9
        Parent
        
        Also, another part of my motivation is still valid and I think it would be interesting to try to answer why didn’t you (and others) just not defer? Not in a rhetorical sense, but what actually caused this?
        
        Ok, sure, that’s a good question, and also off-topic.
        
        Was it age as you hinted earlier?
        
        Yeah obviously. It’s literally impossible to not defer, all you get to pick is which things you invest in undeferring in what order. I’m exceptionally non-deferential but yeah obviously you have to defer about lots of things.
        
        Was it just human nature to want to defer to someone?
        
        Yes it is also human nature to want to defer. E.g. that’s how you stay synched with your tribe on what stuff matters, how to act, etc.
        
        Was it that you were being paid by an organization that Eliezer founded and had very strong influence over? Etc.?
        
        No, I took being paid as more obligation to not defer.
        
        Anyway, I’m banning you from my posts due to grossly negligent reading comprehension. [Edit: I no longer thing this is a fair characterization.]
        Nina Panickssery 14 Nov 2025 18:59 UTC
        15 points
        8
        Parent
        I was curious so I read this comment thread, and am genuinely confused why Tsvi is so annoyed by the interaction (maybe I am being dumb and missing something). My interpretation of Wei Dai’s point is the following:
        Tsvi is saying something like:
        People have a tendency to defer too much (though deferring sometimes is necessary). They should consider deferring less and thinking for themselves more.
        When one does defer, it’s good to be explicit about that fact, both to oneself and others.
        As an example to illustrate his point, Tsvi mentions a case where he deferred to Yudkowsky. This is used as an example because Yudkowsky is considered a particularly good thinker on the topic Tsvi (and many others) deferred on, but nevertheless there was too much deference.
        Wei Dai points out that he thinks the example is misleading, because to him it looks more like being wrong about who it’s worth deferring to, rather than deferring too much. The more general version of his point is “You, Tsvi, are noticing problems that occur from people deferring. However, I think these problems may be at least partially due to them deferring to the wrong people, rather than deferring at all.”
        (If this is indeed the point Wei Dai is making, I happen to think Tsvi is more correct, but I don’t think WD’s contribution is meaningless or in bad faith.)
        TsviBT 14 Nov 2025 20:13 UTC
        2 points
        0
        Parent
        
        because to him it looks more like being wrong about who it’s worth deferring to,
        
        Except in his first comment he said:
        
        In other words, if you were going to spend your career on AI x-safety, of course you could have become an expert on these questions first.
        
        Which seems to say exactly “defer less” not “defer to a different person”.
        
        Anyway, like I’ve said, what’s annoying is not his thesis, but the fact that he fabricated a disagreement by imagining a position I held (which I didn’t) and then not updating when I clarified (which I did), seemingly in order to talk about his thing that he cares about rather than the topic of the post.
        Nina Panickssery 15 Nov 2025 6:43 UTC
        2 points
        0
        Parent
        Yeah that’s fair. I didn’t follow the “In other words” sentence (it doesn’t seem to be restating the rest of the comment in other words, but rather making a whole new (flawed) point).
        Zack_M_Davis 12 Nov 2025 16:39 UTC
        12 points
        5
        Parent
        The grandparent explains why Dai was confused about your authorial intent, and his comment at the top of the thread is sitting at 31 karma in 15 votes, suggesting that other readers found Dai’s engagement valuable. If that’s grossly negligent reading comprehension, then would you prefer to just not have readers? That is, it seems strange to be counting down from “smart commenters interpret my words in the way I want them to be interpreted” rather than up from “no one reads or comments on my work.”
        Wei Dai 12 Nov 2025 18:04 UTC
        9 points
        2
        Parent
        
        suggesting that other readers found Dai’s engagement valuable
        
        This may not be a valid inference, or your update may be too strong, given that my comment got a strong upvote early or immediately, which caused it to land in the Popular Comments section of the front page, where others may have further upvoted it in a decontextualized way.
        
        It looks like I’m not actually banned yet, but will disengage for now to respect Tsvi’s wishes/feelings. Thought I should correct the record on the above first, as I’m probably the only person who could (due to seeing the strong upvote and the resulting position in Popular Comments).
        What links here?
        Wei Dai's comment on The Charge of the Hobby Horse by TsviBT (16 Nov 2025 3:14 UTC; 4 points)
        TsviBT 12 Nov 2025 20:40 UTC
        2 points
        −4
        Parent
        I have banned you from my posts, but my guess is that you’re still allowed to post on existing comment threads with you involved, or something like. I’m happy for you to comment on anything that the LW interface allows you to comment on. [ETA: actually I hadn’t hit “submit” on the ban; I’ve done that now, so Wei Dai might no longer be able to reply on this post at all.]
        
        Possibly I’ll unban you some time in the future (not that anyone cares too much, I presume). But like, this comment thread is kinda wild from my perspective. My current understanding is that you “went down some line of questioning” based on a misunderstanding, but did not state what your line of questioning was and also ignored anything in my responses that wasn’t furthering your “line of questioning” including stuff that was correcting your misunderstanding. Which is pretty anti-helpful.
        TsviBT 12 Nov 2025 20:30 UTC
        3 points
        0
        Parent
        Did you read the whole comment thread?
        TsviBT 13 Nov 2025 17:56 UTC
        −3 points
        −3
        Parent
        A bit blackpilling re/ LW voters. So cowardly, and so wrong.
    - TsviBT 12 Nov 2025 9:15 UTC
      −2 points
      −4
      Parent
      Are you wanting to say “I, Wei Dai, am a better strategic thinker on AGI X-derisking than Yudkowsky.”? That’s a perfectly fine thing to say IMO, but of course you should understand that most people (me included) wouldn’t by default have the context to believe that.
  - Adele Lopez 12 Nov 2025 10:12 UTC
    9 points
    0
    Parent
    “no Yudkowsky-LW-sphere”
    It’s not obvious to me that we’re better off than this world, sadly. It seems like one of the main effects was to draw lots of young blood into the field of AI.
    - TsviBT 12 Nov 2025 11:20 UTC
      16 points
      13
      Parent
      That’s plausible, IDK. But are you saying that PROSPECTIVELY the PREDICTABLE-ish effects were bad? Who said “Sure you could tie together a whole bunch of existing epistemological threads, and do a bunch of new thinking, and explain AI danger very clearly and thoroughly, and yeah, you could attract a huge amount of brainpower to try to think clearly about how to derisk that, but then they’ll just all start trying to make AGI. And here’s the reasons I can actually know this.”? There might have been people starting to say this by 2015 or 2018, IDK. But in 2010? 2006?
      - Adele Lopez 12 Nov 2025 19:54 UTC
        10 points
        6
        Parent
        I think it’s not an impossible call. The fiasco with Roko’s Basilisk (2010) seems like a warning that could have been heeded. It turns out that “freaking out” about something being dangerous and scary makes it salient and exciting, which in turn causes people to fixate on it in ways that are obviously counterproductive. That it becomes a mark of pride to do the dangerous thing without being scathed (as with the Demon core). Even though you warned them about this from the beginning, and in very clear terms.
        
        And even if there was no one able to see this (it’s not like I saw it), it remains a strategic error — reality doesn’t grade on a curve.
        What links here?
        Adele Lopez's comment on AI safety undervalues founders by Ryan Kidd (16 Nov 2025 9:22 UTC; 4 points)
        TsviBT 12 Nov 2025 20:49 UTC
        4 points
        2
        Parent
        Yes, it would be a strategic error in a sense, but it wouldn’t be a strong argument against “Yudkowsky is the best strategic thinker on AGI X-derisking”, which I was given to understand was the topic of this thread. For that specific question, which seemed to be the topic of Wei Dai’s comment, it is graded on a curve. (I don’t actually feel that interested in that question though.)
      - M. Y. Zuo 12 Nov 2025 18:23 UTC
        −13 points
        0
        Parent
        The question doesn’t make sense. It’s not possible to judge conclusively whether something is good or bad ahead of time… only after the fact.
        Because real world actions and outcomes are what counts, not what is claimed verbally or in writing.
  - Jesper L. 13 Nov 2025 16:13 UTC
    8 points
    0
    Parent
    Being a good strategist is about things like
    A) Understanding and probing the opposition/problem well
    B) Coordinating your resources
    C) Understanding rules and principles governing the nature of the game (operational constraints)
    D) Creative problem solving + tactics
    E) Knowing strategic principles (e.g., seizing initiative, pre-empting the opposition, leveraging commitment vulnerabilities, etc.)
    F) Managing asymmetric information (my specialty)
    G) Avoiding risky overcommitment
    Etc. I am not sure Eliezer has showcased such skills in his work. He is a brilliant independent researcher and thinker, but not a top tier strategist or leader, as far as I can tell.
    - TsviBT 13 Nov 2025 16:20 UTC
      4 points
      2
      Parent
      Is there someone you’d point to as being a better “strategic thinker on the topic of existential risk from AGI”, as is the topic of discussion in this thread?
      - Jesper L. 13 Nov 2025 16:39 UTC
        −1 points
        −4
        Parent
        Good question. ARE there any A-tier strategists at all on x-risk? I’d nominate Stuart Russel. Hm. Even Yoshua Bengio is arguably also having a larger impact than Eliezer in some critical areas (policy).
        For pure strategic competence, Amandeep Singh Gill.
        Jaan Tallin. Maybe even Xue Lan.
        TsviBT 13 Nov 2025 16:46 UTC
        6 points
        4
        Parent
        Russell, Bengio, and Tallinn are good but not in the same league as Yudkowsky in terms of strategic thinking about AGI X-derisking. A quick search of Gill doesn’t turn up anything about existential risk but I could very easily have missed it.
        Jesper L. 13 Nov 2025 16:59 UTC
        3 points
        4
        Parent
        Okay, I think I see the confusion. Your phrasing make it seem (to me at least) like Eliezer has had the biggest strategic impact on mitigating x-risk, and arguably also being the most competent there. I would really not be sure of that. But if we talk about strategically dissecting x-risk, without necessarily mitigating it, directly or indirectly, then maybe Eliezer would win. Still would maybe lean towards Stuart.
        Gill IS having an impact that de facto mitigates x-risk, whether he uses the term or not. But he is not making people talk about it (without necessarily doing anything about it) as much as Eliezer. In that sense one could argue he isn’t really an x-risk champ.
        habryka 13 Nov 2025 23:56 UTC
        4 points
        5
        Parent
        Amandeep Singh Gill
        From Wikipedia:
        Gill believes AI can help accelerate the process of achieving the UN’s Sustainable Development Goals.^[17]
        What? I have never heard of this person, and the little I have read suggests he is deeply deeply confused about the nature of AGI. This doesn’t feel like a serious suggestion.
        Jesper L. 14 Nov 2025 8:00 UTC
        −1 points
        0
        Parent
        Why not? What does the quote have to do with anything?
        habryka 14 Nov 2025 8:22 UTC
        3 points
        2
        Parent
        If one of your central takeaways from AI is that it is “going to help accelerate the process of achieveing the UN’s Sustainable Development Goals” then you are deeply miscalibrated about the impact of AI.
        It’s like saying that “the industrial revolution could help improve the efficiency of chariot production”. Bro, there is going to be no chariots after the industrial revolution. There are also going to be no more sustainable development goals post-ASI.
        Like, it’s a random quote, maybe he had more context that makes it make more sense, but it’s the only object level take of his I could find on his Wikipedia page. If he has more relevant things to say, they didn’t make it into the things I could quickly find out about him, but my first skim of things strongly suggested someone who lacks situational awareness (in the https://situational-awareness.ai/ sense).
        Jesper L. 14 Nov 2025 8:42 UTC
        −4 points
        0
        Parent
        Habryka, I genuinely don’t know why that quote appeared for you first.
        https://en.wikipedia.org/wiki/Amandeep_Singh_Gill
        First sentence under work: “Gill has written about the impact of artificial intelligence (AI) on modern life and the necessity for establishing appropriate regulatory frameworks to ensure AI plays a positive role in the future.”
        He is the UN envoy. He is in policy, politics, regulation.
        Bio at AI for good
        This is what “gill achievements AI” bring up for me (don’t need full name)
        “Gill helped secure high-impact international consensus recommendations on regulating Artificial Intelligence (Al) in lethal autonomous weapon systems in 2017 and 2018, the draft Al ethics recommendation of UNESCO in 2020, and a new international platform on digital health and Al.”
        This is the AI advisory board: https://www.un.org/en/ai-advisory-body/members
        Edit: Check out the credentials of its members. I see a lot of competence there. Compare qith national committees. Steering this is a strategic achievement.
        He is a political coordinator. I hope that you can understand that he has to discuss existing AI, not just future AI.
        Think what kind of statements give you political leverage in his position. I could also ask how many policies Eliezer has successfully pushed through banning AI research or deployment, to make this point more clear.
        In general, I stand by Stuart as the overall champ. Gill is last on alignment knowledge (still knowledgable on AI), high on strategy.
        Back to topic:
        All I am pointing out, is that you don’t need to throw in the word strategic anywhere when mentioning that Eliezer is an excellent x-risk analyst and advocate. I think this is important distinction, because we also need strategic AI safety champions and political regulation.
        Note: Even people who don’t even believe in x-risk can have a huge impact, if they successfully regulate AI in key areas/regions, or internationally.
        habryka 14 Nov 2025 9:09 UTC
        2 points
        0
        Parent
        I am not really sure what all of the things you are saying here are supposed to tell me. Maybe I am supposed to respect random people in the UN? I do not generally think highly of the UN, or think involvement in it is much of a sign of being a good strategist (though of course, as all highly selected positions it is of course evidence of being in the top percentiles of competence, but not more than that).
        I didn’t quote these sections because they too are largely uninformative:
        Gill helped secure high-impact international consensus recommendations on regulating Artificial Intelligence (Al) in lethal autonomous weapon systems in 2017 and 2018, the draft Al ethics recommendation of UNESCO in 2020, and a new international platform on digital health and Al.
        Like, what is this supposed to tell me? I really don’t know the sign of lethal autonomous weapon regulation. My guess is it’s mildly bad and I was historically opposed to regulating it, but it’s not super clear and I’ve flipped back and forth a few times. The “platform for digital health and AI” seems like a red flag, but I don’t know.