ryan_greenblatt comments on An epistemic advantage of working as a moderate

ryan_greenblatt 29 Aug 2025 15:46 UTC
23 points
12
Note that this post is arguing that there are some specific epistemic advantages of working as a moderate, not that moderates are always correct or that there aren’t epistemic disadvantages to being a moderate. I don’t think “there exist moderates which seem very incorrect to me” is a valid response to the post similarly to how “there exist radicals which seem very incorrect to me” wouldn’t be a valid argument for the post.

This is independent from the point Buck notes that the label moderate as defined in the post doesn’t apply in 2020.
- johnswentworth 30 Aug 2025 20:45 UTC
  57 points
  30
  Parent
  As a response to the literal comment at top-of-thread, this is clearly reasonable. But I think Eliezer is correctly invoking some important subtext here, which your comment doesn’t properly answer. (I think this because I often make a similar move to the one Eliezer is making, and have only understood within the past couple years what’s load-bearing about it.)
  Specifically, there’s an important difference between:
  - “<person> was wrong about <argument/prediction/etc>, so we should update downward on deferring to their arguments/predictions/etc”, vs
  - “<person> was wrong about <argument/prediction/etc>, in a way which seems blindingly obvious when we actually think about it, and so is strong evidence that <person> has some systematic problem in the methods they’re using to think (as opposed to just being unlucky with this one argument/prediction/etc)”
  Eliezer isn’t saying the first one, he’s saying the second one, and then following it up with a specific model of what is wrong with the thinking-methods in question. He’s invoking bio anchors and that Carlsmith report as examples of systematically terrible thinking, i.e. thinking which is in some sense “obviously” wrong when one is not trying-on-some-level to avoid real-thinking about it, and he’s specifically pointing to desire-to-appear-moderate as the likely primary factor which drove that systematically terrible thinking. He’s not invoking them merely as examples of people being wrong.
  - Eli Tyre 31 Aug 2025 3:24 UTC
    16 points
    −4
    Parent
    “<person> was wrong about <argument/prediction/etc>, in a way which seems blindingly obvious when we actually think about it, and so is strong evidence that <person> has some systematic problem in the methods they’re using to think (as opposed to just being unlucky with this one argument/prediction/etc)”
    I think “this was blindingly obvious when we actually think about it”, is not socially admissible evidence, because of hindsight bias.
    
    I thought about a lot of this stuff before 2020. For the most part, I didn’t reach definitive conclusions about a lot of it. In retrospect, a lot of the the conclusions that I did provisionally accept, I, in retrospect, think I was overconfident about, given the epistemic warrant.
    
    Was I doing “actual thing”? No, probably not, or at least not by many relevant standards. Could I have done better? Surely, but not by recourse to magical “just think better” cognition.
    
    The fact remains that It Was Not Obvious To Me.
    
    Others may claim that it was obvious to them, and they might be right—maybe it was obvious to them.
    
    If a person declared operationalized-enough-to-be-gradable prediction before the event was settled, well, then I can update that their worldview made correct predictions.
    
    But if they additionally say that it was obvious and we all should have been able to tell, well, that they say that doesn’t add any additional evidential weight? A person saying “it was blindingly obvious” doesn’t particularly distinguish the world where it actually was obvious, and I would have been able to tell if I had done Actual Thinking, and the world where it was a confusing question about the future that was hard to call in advance, and they happened to get this one right.
    
    If you can show me how it’s obvious, such that it does in fact become obvious to me, that’s a different story.^[1] But even then, the time to show that something is obvious is before reality reveals the answer.
    
    It’s fine, I guess, for Eliezer and John to assert that some things were actually obvious, and we all should have been able to tell.
    
    But I (and almost everyone else who didn’t call it as obvious in advance), should pay attention to the correct prediction, and ignore the assertion that it was obvious.
    ^
    And, to be clear, both Eliezer and John have put in enormous levels of effort into trying to do that kind of communication. I can’t fault you for not attempting to show what you think you know.
    - johnswentworth 31 Aug 2025 5:17 UTC
      28 points
      16
      Parent
      Feels like there’s some kind of frame-error here, like you’re complaining that the move in question isn’t using a particular interface, but the move isn’t intended to use that interface in the first place? Can’t quite put my finger on it, but I’ll try to gesture in the right direction.
      Consider ye olde philosophers who liked to throw around syllogisms. You and I can look at many of those syllogisms and be like “that’s cute and clever and does not bind to reality at all, that’s not how real-thinking works”. But if we’d been around at the time, very plausibly we would not be able to recognize the failure; maybe we would not have been able to predict in advance that many of the philosophers’ clever syllogisms totally fail to bind to reality.
      Nonetheless, it is still useful and instructive to look at those syllogisms and say “look, these things obviously-in-some-sense do not bind to reality, they are not real-thinking, and therefore they are strong evidence that there is something systematically wrong with the thinking-methods of those philosophers”. (Eliezer would probably reflexively follow that up with “so I should figure out what systematic thinking errors plagued those seemingly-bright philosophers, and caused them to deceive themselves with syllogisms, in order to avoid those errors myself”.)
      And if there’s some modern-day philosopher standing nearby saying that in fact syllogisms totally do bind to reality… then yeah, this whole move isn’t really a response to them. That’s not really what it’s intended for. But even if one’s goal is to respond to that philosopher, it’s probably still a useful first step to figure out what systematic thinking error causes them to not notice that many of their syllogisms totally fail to bind to reality.
      So I guess maybe… Eliezer’s imagined audience here is someone who has already noticed that bio anchors and the Carlsmith thing fail to bind to reality, but you’re criticizing it for not instead responding to a hypothetical audience who thinks that the reports maybe do bind to reality?
      - Eli Tyre 31 Aug 2025 6:53 UTC
        16 points
        6
        Parent
        So I guess maybe… Eliezer’s imagined audience here is someone who has already noticed that bio anchors and the Carlsmith thing fail to bind to reality, but you’re criticizing it for not instead responding to a hypothetical audience who thinks that the reports maybe do bind to reality?
        
        I almost added a sentence at the end of my comment to the effect of…
        
        “Either someone did that X was blindly obvious, in which case they don’t need to be told, or it wasn’t blindingly obvious to them, and they should should pay attention to the correct prediction, and ignore the assertion that it was obvious. In either case...the statement isn’t doing anything?”
        
        Who are statements like these for? Is it for the people who thought that things were obvious to find and identify each other?
        
        To gesture at a concern I have (which I think is probably orthogonal to what you’re pointing at):
        
        On a first pass, the only people who might be influenced by statements like that are being influenced epistemically illegitimately.
        
        Like, I’m imagining a person, Bob, who heard all the arguments at the time and did not feel confident enough to make a specific prediction. But then we all get to wait a few years and see how (some of the questions, though not most of them) actually played out, and then Eliezer or whoever says “not only was I right, it was blindingly obvious that I was right, and we all should have known all along!”
        
        This is in practice received by Bob as almost an invitation to rewrite history and hindsight bias about what happened. It’s very natural to agree with Eliezer (or whoever) that, “yeah, it was obvious all along.” ^[1]
        
        And that’s really sus! Bob didn’t get new information or think of new considerations that caused the confusing question to go from confusing to obvious. He just learned the answer!
        He should be reminding himself that he didn’t in fact make an advance prediction, and remembering that at the time, it seemed like a confusing hard-to-call question, and analyzing what kinds of general thinking patterns would have allowed him to correctly call this one in advance.
        
        I think when Eliezer gets irate and at people for what he considers their cognitive distortions:
        It doesn’t convince the people he’s ostensibly arguing against, because those people don’t share his premises. They often disagree with him, on the object level, about whether the specific conclusions under discussion have been falsified.
        (eg Ryan saying he doesn’t think bio ancors was unreasonable, in this thread, or Paul disagreeing with Eliezer claims ~”that people like Paul are surprised by how the world actually plays out.”)
        It doesn’t convince the tiny number of people who could see for themselves that those ways of thinking were blindingly obvious (and/or have a shared error pattern with Eliezer, that cause them to be making the same mistake).
        (eg John Wentworth)
        It does sweep up some social-ideologically doomer-y people into feeling more confidence for their doomerism and related beliefs, both by social proof (Eliezer is so confident and assertive, which makes me feel more comfortable asserting high P(doom)s), and because Eliezer’s setting a frame in which he’s right, and people doing Real Thinking(TM), can see that he’s right, and anyone who doesn’t get it is blinded by frustrating biases.
        (eg “Bob”, though I’m thinking of a few specific people.)
        It alienates a bunch of onlookers, both people who think that Eliezer is wrong / making a mistake, and people who are agnostic.
        In all cases, this seems either unproductive or counterproductive.
        ^
        Like, there’s some extra psychological omph of just how right Eliezer (or whoever) was and how wrong the other parties were. You get to be on the side of the people who were right all along, against the oppressive forces of OpenPhil’s powerful distortionary forces / the power of modest epistemology / whatever. There’s some story that the irateness invites onlookers like Bob to participate in.
        johnswentworth 31 Aug 2025 18:47 UTC
        22 points
        3
        Parent
        Ok, I think one of the biggest disconnects here is that Eliezer is currently talking in hindsight about what we should learn from past events, and this is and should often be different from what most people could have learned at the time. Again, consider the syllogism example: just because you or I might have been fooled by it at the time does not mean we can’t learn from the obvious-in-some-sense foolishness after the fact. The relevant kind of “obviousness” needs to include obviousness in hindsight for the move Eliezer is making to work, not necessarily obviousness in advance, though it does also need to “obvious” in advance in a different sense (more on that below).
        Short handle: “It seems obvious in hindsight that <X> was foolish (not merely a sensible-but-incorrect prediction from insufficient data); why wasn’t that obvious at the time, and what pattern do I need to be on the watch for to make it obvious in the future?”
        Eliezer’s application of that pattern to the case at hand goes:
        It seems obvious-in-some-sense in hindsight that bio anchors and the Carlsmith thing were foolish, i.e. one can read them and go “man this does seem kind of silly”.
        Insofar as that wasn’t obvious at the time, it’s largely because people were selecting for moderate-sounding conclusions. (That’s not the only generalizable pattern which played a role here, but it’s an important one.)
        So in the future, I should be on the lookout for the pattern of selecting for moderate-sounding conclusions.
        I think an important gear here is that things can be obvious-in-hindsight, but not in advance, in a way which isn’t really a Bayesian update on new evidence and therefore doesn’t strictly follow prediction rules.
        Toy example:
        Someone publishes a proof of a mathematical conjecture, which enters canon as a theorem.
        Some years later, another person stumbles on a counterexample.
        Surprised mathematicians go back over the old proof, and indeed find a load-bearing error. Turns out the proof was wrong!
        The key point here is that the error was an error of reasoning, not an error of insufficient evidence or anything like that. The error was “obvious” in some sense in advance; a mathematician who’d squinted at the right part of the proof could have spotted it. Yet in practice, it was discovered by evidence arriving, rather than by someone squinting at the proof.
        Note that this toy example is exactly the sort where the right primary move to make afterwards is to say “the error is obvious in hindsight, and was obvious-in-some-sense beforehand, even if nobody noticed it. Why the failure, and how do we avoid that in the future?”.
        This is very much the thing Eliezer is doing here. He’s (he claims) pointing to a failure of reasoning, not of insufficient evidence. For many people, the arrival of more recent evidence has probably made it more obvious that there was a reasoning failure, and those people are the audience who (hopefully) get value from the move Eliezer made—hopefully they will be able to spot such silly patterns better in the future.
        Thane Ruthenis 5 Sep 2025 12:46 UTC
        6 points
        0
        Parent
        I think an important gear here is that things can be obvious-in-hindsight, but not in advance, in a way which isn’t really a Bayesian update on new evidence and therefore doesn’t strictly follow prediction rules.
        That’s my model here as well. Pseudo-formalizing it: We’re not idealized agents, we’re bounded agents, which means we can’t actually do full Bayesian updates. We have to pick and choose what computations we run, what classes of evidence we look for and update on. In hindsight, we may discover that an incorrect prediction was caused by ours opting not to spend the resources on updating on some specific information, such that if we knew to do that, we would have reliably avoided the error even while having all the same object-level information.
        In other words, it’s a Bayesian update to the distribution over Bayesian updates we should run. We discover a thing about (human) reasoning: that there’s a specific reasoning error/oversight we’re prone to, and that we have to run an update on the output of “am I making this reasoning error?” in specific situations.
        This doesn’t necessarily mean that this meta-level error would have been obvious to anyone in the world at all, at the time it was made. Nowadays, we all may be committing fallacies whose very definitions require agent-foundations theory decades ahead of ours; fallacies whose definitions we wouldn’t even understand without reading a future textbook. But it does mean that specific object-level conclusions we’re reaching today would be obviously incorrect to someone who is reasoning in a more correct way.
    - Richard_Ngo 31 Aug 2025 4:04 UTC
      6 points
      4
      Parent
      If someone predicts in advance that something is obviously false, and then you come to believe that it’s false, then you should update not just towards thought processes which would have predicted that the thing is false, but also towards thought processes which would have predicted that the thing is obviously false. (Conversely, if they predict that it’s obviously false, and it turns out to be true, you should update more strongly against their thought processes than if they’d just predicted it was false.)
      IIRC Eliezer’s objection to bioanchors can be reasonably interpreted as an advance prediction that “it’s obviously false”, though to be confident I’d need to reread his original post (which I can’t be bothered to do right now).
    - Veedrac 31 Aug 2025 19:54 UTC
      2 points
      0
      Parent
      But I (and almost everyone else who didn’t call it as obvious in advance), should pay attention to the correct prediction, and ignore the assertion that it was obvious.
      I think this is wrong. The scenarios where this outcome was easily predicted given the right heuristics and the scenarios where this was surprising to every side of the debate are quite different. Knowing who had predictors that worked in this scenario is useful evidence, especially when the debate was about which frames for thinking about things and selecting heuristics were useful.
      Or, to put this in simpler but somewhat imprecise terms: This was not obvious to you because you were thinking about things the wrong way. You didn’t know which way to think about things at the time because you lacked information about which predicted things better. You now have evidence about which ways work better, and can copy heuristics from people who were less surprised.
- Richard_Ngo 30 Aug 2025 0:03 UTC
  20 points
  11
  Parent
  The argument “there are specific epistemic advantages of working as a moderate” isn’t just a claim about categories that everyone agrees exist, it’s also a way of carving up the world. However, you can carve up the world in very misleading ways depending on how you lump different groups together. For example, if a post distinguished “people without crazy-sounding beliefs” from “people with crazy-sounding beliefs”, the latter category would lump together truth-seeking nonconformists with actual crazy people. There’s no easy way of figuring out which categories should be treated as useful vs useless but the evidence Eliezer cites does seem relevant.
  On a more object level, my main critique of the post is that almost all of the bullet points are even more true of, say, working as a physicist. And so structurally speaking I don’t know how to distinguish this post from one arguing “one advantage of looking for my keys closer to a streetlight is that there’s more light!” I.e. it’s hard to know the extent to which these benefits come specifically from focusing on less important things, and therefore are illusory, versus the extent to which you can decouple these benefits from the costs of being a “moderate”.
  - ryan_greenblatt 30 Aug 2025 1:15 UTC
    2 points
    −10
    Parent
    
    On a more object level, my main critique of the post is that almost all of the bullet points are even more true of, say, working as a physicist.
    
    But (in the language of the post) both moderates and radicals are working in the epistemic domain not some unrelated domain. It’s not that moderates and radicals are trying to answer different questions (and the questions moderates are answering are epistemically easier like physics). There are some differences in the most relevant questions, but I don’t think this is a massive effect.
    - Richard_Ngo 31 Aug 2025 3:59 UTC
      12 points
      10
      Parent
      It’s not that moderates and radicals are trying to answer different questions (and the questions moderates are answering are epistemically easier like physics).
      That seems totally wrong. Moderates are trying to answer questions like “what are some relatively cheap interventions that AI companies could implement to reduce risk assuming a low budget?” and “how can I cause AI companies to marginally increase that budget?” These questions are very different from—and much easier than—the ones the radicals are trying to answer, like “how can we radically change the governance of AI to prevent x-risk?”
      - ryan_greenblatt 31 Aug 2025 14:18 UTC
        6 points
        2
        Parent
        Hmm, I think what I said was about half wrong and I want to retract my point.
        
        That said, I think much of the relevant questions are overlapping (like, “how do we expect the future to generally go?”, “why/how is AI risky?”, “how fast will algorithmic progress go at various points?) and I interpret this post as just talking about the effect on epistemics around the overlapping questions (regardless of whether you’d expect moderates to mostly be working in domains with better feedback loops).
        
        This isn’t that relevant for your main point, but I also think the biggest question for radicals in practice is mostly: How can we generate massive public/government support for radical action on AI?
- Ben Pace 29 Aug 2025 15:57 UTC
  8 points
  1
  Parent
  It might not be disproof, but it would seem very relevant for readers to be aware of major failings of prominent moderates in the current environment e.g. when making choices about what strategies to enact or trust. (Probably you already agree with this.)
  - ryan_greenblatt 29 Aug 2025 16:24 UTC
    16 points
    7
    Parent
    I agree with this in principle, but think that doing a good job of noting major failings of prominent moderates in the current environment would look very different than Eliezer’s comment and requires something stronger than just giving examples of some moderates which seem incorrect to Eliezer.
    
    Another way to put this is that I think citing a small number of anecdotes in defense of a broader world view is a dangerous thing to do and not attaching this to the argument in the post is even more dangerous. I think it’s more dangerous when the description of the anecdotes is sneering and misleading. So, when using this epistemically dangerous tool, I think there is a higher burden of doing a good job which isn’t done here.
    
    On the specifics here, I think Carlsmith’s report is unrepresentative for a bunch of reasons. I think Bioanchors is representative (though I don’t think it looks fucking nuts in retrospect).
    
    This is putting aside the fact that this doesn’t engage with the arguments in the post at all beyond effectively reacting to the title.