(Remember that I only want to defend “worst form of timelines prediction except all the other approaches”. I agree this is kind of a crazy argument in some absolute sense.)
So, just so we’re on the same page abstractly: Would you agree that updating / investing “a lot” in an argument that’s kind of crazy in some absolute sense would be an epistemic / strategic mistake, even if that argument is the best available specific argument in a relative sense?
Hmm, maybe? What exactly is the alternative?
Some things that I think would usually be epistemic / strategic mistakes in this situation:
Directly adopting the resulting distribution
Not looking into other arguments
Taking actions that would be significantly negative in “nearby” worlds that the argument suggests are unlikely. (The “nearby” qualifier is to avoid problem-of-induction style issues.)
Some things that I don’t think would immediately qualify as epistemic / strategic mistakes (of course they could still be mistakes depending on further details):
Making a large belief update. Generally it seems plausible that you start with some very inchoate opinions (or in Bayesian terms an uninformed prior) and so any argument that seems to have some binding to reality, even a pretty wild and error-prone one, can still cause a big update. (Related: The First Sample Gives the Most Information.)
This is not quite how I felt about bio anchors in 2020—I do think there was some binding-to-reality in arguments like “make a wild guess about how far we are, and then extrapolate AI progress out and see when it reaches that point”—but it is close.
Taking consequential actions as a result of the argument. Ultimately we are often faced with consequential decisions, and inaction is also a choice. Obviously you should try to find actions that don’t depend on the particular axis you’re uncertain about (in this case timelines), but sometimes that would still leave significant potential value on the table. In that case you should use the best info available to you.
I actually think this is one of the biggest strengths of Open Phil around 2016-2020. At that time, (1) the vast, vast majority of people believed that AGI was a long time away (especially as revealed by their actions, but also via stated beliefs), and (2) AI timelines was an incredibly cursed topic where there were few arguments that had any semblance of binding to reality. Nevertheless, they saw some arguments that had some tiny bit of binding to reality, concluded that there was a non-trivial chance of AGI within 2 decades, and threw a bunch of effort behind acting on that scenario because it would be so important if it happened.
So imo they did invest “a lot” in the belief that AGI could be soon (though of course that wasn’t based just on bio anchors), and I think this looks like a great call in hindsight.
things that I don’t think would immediately qualify as epistemic / strategic mistakes
Making a large belief update
Taking consequential actions as a result of the argument
We easily agree that this depends on further details, but just at this abstract level, I want to record the case that these “probably, usually” are mistakes. (I’m avoiding the object level because I’m not very invested in that discussion—I have opinions but they aren’t the result of lots of investigation on the specific topic of bioanchors or OP’s behavior; totally fair for you to therefore bow out / etc.)
The case is like this:
Suppose you have a Bayesian uninformed prior. Then someone makes an argument that’s “kinda crazy but the best we have”. What should happen? How does a “kinda crazy argument” cash out in terms of likelihood ratios? I’m not sure, and actually I don’t want to think of it in a simple Bayesian context; but one way would be to say: On the hypothesis that AGI comes at year X, we should see more good arguments for AGI coming at year X. When we see such an argument, we update toward AGI coming at year X. How much? Well, if it’s a really good argument, we update a lot. If it’s a crazy argument, we don’t update much, because whatever the true year Y, we’d expect to see plenty of crazy arguments for AGI coming at all sorts of years, so the existence of one more crazy argument is only weak evidence about X.
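Purely as a toy numerical sketch of the likelihood-ratio point (all numbers invented for illustration): how much an argument should move us depends on how much more likely we are to see it in worlds where its conclusion is true than in worlds where it isn’t.

```python
# Toy illustration (made-up numbers): the update from seeing an argument for
# "AGI in year X" depends on the likelihood ratio
#   P(see this argument | AGI in X) / P(see this argument | AGI elsewhere).

def posterior_odds(prior_odds, p_arg_given_x, p_arg_given_not_x):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_arg_given_x / p_arg_given_not_x)

prior = 1.0  # indifferent prior odds for "AGI in year X" vs. not

# A really good argument is much more likely to exist in worlds where X is right:
good = posterior_odds(prior, p_arg_given_x=0.5, p_arg_given_not_x=0.05)

# A crazy argument shows up almost as often when X is wrong, so the ratio is near 1:
crazy = posterior_odds(prior, p_arg_given_x=0.12, p_arg_given_not_x=0.10)

print(good, crazy)  # 10.0 vs. roughly 1.2: the crazy argument barely moves the odds
```

The asymmetry is entirely in the denominator: crazy arguments are cheap to produce in every world, which is exactly why their likelihood ratio hovers near 1.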
The way I actually want to think about the situation would be in terms of bounded rationality and abduction. The situation is more like, we start with pure “model uncertainty”, or in other words “we haven’t thought of most of the relevant hypotheses for how things actually work; we’re going off of a mush of weak guesses, analogies, and high-entropy priors over spaces that seem reasonable”. What happens when we think of a crazy model? It’s helpful, e.g. to stimulate further thinking which might lead to good hypotheses. But does it update our distributions much? I think in terms of probabilities, it looks like fleshing out one very unlikely hypothesis. Saying it’s “crazy” means it’s low probability of being (part of) the right world-description. Saying it’s “the best we have” means it’s the clearest model we have—the most fleshed-out hypothesis. Both of these can be true. But if you add an unlikely hypothesis, you don’t update the overall distribution much at all.
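The “fleshing out one very unlikely hypothesis barely moves the distribution” point can also be made with toy arithmetic (all distributions and weights here are invented): mixing a sharp new model into a mushy prior at low weight changes the overall distribution by at most that weight.

```python
import numpy as np

# Toy sketch (invented numbers): a mushy prior over "years until AGI",
# plus one newly fleshed-out but low-credence model.

years = np.arange(5, 105, 5)

# "Model uncertainty" mush: a near-uniform distribution over the range.
mush = np.ones_like(years, dtype=float)
mush /= mush.sum()

# The crazy-but-fleshed-out model: a sharp prediction, say centered on 20 years.
sharp = np.exp(-0.5 * ((years - 20) / 5.0) ** 2)
sharp /= sharp.sum()

# Give the new model 5% of the probability mass:
w = 0.05
combined = (1 - w) * mush + w * sharp

# Total variation distance between old and new distributions is bounded by w:
tv = 0.5 * np.abs(combined - mush).sum()
print(tv)  # at most w = 0.05, i.e. a small overall update
```

However sharp the new hypothesis is, a 5% mixture weight caps the overall shift at 5%; a large update would require assigning the new model large weight, which is precisely what calling it “crazy” rules out.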
Thanks! I think I agree with all of that under the definitions you’re using (and I too prefer the bounded rationality version). I think in practice I was using words somewhat differently than you.
(The rest of this comment is at the object level and is mostly for other readers, not for you)
Saying it’s “crazy” means it’s low probability of being (part of) the right world-description.
The “right” world-description is a very high bar (all models are wrong but some are useful), but if I go with the spirit of what you’re saying I think I might not endorse calling bio anchors “crazy” by this definition, I’d say more like “medium” probability of being a generally good framework for thinking about the domain, plus an expectation that lots of the specific details would change with more investigation.
Honestly I didn’t have any precise meaning in mind for “crazy” in my original comment; I was mainly using it as shorthand to gesture at the fact that the claim is in tension with reductionist intuitions, and that the legibly written support for the claim is weak in an absolute sense.
Saying it’s “the best we have” means it’s the clearest model we have—the most fleshed-out hypothesis.
I meant a higher bar than this; more like “the most informative and relevant thing for informing your views on the topic” (beyond extremely basic stuff like observing that humanity can do science at all, or things like reference class priors). Like, I also claim it is better than “query your intuitions about how close we are to AGI, and how fast we are going, to come up with a time until we get to AGI”. So it’s not just the clearest / most fleshed-out, it’s also the one that should move you the most, even including various illegible or intuition-driven arguments. (Obviously scoped only to the arguments I know about; for all I know other people have better arguments that I haven’t seen.)
If it were merely the clearest model or most fleshed-out hypothesis, I agree it would usually be a mistake to make a large belief update or take big consequential actions on that basis.
I also want to qualify / explain my statement about it being a crazy argument. The specific part that worries me (and Eliezer, iiuc) is the claim that, at a given point in time, the delta between natural and artificial artifacts will tend to be approximately constant across different domains. This is quite intuition-bending from a mechanistic / reductionist viewpoint, and the current support for it seems very small and fragile (this 8 page doc). However, I can see a path where I would believe in it much more, which would involve things like:
Pinning down the exact methodology: Which metrics are we making this “approximately constant delta” claim for? It’s definitely not arbitrary metrics.
The doc has an explanation, but I don’t currently feel like I could go and replicate it myself while staying faithful to the original intuitions, so I currently feel like I am deferring to the authors of the doc on how they are choosing their metrics.
Once we do this, can we explain how we’re applying it in the AI case? Why should we anchor brain size to neural net size, instead of e.g. brain training flops to neural net training flops?
More precise estimates: Iirc there is a lot of “we used this incredibly heuristic argument to pull out a number, it’s probably not off by OOMs so that’s fine for our purposes”, which I think is reasonable but makes me uneasy.
Get more data points: Surely there are more artifacts to compare.
Check for time invariance: Repeat the analysis at different points in time—do we see a similar “approximately constant gap” if we look at manmade artifacts from (say) 2000 or 1950? Equivalently, this theory predicts that the rate of progress on the chosen metrics should be similar across domains; is that empirically supported?
Flesh out a semi-mechanistic theory (that makes this prediction).
The argument would be something like: even if you have two very very different optimization procedures (evolution vs human intelligence), as long as they are far from optimality, it is reasonable to model them via a single quantity (“optimization power” / “effective fraction of the search space covered”) which is the primary determinant of the performance you get, irrespective of what domain you are in (as long as the search space in the domain is sufficiently large / detailed / complicated). As a result, as long as you focus on metrics that both evolution and humans were optimizing, you should expect the difference in performance on the metric to be primarily a function of the difference in optimization power between evolution and humans-at-that-point-in-time, and to be approximately independent of the domain.
Once you have such a theory, check whether the bio anchors application is sensible according to the theory.
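The semi-mechanistic theory above can be sketched as toy code (the functional form, powers, and domain names are all invented for illustration): if log-performance decomposes into a domain-specific offset plus a function of a single “optimization power” quantity, then the log-scale gap between evolution’s and humanity’s artifacts comes out the same in every domain, because the domain offset cancels in the difference.

```python
import math

# Toy formalization (invented functional form): performance on a metric is
# domain_scale * g(opt_power), so in log space the domain contributes only an
# additive offset, which cancels when we take evolution-minus-humans deltas.

def log_performance(opt_power, domain_scale):
    return math.log(domain_scale) + 2.0 * math.log(opt_power)

evolution_power = 1e6     # made-up "optimization power" for evolution
humans_2020_power = 1e4   # made-up value for humanity at a point in time

# Arbitrary domain-specific scale factors (stand-ins for real domains):
domains = {"flight": 3.0, "photosynthesis": 0.2, "vision": 10.0}

deltas = {
    d: log_performance(evolution_power, s) - log_performance(humans_2020_power, s)
    for d, s in domains.items()
}
print(deltas)  # the same delta in every domain: 2 * ln(1e6 / 1e4) ≈ 9.21
```

This is of course the conclusion baked into the assumed functional form; the empirical work sketched in the list above is what would test whether real metrics behave anything like this.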
I anticipate someone asking the followup “why didn’t Open Phil do that, then?” I don’t know what Open Phil was thinking, but I don’t think I’d have made a very different decision. It’s a lot of work, not many people can do it, and many of those people had better things to do, e.g. imo the compute-centric takeoff work was indeed more important and caused bigger updates than I think the work above would have done (and was probably easier to do).