I also want to qualify / explain my statement about it being a crazy argument. The specific part that worries me (and Eliezer, iiuc) is the claim that, at a given point in time, the delta between natural and artificial artifacts will tend to be approximately constant across different domains. This is quite intuition-bending from a mechanistic / reductionist viewpoint, and the current support for it seems very small and fragile (this 8-page doc). However, I can see a path where I would believe in it much more, which would involve things like:
Pinning down the exact methodology: Which metrics are we making this “approximately constant delta” claim for? We’re definitely not making it for arbitrary metrics.
The doc has an explanation, but I don’t feel like I could go and replicate it myself while staying faithful to the original intuitions, so for now I’m deferring to the doc’s authors on how they choose their metrics.
Once we do this, can we explain how we’re applying it in the AI case? Why should we anchor brain size to neural net size, instead of e.g. brain training flops to neural net training flops?
More precise estimates: Iirc there is a lot of “we used this incredibly heuristic argument to pull out a number, it’s probably not off by OOMs so that’s fine for our purposes”, which I think is reasonable but makes me uneasy.
Get more data points: Surely there are more natural/artificial artifact pairs to compare.
Check for time invariance: Repeat the analysis at different points in time—do we see a similar “approximately constant gap” if we look at manmade artifacts from (say) 2000 or 1950? Equivalently, this theory predicts that the rate of progress on the chosen metrics should be similar across domains; is that empirically supported? (A toy version of such a check is sketched below, after this list.)
Flesh out a semi-mechanistic theory (that makes this prediction).
The argument would be something like: even if you have two very different optimization procedures (evolution vs human intelligence), as long as both are far from optimality, it is reasonable to model each via a single quantity (“optimization power” / “effective fraction of the search space covered”) which is the primary determinant of the performance you get, irrespective of what domain you are in (as long as the search space in the domain is sufficiently large / detailed / complicated). As a result, as long as you focus on metrics that both evolution and humans were optimizing, you should expect the difference in performance on the metric to be primarily a function of the difference in optimization power between evolution and humans-at-that-point-in-time, and to be approximately independent of the domain. (A toy version of this model is sketched below, after this list.)
Once you have such a theory, check whether the bio anchors application is sensible according to the theory.
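To make the shape of that “optimization power” story concrete, here is a minimal toy model. It is purely my own illustration—none of the numbers, domain names, or the affine functional form come from the doc—and it just shows how a slope shared across domains would produce a domain-independent delta.

```python
# Toy model (purely illustrative, not from the bio anchors doc):
# suppose log-performance in each domain is an affine function of the
# optimizer's "optimization power", with a domain-specific offset alpha_d
# but a slope BETA shared across domains. Then the natural-vs-artificial
# delta is BETA * (OP_evolution - OP_humans) in every domain.

BETA = 1.0  # shared slope: how fast log-performance grows with optimization power

def log_performance(alpha_d: float, op: float) -> float:
    """Log-performance in a domain with offset alpha_d, for optimization power op."""
    return alpha_d + BETA * op

OP_EVOLUTION = 10.0    # made-up number
OP_HUMANS_2020 = 7.0   # made-up number

for name, alpha in [("flight", 3.0), ("photosynthesis", -1.0), ("materials", 0.5)]:
    delta = log_performance(alpha, OP_EVOLUTION) - log_performance(alpha, OP_HUMANS_2020)
    print(f"{name}: natural-vs-artificial delta = {delta:.1f} OOMs")  # same in every domain
```

The domain offsets cancel, which is exactly the “approximately constant delta” prediction; the substantive work a real theory would have to do is justify why the slope (and not just the offset) should be shared across domains.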
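And as a gesture at what the time-invariance check from earlier in this list could look like, here is a minimal sketch, where the domains, dates, and all metric values are hypothetical placeholders rather than real data.

```python
# Hypothetical check of the "approximately constant delta" claim over time.
# All numbers are placeholders; a real check would use measured values of an
# agreed-upon metric for the best natural and best artificial artifact in
# each domain at each date.

# log10(metric) for (natural, artificial) per domain, at two dates
data = {
    1950: {"flight": (3.2, 1.4), "photosynthesis": (1.5, -0.6), "materials": (2.8, 1.2)},
    2000: {"flight": (3.2, 2.1), "photosynthesis": (1.5, 0.2), "materials": (2.8, 1.9)},
}

for year, domains in sorted(data.items()):
    deltas = [natural - artificial for natural, artificial in domains.values()]
    spread = max(deltas) - min(deltas)
    print(f"{year}: deltas (OOMs) = {[round(d, 1) for d in deltas]}, spread = {spread:.1f}")
    # The claim predicts a small spread within each year (deltas roughly
    # constant across domains), even as the deltas themselves shrink over time.
```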
I anticipate someone asking the follow-up “why didn’t Open Phil do that, then?” I don’t know what Open Phil was thinking, but I don’t think I’d have made a very different decision. It’s a lot of work, not many people can do it, and many of those people had better things to do, e.g. imo the compute-centric takeoff work was indeed more important and caused bigger updates than I think the work above would have done (and was probably easier to do).