This intuition is also informed by my personal assessment of the contributions LW-style theoretical research has made toward making existing AI systems safe, which, as far as I can tell, have been almost negligible (though I’m not implying that all safety research is similarly ineffective or useless).
I know what you mean by “LW-style theoretical research” (edit: actually, not that confident I know what you mean, see thread below), but it’s worth noting that right now on LW people appear to be much more into empirical research than theoretical research. Concretely, go to All posts in 2024 sorted by Top and then filter by AI. Out of the top 32 posts, 0 are theoretical research and roughly 7/32 are empirical research. 1 or 2 out of 32 are discussion that is relatively pro-theoretical research, and a bunch more (maybe 20) are well described as AI futurism or discussion of which research directions or safety strategies are best, which is relatively focused on empirical approaches. Judging by the top 32 posts, LW has basically given up on LW-style theoretical research. (One of the top 32 posts is actually a post which is arguably complaining about how the field of alignment has given up on LW-style theoretical research!)
Separately, I don’t think pessimism about LW-style theoretical research has a clear-cut effect on how you should feel about a pause. The more you’re skeptical of work done in advance, the more you should think that additional work done when we have more powerful AIs is a higher fraction of the action. This could be outweighed by becoming generally more skeptical about the returns to safety research, as informed by this example subfield of safety research being poor, but still.
Also, it’s worth noting that almost everyone in the field is pessimistic about LW-style theoretical research! This isn’t a very controversial view. The main disagreements (at least on LW) tend to be more about how optimistic you are about empirical research and about different types of empirical research.
(I will go on the record that this comment seems to me terribly confused about what “LW-style theoretical research” is. In particular, I think of Redwood as one of the top organizations doing LW-style theoretical research, with a small empirical component, so clearly some kind of mismatch about concepts is going on here. AI 2027 also strikes me as very centrally the kind of “theoretical” thinking that characterizes LW.
My sense is that some kind of weird thing is happening where people conjure up some extremely specific thing as the archetype of LW-style research, in ways that are kind of disconnected from reality, and I would like to avoid people forming stereotypes that are annoyingly hard to fix as a result of that.)
I’m using the word “theoretical” more narrowly than you are, and not including conceptual/AI-futurism research. I agree the word “theoretical” is underdefined and that there is a reasonable category including Redwood and AI 2027 which you could call theoretical research; I’d just typically use a different term for this, and I don’t think Matthew was including it.
I was trying to discuss what I thought Matthew was pointing at; I could be wrong about this, of course.
(Similarly, I’d guess that Matthew wouldn’t have counted Epoch’s work on takeoff speeds and what takeoff looks like as an example of “LW-style theoretical research”, but I think this work is very structurally/methodologically similar to stuff like AI 2027.)
If Matthew said “LW-style conceptual/non-empirical research” I would have interpreted this pretty differently.