Carlsmith’s Multiple Stage Fallacy risk estimate of 5% that involved only an 80% chance anyone would even try to build agentic AI?
This is false as stated. The report says:
The corresponding footnote 179 is:
As a reminder, APS systems are ones with: (a) Advanced capability: they outperform the best humans on some set of tasks which when performed at an advanced level grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, hacking, and social persuasion/manipulation); (b) Agentic planning: they make and execute plans, in pursuit of objectives, on the basis of models of the world; and (c) Strategic awareness: the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining different forms of power over humans and the real-world environment.
“Strong incentives” isn’t the same as “anyone would try to build”, and “agentic AI” isn’t the same as APS systems (which have a much more specific and stronger definition!).
I’d personally put more like 90% on the claim (and it might depend a lot on what you mean by strong incentives).
To be clear, I agree with the claim that Carlsmith’s report suffers from the multi-stage fallacy (e.g., even without strong incentives to build APS systems, you can easily get AI takeover on my views) and is importantly wrong (and I thought so at the time), but your specific claim about the report here is incorrect.
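For reference, the report’s headline ~5% is the product of six premise probabilities, of which the 80% “incentives” premise is just one. A minimal sketch of that multiplication, using the headline premise numbers as I recall them from the report (approximate, not a quotation):

```python
# Rough sketch of the multi-stage structure under discussion. The stage
# probabilities below are Carlsmith's headline premise estimates as I recall
# them from the 2021 report -- treat them as approximate placeholders.
stages = [
    ("APS systems possible and financially feasible by 2070", 0.65),
    ("strong incentives to build APS systems", 0.80),
    ("much harder to build aligned than misaligned-but-attractive systems", 0.40),
    ("deployed misaligned systems seek power in high-impact ways", 0.65),
    ("misaligned power-seeking scales to permanent human disempowerment", 0.40),
    ("that disempowerment constitutes an existential catastrophe", 0.95),
]

p = 1.0
for claim, prob in stages:
    p *= prob
    print(f"{prob:.2f}  {claim}")

print(f"\nComposite estimate: {p:.3f} (~5%)")
# The multiple stage fallacy worry: chaining several conditional estimates and
# shading each one down tends to drive the product artificially low.
```

The disagreement above is only about how to read the second factor; the multi-stage fallacy concern is about the chained multiplication as a whole.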
I’d personally put more like 90% on the claim (and it might depend a lot on what you mean by strong incentives).
Are you talking in retrospect? If you currently assign only 90% to this claim, I would be very happy to take your money (a reasonable operationalization, which I think Joe would have accepted at the time, is that we would be dealing with more than $1 billion in annual expenditure towards this goal).
I… actually have trouble imagining any definition that isn’t already met, as people are clearly trying to do this right now. But like, still happy to take your money if you want to bet and ask some third party to adjudicate.
I wasn’t talking in retrospect, but I meant something much larger than $1 billion by “strong incentives”, and I really mean very specifically APS systems at the time when they are feasible to build.
The 10% would come from other approaches/architectures ending up being surprisingly better at the point when people could build APS systems. (E.g., you don’t need your AIs to have “the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining different forms of power over humans and the real-world environment”.)
On further consideration I might be more like 95%, but 90% doesn’t seem crazy to me depending on the details of the operationalization.
I would put very high probabilities on “people would pay >$50 billion for a strong APS system right now”, so we presumably agree on that.
It’s really key to my perspective here that by the time we can build APS systems, something else which narrowly doesn’t meet this definition may have come around and look more competitive. There is something messy here because maybe there are strong incentives to build APS systems eventually, but this only occurs substantially after full automation of the whole economy (or similar) by other systems. I was trying to exclude cases where APS systems only become strongly incentivized substantially after human intellectual labor is totally obsolete, as this case is pretty different. (And other factors like “maybe we’ll have radical superbabies before we can build APS systems” factor in too, though again this is very sensitive to the operationalization.)
Pretty surprising that the paper doesn’t give much indication of what counts as “strong incentives” (or at least not that I could find after searching for 2 mins).
I accept your correction and Buck’s as to these simple facts (was posting from mobile).