Loosely construed, the foundings of DeepMind, OpenAI, and Anthropic were the result of x-risk capacity-building. If you count those (and it’s not clear that you should), then capacity-building has probably been strongly net negative to date.
I’ve heard variants of this argument, and I overall haven’t found them that persuasive, for reasons close to the ones Habryka gives—I think if you carve things up such that capacity-building work has been responsible for e.g. speeding up the creation of frontier AI labs, you should also credit it for the broader movement focused on catastrophic risks, and my intuition is that the counterfactual world without that movement would be worse off overall. I don’t buy that the acceleration has been substantial enough that the unaccelerated world would have bought a lot of time for societal improvements useful for addressing catastrophic risks; instead, it feels to me like the unaccelerated world would be facing these risks more blindly and with less time to usefully prepare.
I also think that, given the massive amount of non-GCR-related interest and resources in AI now, the forward-looking acceleration effects seem likely to be much smaller than any historical effect. I generally think the ratio of “meaningfully adding to the talent pool of people working on catastrophic risks” to “meaningfully accelerating AI capabilities” for most CB programs will look extremely favorable.
I don’t think it’s obvious that even if you count those, capacity-building has been strongly net-negative to date, but I do think it’s pretty plausible.
Like, if you were to count the costs as broadly as “all the labs are downstream of capacity-building work”, then you also need to count the benefits as broadly. And a broadly known public track record of having been concerned about these problems for a long time, of being motivated by altruism, of having tried to solve the problem for a long time, and of being one of the few memetic centers in the world that people draw on to figure out what to do about this whole AI situation is quite valuable, possibly more valuable than the acceleration effects of things like DeepMind, OpenAI, and Anthropic.
(that said, my actual take here is that the biggest issue with most capacity-building work is that it actively undermines the things that other capacity-building work has been highly successful at, so that ultimately some capacity building work is predictably extremely good for the world, and some is predictably extremely bad for the world).
the biggest issue with most capacity-building work is that it actively undermines the things that other capacity-building work has been highly successful at, so that ultimately some capacity building work is predictably extremely good for the world, and some is predictably extremely bad for the world
Wait what does this mean? Is there some kind of dichotomy I’m not aware of?
Maybe? I am not saying the dichotomy is common knowledge, but I feel pretty confident predicting which capacity-building work will be quite bad in expectation and which will be quite good (this doesn’t mean there isn’t variance within those categories, with many orgs or people having sign-flipped impact relative to their reference class, but I am happy to register predictions at the class level with reasonably high confidence).
I would then like to know which is which (DM is okay if you feel that would be somewhat controversial; it’s also alright if you want to keep your opinions to yourself)
Sorry, I am not saying there is a classifier here that is one sentence long. At a high level, I think “is it largely funneling people into places where the incentives will point toward building more powerful AI systems and/or becoming personally more powerful, or is it putting people into positions where their primary incentive is to help other people make sense of what is going on, with some grounding in the accuracy of their beliefs” is the best short classifier I have, but I didn’t intend to communicate that there is some super short description of the classifier!
No worries, thanks for elaborating