ryan_greenblatt comments on Gradual Disempowerment: Concrete Research Projects

ryan_greenblatt 30 May 2025 15:02 UTC
13 points
5

The world currently seems to be aiming for a human-replacing AI agent regime, and this seems bad for a bunch of reasons. It would be great if people were fundamentally more oriented towards making AIs that complemented humans.

I don’t understand why you think this helps.

Presumably we still quickly reach a regime with full automation, so in the best case the work you describe means that either full automation comes a bit later (due to humans adding more value) or humans get more uplift (at all tasks?) prior to full automation.

Why is it helpful to generically make human labor more uplifted for some short period? (I can think of some reasons, but they all feel a bit weak to me.)
What links here?
- The best approaches for mitigating “the intelligence curse” (or gradual disempowerment); my quick guesses at the best object-level interventions by ryan_greenblatt (31 May 2025 18:20 UTC; 72 points)
- Raymond Douglas 30 May 2025 17:01 UTC
  11 points
  5
  Parent
  Sure, I agree we probably end up in full automation eventually by default. I also think this is much more relevant in some tasks than others: “generically make human labor more uplifted” doesn’t feel like it quite captures the thing I care about here.
  Some intuitions I have:
  - That period where AIs are more capable than human, but human+AI is even more capable, seems like a particularly crucial window for doing useful things, so extending it is pretty valuable. In particular, both bringing forward augmented human capability, and also pushing back human redundance.
    This is basically the main reason, and I don’t think I can guess why you’d disagree.
  - In parallel, I think that a lot of work is defaulting towards ‘fully general agent AI’ because it is an easy and natural target, not because it is the best one, and that if people knew what other kinds of interfaces to build for, that would actually suck some energy out of investing in getting long-term planning/drop-in replacements for everything as soon as possible
    This might be wrong for jevons paradox-y reasons though, and it depends on specifics I haven’t thought about
  - I kinda think that if we were doing more complementarity research, we’d have a larger dataset of healthy AI<>human interactions, and that could maybe help with steering us more towards the kinds of eventual AIs that are naturally friendly. I am pretty unsure here, but I do wish someone had thought hard about it. I weakly guess that I put a lot more weight than you on feedback loops from how people use AI.
  - The focus on independent/autonomous AIs is, I suspect, making people underinvest in figuring out what effect AI interactions have on humans, or on trying to make those effects good, and I can imagine this biting us hard down the line
    Like, if there were a nice suite of evals to tell you how emotionally healthy/toxic a given model was, then there would be a sort of legible target to hill climb towards. My guess is companies kind of don’t care enough to prioritise doing this themselves, but they’d take easy steps towards it.
  I should emphasise that I don’t think this is the all time top most important work, I just think it’s currently pretty neglected and I wouldn’t be surprised if there were some pretty interesting insights that came out of thinking hard about it for a while, or some pretty high leverage work available.
  - ryan_greenblatt 30 May 2025 17:19 UTC
    6 points
    2
    Parent
    
    That period where AIs are more capable than human, but human+AI is even more capable, seems like a particularly crucial window for doing useful things, so extending it is pretty valuable. In particular, both bringing forward augmented human capability, and also pushing back human redundance.
    
    In the context of risks from misalignment I often think about a type of interventions which is something like: “making AIs more useful earlier, at a point when their general abilities and opaque reasoning etc make them less dangerous (as in, less likely to be egregiously misalignment and less likely to be able to successfully subvert safety/security measures)”.
    
    I don’t quite see why this sort of thing (but for boosting human+AI rather than increasing the weak AI boost) helps much with gradual disempowerment. It seems to me like the forces you mention apply almost as much to cases where there is substantial uplift so long as AI companies still have a bunch of control and uplift isn’t widely diffused.
    
    But, I also don’t think I can hold a coherant model of gradual disempowerment which makes sense to me and matches the claims in the paper in my head, so I might just be missing some aspect of the situation.
    
    Regardless, probably not worth getting into it further.
    
    In parallel, I think that a lot of work is defaulting towards ‘fully general agent AI’ because it is an easy and natural target, not because it is the best one, and that if people knew what other kinds of interfaces to build for, that would actually suck some energy out of investing in getting long-term planning/drop-in replacements for everything as soon as possible
    
    I’m skeptical that making AIs more useful for uplifting humans will delay general purpose autonomous capabilities, because probably the most effective ways to get more uplift is to drive forward general capabilities which will transfer to both. Like, making AIs more useful for X will make more people work on X (rather than general capabilities) but will also make more people invest in AI in general.
    
    At a more basic level, the timelines slowing effect has to be pretty minor/marginal.
    
    Reasons 3 and 4 make sense to me in principle, though I don’t think I buy them in practice. Or at least they feel quite weak to me. This might again be downstream of me not being able to model gradual disempowerment in a way that makes sense to me.
    - Raymond Douglas 30 May 2025 22:42 UTC
      2 points
      0
      Parent
      Sure, briefly replying:
      On the first point: you’re right that this does in some ways make the problem worse; my current best guess is that it’s basically necessary for a solution. I’m planning to write this up in more detail some time soon and I hope to get your thoughts when I do!
      On the second: Yeah, I find this kind of thing pretty hard to be confident about. I could totally see you being right here, and I’d love for someone to think it through in detail.
      And I think the differences in 3 and 4 indeed probably come down to deeper assumptions that would be hard to unpick in this thread: I’d tentatively guess I’m putting more weight on the societal impacts of AI, and on the eventual shape of AGI/ASI being easier to affect.
      This comment thread probably isn’t the place, but if it ever seems like it would be important/feasible, I’d be happy to try to go deeper on where our models are differing.
  - Noosphere89 30 May 2025 17:20 UTC
    3 points
    0
    Parent
    
    In parallel, I think that a lot of work is defaulting towards ‘fully general agent AI’ because it is an easy and natural target, not because it is the best one, and that if people knew what other kinds of interfaces to build for, that would actually suck some energy out of investing in getting long-term planning/drop-in replacements for everything as soon as possible.
    
    I think the issue is that automating away humans is just a very large portion of the value of AI, and that 90% of automation away of tasks basically leads to 0 value being captured, due to the long tail:
    
    https://www.lesswrong.com/posts/Nbcs5Fe2cxQuzje4K/value-of-the-long-tail
    
    So unfortunately, I think human irrelevance is just more valuable than humans still being relevant.