ryan_greenblatt comments on Thane Ruthenis’s Shortform

ryan_greenblatt 5 Jun 2025 17:24 UTC
20 points
12
IMO, it seems bad to intentionally try to build AIs which are moral patients until after we’ve resolved acute risks and we’re deciding what to do with the future longer term. (E.g., don’t try to build moral patient AIs until we’re sending out space probes or deciding what to do with space probes.) Of course, this doesn’t mean we’ll avoid building AIs which are significant moral patients in practice, because our control is very weak and commercial/power incentives will likely dominate.

I think trying to make AIs be moral patients earlier pretty clearly increases AI takeover risk and seems morally bad. (Views focused on non-person-affecting upside get dominated by the long run future, so these views don’t care about making moral patient AIs which have good lives in the short run. I think the most plausible views which care about shorter run patienthood mostly just want to avoid downside so they’d prefer no patienthood at all for now.)

The only upside is that it might increase value conditional on AI takeover. But, I think “are the AIs morally valuable themselves” is much less important than the preferences of these AIs from the perspective of longer run value conditional on AI takeover. So, I think it’s better to focus on AIs which we’d expect would have better preferences conditional on takeover, and making AIs moral patients isn’t a particularly nice way to achieve this. Additionally, I don’t think we should put much weight on “try to improve the preferences of AIs which were so misaligned they took over” because conditional on takeover we must have had very little control over preferences in practice.
- evhub 5 Jun 2025 19:32 UTC
  4 points
  1
  
  > I think trying to make AIs be moral patients earlier pretty clearly increases AI takeover risk
  
  How so? Seems basically orthogonal to me? And to the extent that it does matter for takeover risk, I’d expect the sorts of interventions that make it more likely that AIs are moral patients to also make it more likely that they’re aligned.
  
  > I think the most plausible views which care about shorter run patienthood mostly just want to avoid downside so they’d prefer no patienthood at all for now.
  
  Even absent AI takeover, I’m quite worried about lock-in. I think we could easily lock in AIs that are or are not moral patients and have little ability to revisit that decision later, and I think it would be better to lock in AIs that are moral patients if we have to lock something in, since that opens up the possibility for the AIs to live good lives in the future.
  
  > I think it’s better to focus on AIs which we’d expect would have better preferences conditional on takeover
  
  I agree that seems like the more important highest-order bit, but it’s not an argument that making AIs moral patients is bad, just that it’s not the most important thing to focus on (which I agree with).
  - ryan_greenblatt 5 Jun 2025 19:46 UTC
    8 points
    6
    I would have guessed that “making AIs be moral patients” looks like “make AIs have their own independent preferences/objectives which we intentionally don’t control precisely” which increases misalignment risks.
    
    At a more basic level, if AIs are moral patients, then various safety measures will have downsides, and AIs would have plausible deniability for being opposed to safety measures. IMO, the right response to the AI taking a stand against your safety measures for AI welfare reasons is “Oh shit, either this AI is misaligned or it has welfare. Either way, this isn’t what we wanted and needs to be addressed; we should train our AI differently to avoid this.”
    
    > Even absent AI takeover, I’m quite worried about lock-in. I think we could easily lock in AIs that are or are not moral patients and have little ability to revisit that decision later
    
    I don’t understand: won’t all the value come from minds intentionally created for value rather than from the minds of the laborers? Also, won’t the architecture and design of AIs radically shift after humans aren’t running day-to-day operations?
    
    I don’t understand the type of lock-in you’re imagining, but it naively sounds like a world which has negligible longtermist value (because we got locked into obscure specifics like this), so making it somewhat better isn’t important.