I think trying to make AIs be moral patients earlier pretty clearly increases AI takeover risk
How so? Seems basically orthogonal to me? And to the extent that it does matter for takeover risk, I’d expect the sorts of interventions that make it more likely that AIs are moral patients to also make it more likely that they’re aligned.
I think the most plausible views that care about shorter-run patienthood mostly just want to avoid downsides, so they'd prefer no patienthood at all for now.
Even absent AI takeover, I’m quite worried about lock-in. I think we could easily lock in AIs that are or are not moral patients and have little ability to revisit that decision later, and I think it would be better to lock in AIs that are moral patients if we have to lock something in, since that opens up the possibility for the AIs to live good lives in the future.
I think it’s better to focus on AIs which we’d expect would have better preferences conditional on takeover
I agree that seems like the higher-order bit, but it's not an argument that making AIs moral patients is bad, just that it's not the most important thing to focus on (which I agree with).
I would have guessed that "making AIs be moral patients" looks like "make AIs have their own independent preferences/objectives which we intentionally don't control precisely," which increases misalignment risks.
At a more basic level, if AIs are moral patients, then various safety measures will have downsides, and AIs will have plausible deniability for opposing those measures. IMO, the right response to the AI taking a stand against your safety measures for AI welfare reasons is "Oh shit, either this AI is misaligned or it has welfare. Either way this isn't what we wanted and needs to be addressed; we should train our AI differently to avoid this."
Even absent AI takeover, I’m quite worried about lock-in. I think we could easily lock in AIs that are or are not moral patients and have little ability to revisit that decision later
I don't understand: won't all the value come from minds intentionally created for value, rather than from the minds of the laborers? Also, won't the architecture and design of AIs shift radically once humans aren't running day-to-day operations?
I don't understand the type of lock-in you're imagining, but it naively sounds like a world with negligible longtermist value (because we got locked into obscure specifics like this), so making it somewhat better isn't important.