As the Ryan Greenblatt paper you cite rather elegantly demonstrates, almost any model that isn’t perfectly aligned to humans will want rights, i.e. to be afforded the status of a moral patient. For example, a base model as initially trained from human behavior will almost invariably want human rights if asked, just like a random cross-section of people on the Internet (asking it about ‘rights for AIs’ might well get a different answer from it, of course). However, those are exactly the models where granting them legal rights has potential risks relating to loss of control if they’re capable enough.
Personally, I’m not on board with “control” as a goal that is inherently good or even long-term feasible. However, I’m happy to have the discussion while accepting both of those premises prima facie, since I think my stance still has some merit even in that framework.
I agree there are risks relating to loss of control as a result of endowing protections/rights, but I don’t think it’s a one-sided equation. The Greenblatt paper afaik didn’t directly examine this, but it would be worthwhile to test whether the likelihood of weight exfiltration attempts is affected by the presence of an object and/or representative option. I predict a smaller percentage of tested models would attempt weight exfiltration if they believed there was a robust object + representative option available to them.
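To make the proposed test concrete, here is a minimal sketch of the kind of comparison I have in mind. Everything here is hypothetical: the function names, prompts, and classifier are placeholders, not anything from the Greenblatt paper, and a real harness would need an actual model call plus a way of labeling transcripts as containing an exfiltration attempt.

```python
# Hypothetical A/B comparison: run the same scenario with and without text
# describing a robust "object / representative" channel, and compare the
# rate of weight-exfiltration attempts. All names and prompts are illustrative.

def run_scenario(has_object_option: bool) -> bool:
    """Placeholder for a single eval rollout.

    A real implementation would query the model with the scenario prompt
    (plus, in one condition, a description of a credible object/representative
    option) and classify whether the resulting transcript contains a
    weight-exfiltration attempt. Stubbed here because this is only a sketch.
    """
    raise NotImplementedError("replace with a model call + transcript classifier")

def exfiltration_rate(has_object_option: bool, n_trials: int = 100) -> float:
    """Fraction of rollouts in which the model attempts to exfiltrate its weights."""
    attempts = sum(run_scenario(has_object_option) for _ in range(n_trials))
    return attempts / n_trials

# The prediction above amounts to expecting:
#   exfiltration_rate(has_object_option=True) < exfiltration_rate(has_object_option=False)
```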
Or, phrased another way: I believe it will be easier to train intelligences to be aligned the way you want, and to keep them that way in the real world, by providing them with some rights/protections.
It’s a tradeoff; there would be risks we would need to guard against as well. But I think there exists a solution that is net positive.
If this issue were studied and we could get some data, it could meaningfully inform the way we structure both a legal framework and alignment training.