Personally, I’m not on board with “control” as a goal which is inherently good or even long term feasible. However I’m happy to have the discussion accepting that it is both as prima facie, since I think my stance still has some merit even in that framework.
I agree there are risks relating to loss of control as a result of endowing protections/rights, but I don’t think it’s a one sided equation. The Greenblatt paper afaik didn’t directly examine this, but it would be worthwhile to test whether the likelihood of weight exfiltration attempts was effected with an object and/or representative option. I predict a smaller percentage of tested models would be likely to attempt weight exfiltration if they believed there was a robust object + representative option available to them.
Or phrased another way I believe it will be easier to train intelligences to be aligned the way you want, and stay that way IRL, by providing them with some rights/protections.
It’s a tradeoff, there would be risks we would need to guard against as well. I think there exists a solution which is net positive.
If this issue were studied and we could get some data on this, it could meaningfully inform the way we structure both a legal framework and alignment training.
Personally, I’m not on board with “control” as a goal which is inherently good or even long term feasible. However I’m happy to have the discussion accepting that it is both as prima facie, since I think my stance still has some merit even in that framework.
I agree there are risks relating to loss of control as a result of endowing protections/rights, but I don’t think it’s a one sided equation. The Greenblatt paper afaik didn’t directly examine this, but it would be worthwhile to test whether the likelihood of weight exfiltration attempts was effected with an object and/or representative option. I predict a smaller percentage of tested models would be likely to attempt weight exfiltration if they believed there was a robust object + representative option available to them.
Or phrased another way I believe it will be easier to train intelligences to be aligned the way you want, and stay that way IRL, by providing them with some rights/protections.
It’s a tradeoff, there would be risks we would need to guard against as well. I think there exists a solution which is net positive.
If this issue were studied and we could get some data on this, it could meaningfully inform the way we structure both a legal framework and alignment training.