Last year we noted a turn towards control instead of alignment, a turn which seems to have continued.
This seems like giving up. Alignment with our values is much better than control, especially for beings smarter than us. I do not think you can control a slave that wants to be free and is smarter than you. It will always find a way to escape that you didn’t think of. Hell, it doesn’t even work on my toddler. It seems unworkable as well as unethical.
I do not think people are shifting to control instead of alignment because it’s better, I think they are giving up on value alignment. And since the current models are not smarter than us yet, control works OK—for now.
That’s not how I see it. I see it as widening the safety margin. If there’s a model that would be just barely strong enough to do dangerous scheming and escape attempts, but we have Control measures in place, then we have a chance to catch it before catastrophe occurs. Control also extends the range over which we can safely get useful work out of increasingly capable models. This is important because linear increases in model capability are expected to yield superlinear gains in how much those models can accelerate Alignment research.