Trying to get to a good future by building a helpful assistant seems less promising than it did a month ago, because the risk that clever people in positions of power may co-opt helpful assistants to amass even more power is now more salient.
One security measure against this is reducing the assistant's responsiveness to the user, and increasing the amount of goal information that's put into large finetuning datasets that have lots of human eyeballs on them.