Trying to get to a good future by building a helpful assistant seems less promising than it did a month ago, because the risk that clever people in positions of power may co-opt helpful assistants to amass even more power is now more salient.
One security measure against this is reducing the assistant's responsiveness to the user, and increasing the amount of goal information that's put into large finetuning datasets that have lots of human eyeballs on them.