I’m much less negative on alignment/control efforts, and now think the default path is probably that we aren’t doomed with 75-80% probability, largely due to realizing that the assumption of complexity of value/huge inductive biases were necessary for human values was more or less false, and control is finally starting up, and got some useful results, though we will need to see and wait for more results, and I credit Quintin Pope for starting this transformation, which makes the top comment weird, but also I do think something like this is still true, and I unretracted my comment above, with the context in this comment.
I’m much less negative on alignment/control efforts, and now think the default path is probably that we aren’t doomed with 75-80% probability, largely due to realizing that the assumption of complexity of value/huge inductive biases were necessary for human values was more or less false, and control is finally starting up, and got some useful results, though we will need to see and wait for more results, and I credit Quintin Pope for starting this transformation, which makes the top comment weird, but also I do think something like this is still true, and I unretracted my comment above, with the context in this comment.