I’ve just come back to this post while writing a critique of personas as a means of aligning ASI. It really helped deconfuse my thinking about task-oriented vs. value-oriented AI, and about values being programmed in vs. learned from reward.
Task/values no longer feels like a sharp distinction to me, and the differences between character training and RLHF seem clearer.
I think this is an extremely important idea.