I’m with Steve on the idea that there’s a difference between broad human preferences (something like common sense?) and particular and exact human preferences (what would be needed for ambitious value learning).
Still, you (Stuart) made me realize that I hadn’t thought explicitly about this need for broad human preferences when splitting the problem (first be able to align at all, then point at what we want). It’s implicit, though, because I don’t care about being able to align to “anything”, just to the sorts of things humans might want.