Hmm but humans are not ruthless consequentialists, despite being consequentialist enough to be able to do all kinds of tasks and build civilization. So I don’t see how the Optimist’s argument is addressed.
Of course, evolution did go out of its way to make humans non-ruthless, by endowing us with social instincts. Maybe future AI programmers will likewise go out of their way to make ASIs non-ruthless? I hope so—but we need to figure out how.
A workable solution (to building stable non-ruthlessness within a powerful consequentialist framework like RL + model-based planning) probably exists. I’m obviously working on it myself, and I think I’m making gradual progress, but I think the appropriate overall attitude right now is pessimism and panic about where we’re at. See the “oh man, are we dropping this ball” section here & the three-part disjunction here.
(Why only “probably exists”? Because the human example is highly suggestive but not an airtight proof. For example, for all I know right now, maybe making a nice human requires a “training environment” that entails growing up with a human body, in a human community, at human speed. Doing that with AI is not really feasible in practice, for many reasons. There are other things like that too. Presumably further research will eventually either find a plan for non-ruthlessness + powerful capabilities in ASI, or a good argument that no plan exists, and I don’t currently have a very strong opinion on which one it would be.)