I think the big reason this sort of scheme fails is the cost of niceness to the AI: being nice requires it to hold back from optimizing away computational costs, and in particular, adding further constraints (like keeping the Earth habitable for humans) quickly makes the trade go net-negative.
More here:
https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#wy9cSASwJCu7bjM6H
I believe Nate Soares’s argument is invalid, but I do think its conclusion (that, conditional on unaligned AI, we are likely to see billions of deaths from a future existential catastrophe, and that decision theory doesn’t save us) is correct.