@Unknown: If you prove to the AI that it will not do X, then that is the same as the AI knowing that it will decide not to do X, which (barring some Gödelian worries) should probably work out to the AI deciding not to do X. In other words, to show the AI that it will not do X, you have to show that X is an absolutely terrible idea, so it becomes convinced that it will not do X at around the same time it decides not to do X. Having decided, why should it be uncertain of itself? Or if the AI might do X if contingencies change, then you will not be able to prove to the AI in the first place that it does not do X.
@Robin: See the rest of the post (which I was already planning to write). I have come to distrust these little “design compromises” that now seem to me to be a way of covering up major blank spots on your map, dangerous incompetence.
“We can’t possibly control an AI absolutely—that would be selfish—we need to give it moral free will.” Whoever says this may think of themselves as a self-sacrificing hero dutifully carrying out their moral responsibilities to sentient life—for such are the stories we like to tell of ourselves. But actually they have no frickin’ idea how to design an AI, let alone design one that undergoes moral struggles analogous to ours. The virtuous tradeoff is just covering up programmer incompetence.
Foolish generals are always ready to refight the last war, but I’ve learned to take alarm at my own ignorance. If I did understand sentience and could know that I had no choice but to create a sentient AI, that would be one matter—then I would evaluate the tradeoff, having no choice. If I can still be confused about sentience, this probably indicates a much deeper incompetence than the philosophical problem per se.