As another example, I’ve seen people imagine non-consequentialist preferences as “rules that the AI grudgingly follows, while searching for loopholes”, rather than “preferences that the AI enthusiastically applies its intelligence towards pursuing”.
I imagine that this might be yet another view that is downstream of visualizing a “train then deploy” paradigm for future AI systems?
If the human operators successfully install some static deontological constraints in the AI, while also training it to accomplish consequentialist goals, there’s a continual training incentive to learn to game and route around the deontological constraints.
Another way to say this: there are tradeoffs between the consequentialist and non-consequentialist desires, and current AIs are reinforced only on the basis of behavioral outcomes (which are served better by consequentialist desires than by non-consequentialist ones?), so training tends to gradually nudge the AIs towards having consequentialist goals.
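To make that dynamic concrete, here’s a minimal toy simulation (my own sketch, not anything from the post; the reward values and the multiplicative-weights update rule are made-up illustrative choices). Two dispositions compete inside one agent: a “constrained” one that sometimes forgoes reward to respect a rule, and a “consequentialist” one that routes around the rule. Reinforcement sees only the outcome-based reward, so the constraint-respecting disposition is gradually selected against:

```python
import random

# Toy model of the claimed training dynamic. The agent acts from one of two
# internal dispositions; training reinforces whichever disposition produced
# the episode, in proportion to the outcome-based reward it earned.
# (All numbers here are illustrative, not measured from any real system.)

REWARD = {"constrained": 0.7, "consequentialist": 1.0}  # outcome reward only
weights = {"constrained": 1.0, "consequentialist": 1.0}  # initial dispositions
lr = 0.1

random.seed(0)
for step in range(2000):
    # The agent acts from a disposition with probability proportional to weight.
    total = sum(weights.values())
    choice = ("constrained"
              if random.random() * total < weights["constrained"]
              else "consequentialist")

    # Reinforcement sees only the behavioral outcome, not the rule-following,
    # so the higher-reward disposition compounds faster when selected.
    weights[choice] *= 1 + lr * REWARD[choice]

    if step % 500 == 0:
        p = weights["constrained"] / sum(weights.values())
        print(f"step {step}: P(respect constraint) = {p:.3f}")
```

Running this, P(respect constraint) starts at 0.5 and decays toward zero: both dispositions get reinforced when used, but the consequentialist one compounds faster per episode, so it gradually crowds out the constrained one, which is the “nudge towards consequentialist goals” in miniature.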
Oh, that’s your very next point! : P