Few people, when learning their values in childhood, ended up considering examples such as this one and explicitly learning that they were wrong. Yet the persuasive power of that example comes from the fact that most people instantly reject the desirability of the dopamine drip scenario when it’s suggested to them.
I for one don’t “instantly reject” the desirability of this scenario. I think whether the dopamine drip is desirable is a difficult philosophy problem. My worry is that either the AI will not be as uncertain as I am about it, or it will not handle or resolve the normative uncertainty in the same way as I would or should.
Today’s machine learning algorithms tend to be unreasonably certain (and wrong) about inputs very different from their training data, but that is perhaps just because machine learning researchers currently focus mostly on commercial settings, where inputs are rarely very different from the training data and getting things wrong carries no terrible consequences. So maybe we can expect this to improve in the future as researchers start to focus more on safety.
But even if we manage to build an AI that is properly uncertain about whether something like the dopamine drip scenario is good or bad, how do we get it to resolve its uncertainty in the right way, especially if its creators/owners are also uncertain or possibly wrong so it can’t just ask? Resolving the uncertainty incorrectly or getting the uncertainty permanently frozen into its utility function seem to be two big risks here. So I worry just as much about the reverse maverick nanny scenario, where we eventually, after centuries of philosophical progress, figure out that we actually do want to be put on dopamine drips, but the AI says “Sorry, I can’t let you do that.”
Read about covariate shift. (More generally, ML people are now getting into systematic biases, including causal inference, in a big way.)
This has little to do with AGI, though.
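For readers unfamiliar with the term: covariate shift is the setting where the input distribution p(x) differs between training and deployment while p(y|x) stays fixed, and the standard correction is to reweight training examples by p_test(x)/p_train(x). Here is a minimal self-contained sketch; the distributions and the quantity being estimated are made up purely for illustration:

```python
import math
import random

random.seed(0)

# Hypothetical 1-D covariate shift: training inputs ~ N(0, 1),
# deployment (test) inputs ~ N(2, 1). p(y|x) is assumed unchanged.

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def f(x):
    # Stand-in for some per-example quantity (e.g. a loss) whose
    # test-time expectation we want to estimate.
    return x ** 2

train_xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Naive estimate: plain average over training inputs. This estimates
# the *training*-distribution expectation (here E[x^2] = 1), not the
# test-distribution one.
naive = sum(f(x) for x in train_xs) / len(train_xs)

# Importance-weighted (self-normalized) estimate under the test
# distribution, using w(x) = p_test(x) / p_train(x).
weights = [normal_pdf(x, 2.0, 1.0) / normal_pdf(x, 0.0, 1.0) for x in train_xs]
weighted = sum(w * f(x) for w, x in zip(weights, train_xs)) / sum(weights)

# The true test-time expectation E[x^2] under N(2, 1) is mu^2 + sigma^2 = 5.
print(naive, weighted)
```

The naive average lands near 1 while the reweighted estimate lands near 5, which is the point: the correction only works when you can model how the input distribution has shifted, and the weights blow up in variance as the shift grows, so it is a partial fix rather than a solution to the out-of-distribution overconfidence problem discussed above.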