You’re assuming that “what humans mean” is well-defined. I’ve seen people criticize the example of an AI putting humans on a dopamine drip, on the grounds that “making people happy” clearly doesn’t mean that. But if your boss tells you to ‘make everyone happy,’ what you’ll actually get paid for is probably making everyone stop complaining. Parents in the real world used to give their babies opium and cocaine; advertisers today have probably convinced themselves that the foods and drugs they push genuinely make people happy. There is no existing mind that is provably Friendly.
So, this criticism is implying that simply understanding human speech will (at a minimum) let the AI understand moral philosophy, which is not trivial.
I don’t disagree with the other stuff you said. But I interpreted the criticism as saying that an AI told to ‘do what humans want, not what they mean’ will have approximately the same effect as a perfectly rational human being given the same instruction. So just as I can, with some success, instruct people to “do what I mean,” the same instruction will work for an AI too. It’s just also true that this is no more a solution to FAI than it is with humans, because morality is inconsistent, human beings are inherently unfriendly, etc.
I think you’re eliding the question of motive (which may be more alien for an AI). But I’m glad we agree on the main point.