My point was that it doesn’t follow from that premise that G, as they actually are, will agree on V.
I’m talking about whether we are likely to endorse the result of extracting our actual values.
OOOOOOOOOOOOOOOOOOOOH. Ah. Ok. That is actually an issue, yes! Sorry I didn’t get what you meant before!
My answer is: that is an open problem, in the sense that we kind of need to know much more about neuroethics to answer it. It’s certainly easy to imagine scenarios in which, for instance, the FAI proposes to make all humans total moral exemplars, and as a result all the real humans who secretly like being sinful (even if they don’t reflectively endorse that liking) reject the deal entirely.
Yes, we have several different motivational systems, and the field of machine ethics tends to sweep this under the rug by lumping everything together as “human values,” largely because machine-ethics folks contrast humans with paper-clippers to make the point that machine-ethics experts are necessary.
This kind of thing is an example of the work that needs to be done to get anywhere. You are correct in saying that if FAI designers want their proposals to be accepted by the public (or even by the general body of the educated elite), they need to cater not only to meta-level moral wishes but to the actual desires and affections real people feel today. I would certainly argue this is an important component of Friendliness design.
At some point, we have to define a target for how much reflective equilibrium we expect from our input and from our evaluators. The further we shift that target away from where we are right now, the more really stupid ideas we wash out, but the less likely we are to endorse the result. The further we shift it towards where we are, the more stupid ideas we keep, but the more likely we are to endorse the result.
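Purely as an illustrative sketch (every functional form here is an arbitrary assumption of mine, chosen only to make the shape of the tradeoff visible, not anything derived from a real extraction process): let d stand for how far we extrapolate away from people’s current values, and caricature the two opposing pressures like this:

```python
# Toy model of the extrapolation-distance tradeoff. All functional forms
# are arbitrary assumptions used only to illustrate the tradeoff's shape;
# nothing here is a real model of value extraction.
import math

def stupid_ideas_kept(d):
    # Fraction of "really stupid ideas" surviving extraction:
    # assumed to fall off as we extrapolate further (d = 0 keeps them all).
    return math.exp(-d)

def endorsement_probability(d):
    # Probability that actual, present-day humans endorse the result:
    # assumed to fall off as the output drifts from where we are now.
    return math.exp(-0.5 * d)

def target_quality(d):
    # One crude way to score a target: reward washing out stupid ideas,
    # but only insofar as the result still gets endorsed.
    return (1 - stupid_ideas_kept(d)) * endorsement_probability(d)

# Neither extreme wins: d = 0 keeps every stupid idea, large d loses
# endorsement, so the best target sits somewhere in between.
best = max((d / 10 for d in range(0, 101)), key=target_quality)
print(f"toy optimum at d = {best:.1f}")
```

The numbers mean nothing; the point is just that under any assumptions where both curves fall monotonically, the sensible target is an interior compromise rather than either extreme.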
This assumes that people are unlikely to endorse smart ideas. I personally disagree: many ideas are difficult to locate in idea-space, but easy to evaluate once found. Life extension, for example, or marriage for romance.
Because back at the beginning of this conversation, it sounded like you were claiming you had in mind a process that was guaranteed not to fuck up, which is what I was skeptical about.
No, I have not solved AI Friendliness all on my lonesome. That would be a ridiculous claim, a crackpot sort of claim. I just have a bunch of research notes that, even with their best possible outcome, leave lots of open questions and remaining issues.
Now you might reply “Well it’s the best we can do!” and I might agree. As I said earlier, we simply have to accept that we might get it wrong, and do it anyway, because the probability of disaster if we don’t do it is even higher. But let’s not pretend there’s no chance of failure.
Certainly there’s a chance of failure. I just think there’s a lot we can and should do to reduce that chance. The potential rewards are simply too great not to try.