Did anyone else immediately try to come up with ways Davis’ plan would fail? One obvious failure mode would be in specifying which dead people count—if you say “the people described in these books,” the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable? I know EY has written many times before about a “giant logical function that computes morality”, but this puts that notion in a bit of a different light for me. Anyway, I’m sure there are other, less obvious ways Davis’ plan could go wrong too. I also suspect he’s sneaking a lot into that little word, “disapprove”.
In general though, I’m continually astounded at how many people, upon being introduced to the value loading problem and some of the pitfalls that “common-sense” approaches have, still say “Okay, but why couldn’t we just do [idea I came up with in five seconds]?”
One obvious failure mode would be in specifying which dead people count—if you say “the people described in these books,” the AI could just grab the books and rewrite them. Hmm, come to think of it: is any attempt to pin down human preferences by physical reference rather than logical reference vulnerable to tampering of this kind, and therefore unworkable?
Not as such, no. It’s a possible failure mode, similar to wireheading; but both of those are avoidable. You need to write the goal system in such a way that makes the AI care about the original referent, not any proxy that it looks at, but there’s no particular reason to think that’s impossible.
In general though, I’m continually astounded at how many people, upon being introduced to the value loading problem and some of the pitfalls that “common-sense” approaches have, still say “Okay, but why couldn’t we just do [idea I came up with in five seconds]?”
Agreed.
Davis massively underestimates the magnitude and importance of the moral questions we haven’t considered, which renders his approach unworkable.
I feel safer in the hands of a superintelligence who is guided by 2014 morality, or for that matter by 1700 morality, than in the hands of one that decides to consider the question for itself.
I don’t. Building a transhuman civilization is going to raise all sorts of issues that we haven’t worked out, and do so quickly. A large part of the possible benefits are going to be contingent on the controlling system becoming much better at answering moral questions than any individual humans are right now. I would be extremely surprised if we don’t end up losing at least one order of magnitude of utility to this approach, and it wouldn’t surprise me at all if it turns out to produce a hellish environment in short order. The cost is too high.
The superintelligence might rationally decide, like the King of Brobdingnag, that we humans are “the most pernicious race of little odious vermin that nature ever suffered to crawl upon the surface of the earth,” and that it would do well to exterminate us and replace us with some much more worthy species. However wise this decision, and however strongly dictated by the ultimate true theory of morality, I think we are entitled to object to it, and to do our best to prevent it.
I don’t understand what scenario he is envisioning, here. If (given sufficient additional information, intelligence, rationality and development time) we’d agree with the morality of this result, then his final statement doesn’t follow. If we wouldn’t, it’s a good old-fashioned Friendliness failure.
What do you think of Ernest Davis’ view? Is the value loading problem a problem?