Misunderstandings, as you say, can have large consequences even when small.
But the point at issue is whether a system can make, not a small misunderstanding, but the mother of all misunderstandings. (Subsequent consequences are moot, here, whether they be trivial or enormous, because I am only interested in what kind of system makes such misunderstandings). The comparison with the Charge of the Light Brigade mistake doesn’t work because it is too small, so we need to create a fictional mistake and examine that, to get an idea of what is involved. Suppose that after Kennedy gave his Go To The Moon speech, NASA worked on the project for several years, and then finally delivered a small family car, sized for three astronauts, which was capable of driving from the launch complex in Florida all the way up country to the little township of Moon, PA.
Now, THAT would be a misunderstanding comparable to the one we are talking about. And my question would then be about the competence of the head of NASA. Would that person have been smart enough to organize his way out of a paper bag? Would you believe that in the real world, that person could (a) make that kind of mistake in understanding the meaning of the word “Moon” and yet (b) at the same time be able to run the entire NASA organization?
So, the significance of the words of mine that you quote, above, is that I do not believe that there is a single rational person who would believe that a mistake of that sort by a NASA administrator would be consistent with that person being smart enough to be in that job. In fact, most people would say that such a person would not be smart enough to tie their own shoelaces.
The rest of what you say contains several implicit questions and I cannot address all of them at this time, but I will say that the last paragraph does get my suggestion very, very wrong. It is about as far from what I tried to say in the paper as it is possible to get. The AI has a massive number of constraints on its behavior, but they are all built in at the beginning, and in effect that cannot be changed because the change requires that the pre-state sanction the design of the post-state, and since the pre-state is safe at iteration n = 1, I can invoke the law of induction to conclude that it stays safe for all n > 1.
The consequences of a misunderstanding are not a function of the size of the misunderstanding. Rather, they are a consequence of the ability that the acting agent (in this case, the AI) has to influence the world. A superintelligent AI has an unprecedentedly huge ability to influence the world, therefore, in the worst case, the potential consequences of a misunderstanding are unprecedentedly huge.
The nature of the misunderstanding—whether small or large—makes little difference here. And the nature of the problem, namely communicating with an artificial intelligence that is non-human and thus (presumably) alien in nature, is rife with the potential for misunderstanding.
Suppose that after Kennedy gave his Go To The Moon speech, NASA worked on the project for several years, and then finally delivered a small family car, sized for three astronauts, which was capable of driving from the launch complex in Florida all the way up country to the little township of Moon, PA.
Considering that an artificial intelligence—at least at first—might well have immense computational ability and massive intelligence but little to no actual experience in understanding what people mean as opposed to what they say, this is precisely the sort of misunderstanding that is possible if the only mission objective given to the system is something along the lines of “get three astronauts (human) to location designated (Moon)”. (Presumably, rather than waiting several years, it would take a few minutes to order a rental car—assuming it knew about rental cars, or thought to look for them.)
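As a toy illustration of that failure mode (entirely my construction, not anything from the paper; the gazetteer and the costs are invented), here is how a literal-minded optimizer could resolve such an objective: it matches the word “Moon” against known places and minimizes cost, with no notion of speaker intent at all.

```python
# Hypothetical gazetteer: place name -> (kind, travel cost in arbitrary units)
GAZETTEER = {
    "Moon, PA":     ("town", 1_000),      # reachable by family car from Florida
    "Moon (lunar)": ("moon", 1_000_000),  # requires a rocket
}

def resolve_destination(objective_word):
    """Return the cheapest gazetteer entry whose name contains the word."""
    matches = [name for name in GAZETTEER if objective_word in name]
    # A purely literal optimizer minimizes cost; nothing here asks which
    # sense of the word the speaker actually intended.
    return min(matches, key=lambda name: GAZETTEER[name][1])

print(resolve_destination("Moon"))  # picks "Moon, PA", not the lunar surface
```

The point of the sketch is that the wrong answer is not produced by stupidity in the optimizer; it is produced by an objective that underdetermines the intended meaning.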
Now, if the AI is capable of solving the difficult problem of separating out what people mean from what they say—which is a problem that human-level intelligences still have immense trouble with at times—and the AI is compassionate enough towards humanity to actually strongly value human happiness (as opposed to assigning approximately as much value to us as we assign to ants), then yes, you’ve done it, you’ve got a perfectly wonderful Friendly AI.
The problem is, getting those two things right is not simple. I don’t think your proposed structure guarantees either of those.
The rest of what you say contains several implicit questions and I cannot address all of them at this time, but I will say that the last paragraph does get my suggestion very, very wrong. It is about as far from what I tried to say in the paper as it is possible to get.
I am not surprised. I am very familiar with the effect—often, what one person means when they write something is not what another person sees when they read it.
That paragraph is what I saw when I read your paper. I think it likely that our implicit assumptions about the structure of such an AI differ drastically—I suspect that you are not mentioning (or are only briefly hinting at) the parts you consider obvious, and that I, not considering those parts obvious, am therefore not seeing them present at all.
The AI has a massive number of constraints on its behavior, but they are all built in at the beginning,
This is incredibly important, and something that I did not see in your proposal. What is the nature of these constraints?
and in effect that cannot be changed because the change requires that the pre-state sanction the design of the post-state, and since the pre-state is safe at iteration n = 1, I can invoke the law of induction to conclude that it stays safe for all n > 1.
A properly Friendly AI will remain Friendly even as it improves its own intelligence. Agreed.
...I’m just not seeing how your proposed design brings us any closer to Friendliness.
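For what it is worth, the quoted induction argument can be caricatured as a toy loop (my sketch, not the paper's design; the function names and the numeric “constraint” are invented): each pre-state checks a proposed post-state against the built-in constraints and adopts it only if it passes, so safety at generation 1 propagates to every later generation.

```python
SAFE_LIMIT = 100  # stand-in for the built-in constraints on behavior

def satisfies_constraints(design):
    """Stand-in for verifying a design against the built-in constraints."""
    return design["capability"] <= SAFE_LIMIT

def self_improve(design, generations):
    """Each pre-state must sanction the post-state before adopting it."""
    for _ in range(generations):
        proposal = {"capability": design["capability"] + 10}
        if not satisfies_constraints(proposal):
            break  # the pre-state refuses an unsafe post-state
        design = proposal
    return design

final = self_improve({"capability": 50}, generations=20)
assert satisfies_constraints(final)  # the invariant holds at every generation
```

The sketch also makes the gap in my reading visible: everything hangs on `satisfies_constraints` being a sound check of the proposal, and nothing in the toy says how to construct such a check for a real design.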