Part of the problem here is an Angels on Pinheads problem. Which is to say: before deciding exactly how many angels can dance on the head of a pin, you have to make sure the “angel” concept is meaningful enough that questions about angels are meaningful. In the present case, you have a situation where (a) the concept of “friendliness” might not be formalizable enough to make any mathematical proofs about it meaningful, and (b) there is no known path to the construction of an AGI at the moment, so speculating about the properties of AGI systems is tantamount to speculating about the properties of railroads when you haven’t invented the wheel yet.
So, should SI be devoting any time at all to proving friendliness? Yes, but only after defining its terms well enough to make the endeavor meaningful. (And, for the record, there are at least some people who believe that the terms cannot be defined in a way that admits of such proofs.)
That is indeed part of what SI is trying to do at the moment.
So … SI is addressing the question of whether the “friendliness” concept is actually meaningful enough to be formalizable? SI accepts that “friendliness” might not be formalizable at all, and has discussed the possibility that mathematical proof is not even applicable in this case?
And SI has discussed the possibility that the current paradigm for an AI motivation mechanism is so poorly articulated, and so unproven (there being no such mechanism that has been demonstrated to be even approaching stability), that it may be meaningless to discuss how such motivation mechanisms can be proven to be “friendly”?
I do not believe I have seen any evidence of those debates/discussions coming from SI… do you have pointers?
Well, Luke has asked me to work on a document called “Mitigating Risks from AGI: Key Strategic Questions” which lists a number of questions we’d like to have answers to and attempts to list some preliminary pointers and considerations that would help other researchers actually answer those questions. “Can CEV be formalized?” and “How feasible is it to create Friendly AI along an Eliezer path?” are two of the questions in that document.
I haven’t heard explicit discussions about all of your points, but I would expect them to all have been brought up in private discussions (which I have for the most part missed, since my physical location is rather remote from all the other SI folks). Eliezer has said that a Friendly AI in the style that he is thinking of might just be impossible. That said, I do agree with the current general consensus among other SI folk, which is to say that we should act based on the assumption that such a mathematical proof is possible, because humanity’s chances of survival look pretty bad if it isn’t.
They’re currently working on a formal system for talking about stability, a reflective decision theory. If you wanted to prove that no such system can exist, what else would you be doing?