So … SI is addressing the question of whether the “friendliness” concept is actually meaningful enough to be formalizable? SI accepts that “friendliness” might not be formalizable at all, and has discussed the possibility that mathematical proof is not even applicable in this case?
And SI has discussed the possibility that the current paradigm for an AI motivation mechanism is so poorly articulated, and so unproven (there being no such mechanism that has been demonstrated to be even approaching stability), that it may be meaningless to discuss how such motivation mechanisms can be proven to be “friendly”?
I do not believe I have seen any evidence of those debates/discussions coming from SI… do you have pointers?
Well, Luke has asked me to work on a document called “Mitigating Risks from AGI: Key Strategic Questions” which lists a number of questions we’d like to have answers to and attempts to list some preliminary pointers and considerations that would help other researchers actually answer those questions. “Can CEV be formalized?” and “How feasible is it to create Friendly AI along an Eliezer path?” are two of the questions in that document.
I haven’t heard explicit discussions about all of your points, but I would expect them to all have been brought up in private discussions (which I have for the most part missed, since my physical location is rather remote from all the other SI folks). Eliezer has said that a Friendly AI in the style that he is thinking of might just be impossible. That said, I do agree with the current general consensus among other SI folk, which is that we should act on the assumption that such a mathematical proof is possible, because humanity’s chances of survival look pretty bad if it isn’t.
They’re currently working on a formal system for reasoning about stability, a reflective decision theory. If you wanted to prove that no such system can exist, what else would you be doing?