The idea of provably safe AGI is typically presented as something that would exist within mathematical computation theory or some variant thereof. So that’s one obvious limitation of the idea: mathematical computers don’t exist in the real world, and real-world physical computers must be interpreted in terms of the laws of physics, and humans’ best understanding of the “laws” of physics seems to radically change from time to time. So even if there were a design for provably safe real-world AGI, based on current physics, the relevance of the proof might go out the window when physics next gets revised.
I didn’t get the impression that Eliezer’s goal was to “build a provably Friendly AI” (in the mathematical sense of “provable”), as Ben puts it. The impression I get is more that Eliezer wants to put off building an AI until we understand enough about morality and human values. Eliezer also cares about mathematical proofs, but more for the purpose of preserving values under self-modification (something that humans don’t usually have to deal with).
As an analogy, imagine you’re trying to debug some complex and badly written code you were previously unfamiliar with. One approach is to find the bit in the code that seems related to the bug, and modify it locally ( “if DatabaseDown() return False” and the like) until the issue seems fixed. Another approach is to try to understand how the program works to the point where you understand which conceptual mistake caused the bug, and see the right way to fix it.
The second approach takes more time but is also less likely to create another bug somewhere else, or to deteriorate the overall quality of the code. I think most programmers who’ve worked on sufficiently large codebases have seen examples of both approaches.
Anyway, I get the impression that Eliezer is advocating something like the second approach here (understand how everything works before implementing), and that Ben is describing that as “proving correctness”, which seems to be quite different (and much stronger!).
“Programmers operating with strong insight into intelligence, directly create along an efficient and planned pathway, a mind capable of modifying itself with deterministic precision—provably correct or provably noncatastrophic self-modifications. This is the only way I can see to achieve narrow enough targeting to create a Friendly AI.”
Yes, that’s what I was referring to when saying this:
Eliezer also cares about mathematical proofs, but more for the purpose of preserving values under self-modification (something that humans don’t usually have to deal with).
The provability here has to do with the AI proving to itself that modifying itself will preserve it’s values (or not cause it to self-destruct or wirehead or whatever), not the designers proving the AI is non-dangerous.
I.e. friendly as “provably non-dangerous AGI” doesn’t necessarily mean having a rigorous mathematical proof that the AI is not dangerous; but “merely” having enough understanding of morality when building it (as opposed to some high-level notions whose components haven’t been rigorously analyzed).
The impression I get is more that Eliezer wants to put off building an AI until we understand enough about morality and human values.
Seems slightly off to me. I think EY argues that as much trouble as AGI is giving us, we’ll still understand it long before we can formalize human morality well enough to simulate that directly. His suggestion of Coherent Extrapolated Volition would basically tell the AI to look to us for the answer. Instead of simulating morality this plan looks to the existing morality-simulators (us) and checks to see how much they agree on. See also this massive spoiler for a certain comic.
Also, the second approach would be pretty much the only way to go if the computer is running the debugger’s life support system, assuming you cannot build a simulation and test potential fixes on it.
I didn’t get the impression that Eliezer’s goal was to “build a provably Friendly AI” (in the mathematical sense of “provable”), as Ben puts it. The impression I get is more that Eliezer wants to put off building an AI until we understand enough about morality and human values. Eliezer also cares about mathematical proofs, but more for the purpose of preserving values under self-modification (something that humans don’t usually have to deal with).
As an analogy, imagine you’re trying to debug some complex and badly written code you were previously unfamiliar with. One approach is to find the bit in the code that seems related to the bug, and modify it locally ( “if DatabaseDown() return False” and the like) until the issue seems fixed. Another approach is to try to understand how the program works to the point where you understand which conceptual mistake caused the bug, and see the right way to fix it.
The second approach takes more time but is also less likely to create another bug somewhere else, or to deteriorate the overall quality of the code. I think most programmers who’ve worked on sufficiently large codebases have seen examples of both approaches.
Anyway, I get the impression that Eliezer is advocating something like the second approach here (understand how everything works before implementing), and that Ben is describing that as “proving correctness”, which seems to be quite different (and much stronger!).
Eliezer_Yudkowsky
Yes, that’s what I was referring to when saying this:
The provability here has to do with the AI proving to itself that modifying itself will preserve it’s values (or not cause it to self-destruct or wirehead or whatever), not the designers proving the AI is non-dangerous.
I.e. friendly as “provably non-dangerous AGI” doesn’t necessarily mean having a rigorous mathematical proof that the AI is not dangerous; but “merely” having enough understanding of morality when building it (as opposed to some high-level notions whose components haven’t been rigorously analyzed).
Seems slightly off to me. I think EY argues that as much trouble as AGI is giving us, we’ll still understand it long before we can formalize human morality well enough to simulate that directly. His suggestion of Coherent Extrapolated Volition would basically tell the AI to look to us for the answer. Instead of simulating morality this plan looks to the existing morality-simulators (us) and checks to see how much they agree on. See also this massive spoiler for a certain comic.
Also, the second approach would be pretty much the only way to go if the computer is running the debugger’s life support system, assuming you cannot build a simulation and test potential fixes on it.