The standard argument in favor of this line of research is that Lob’s theorem implies that purely logical systems will have difficulty reasoning about their future behavior, and that therefore it is worth asking whether a probabilistic language of thought can overcome this obstacle. But I believe this is due to a failure to think sufficiently concretely about the problem. To give two examples: first, humans seem perfectly capable of making predictions about their future behavior despite this issue, and I have yet to see an argument for why AIs should be different. Secondly, we manage to routinely prove facts about the behavior of programs (this is the field of program analysis) despite the fact that in theory this should be “undecidable”.
These examples involve predictions generated by processes which are not purely logical systems, and which we don’t understand enough to code into an AI. So it seems like Paul’s idea could be progress towards having a process that makes such predictions about itself that we can understand at that level.
Can you clarify in what sense you think a computer program is not a purely logical system? Or am I misunderstanding you?
ETA: In particular, I’m referring to the point where I said:
Secondly, we manage to routinely prove facts about the behavior of programs (this is the field of program analysis) despite the fact that in theory this should be “undecidable”.
I didn’t meant that the computer program is not a purely logical system, but that the people proving facts about its behavior aren’t purely logical systems.
Program analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.
Ok, then I did not understand exactly what you meant, but I still don’t think this is a counterexample to the problem Paul’s idea tries to get around.
The problem is that logical systems have problems reasoning about their own behavior, not a claim that there is no other logical system that can reason about them. In particular, we are interested in if an optimization process can model itself as an optimization process, accurately predicting that its future decisions are likely to achieve outcomes that score well on its optimization criteria and the score will be better if it has more resources, and will become much worse if its representation of its criteria gets corrupted, (all using abstract reasoning in much less time than fully simulating its future decisions). Can program analysis do that?
ETA: Also, I should note that this is a good question and I’m glad you asked it!
If your question is whether a program analyzer can, when given itself as input, produce sensible results,
the answer is yes. Program analyzers are meant to run on arbitrary code, so in particular they can be run on themselves as a special instance. (Actually, nothing particularly special happens in this case as far as I can tell.)
Now, a key point is the formalism we are working in: a program analyzer takes in a program P and some specification S, and checks whether P obeys specification S (for instance, checking that it never accesses
memory before it allocates it). Importantly, P is allowed to do any 3 of the following:
report that P satisfies S (with a proof)
report that P does not satisfy S (with a counterexample)
give up (i.e. report “unsure”)
So a key question is how often it gives up! But while there are many instances where modern program analysis tools do give up, there are also many where they don’t. Furthermore if you are an AI and you propose a modification to your code, and your program analysis subroutine reports “unsure”, you are free (and would be wise) to try a different modification instead. Researchers in the field of program analysis are extremely cognizant of the halting problem (which is closely related to Lob’s theorem) and typically deal with it by over-approximating, e.g. by identifying conditions that would be sufficient for halting, although not necessarily necessary. As a result one obtains solubility at the cost of precision (although note that the approximation is always sound: if we report that P satisfies S, then P does indeed always satisfy S).
This is a good approach for dealing with the halting problem, but I think that Lob’s theorem is not so closely related that getting around the halting problem means you get around Lob’s theorem.
The theoretical AI that would run into Lob’s theorem would need more general proof producing capability than these relatively simple program analyzers.
It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing a structure of a proof which can be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .Net assemblies, with byte code that targets a stack based virtual machine, and I want to verify that instructions to read elements off the stack get elements of the type is expecting. So I wrote my library to keep track of types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack trace, and determine if the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure, but I did not, and don’t know how to, write my program to come up with the proof structure. It is therefor easy to be confident that proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.
The theoretical AI that would run into Lob’s theorem would be able to come up with proof structures, as an AI that could only use proof structures prepackaged by human programmers would have huge gaps in its capabilities. This may introduce difficulties not seen in simple program analyzers.
In the example I asked about earlier, the optimizer that needs to prove things about its own future decisions, it runs into an issue that Lob’s theorem applies to. In order prove that its own future decisions are good, it would rely on the fact that it’s future decision are based on its own sound proof system, so it needs to assert that its own proof system is sound, that if its proof system says “X”, then X. Lob’s theorem says that, for all statements X, if a system P (at least as powerful as Peano arithmetic) says “P says ‘X’ implies X” then P says X. So, if the system asserts its own soundness, it asserts anything.
So, in summary:
Lob’s theorem is a problem for generally powerful formal logic based proof systems that assert their own soundness.
Program analyzers are formal logic based proof systems that do not run into Lob’s theorem, because they are not generally powerful, and do not assert their own soundness.
Humans are generally powerful proof generators than can have confidence in their own soundness, but are not purely based on formal logic, (we might intuitively dismiss a formal logical proof as “spurious”) and so can get around Lob’s theorem, but we don’t understand how humans do this well enough to write a program to do it.
Paul’s idea of a probabilistic proof system could lead to a generally powerful proof generator with confidence in its own soundness that is not based on formal logic, so that it gets around Lob’s theorem, that we can understand well enough to write a program for.
It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing a structure of a proof which can be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .Net assemblies, with byte code that targets a stack based virtual machine, and I want to verify that instructions to read elements off the stack get elements of the type is expecting. So I wrote my library to keep track of types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack trace, and determine if the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure, but I did not, and don’t know how to, write my program to come up with the proof structure. It is therefor easy to be confident that proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.
I’m not sure I’ve understood what you have in mind here, but in the general case complete type checking in .NET (that is, proving that an assembly not only is syntactically well-formed but also never throws type-related exceptions at runtime) is undecidable because of Rice’s theorem.
In the general case, complete type checking is as difficult as proving arbitrary claims in first-order logic.
I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement) with a particular proof strategy that covers a huge space of useful, nicely structured programs.
See also jsteinhardt’s comment I was responding to, which discussed getting around the halting problem by allowing the checker to say “I don’t know”.
am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement)
I’m not an expert on .NET, but is there anything that can throw a type exception other than an explicit cast or an explicit throw (or the standard library, I suppose)?
There are sequences of .Net instructions that result in the runtime throwing type exceptions, because it tries to read a value of a certain type of the stack, and it gets an incompatible value. This is the situation that my verifier guards against.
The standard .Net runtime also includes a verifier that checks the same thing, and it will not run code that fails this validation unless it is explicitly trusted. So a verifiable .Net assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so. The compilers for the standard languages such as C# will generally only produce verifiable assemblies unless you explicitly mark parts of the source code as “unsafe”, and unsafe, or unverifiable assemblies need special permissions to run at all.
(There are other properties that both my verifier and the standard verifier check for. The reason I wrote my own is that it produces much more informative descriptions of problems it finds, and it is integrated into my assembly emitting libraries, so it detects problems as the assembly to be emitted is defined, and when run in the debugger, will easily show the compiler code and execution state that caused the problem.)
So a verifiable .Net assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so.
IIUC, unverifiable code does not, or at least is not guaranteed to, politely throw an exception should a type error occur. It may crash the runtime or fail silently leaving the application in an incorrect state.
(There are other properties that both my verifier and the standard verifier check for. The reason I wrote my own is that it produces much more informative descriptions of problems it finds, and it is integrated into my assembly emitting libraries, so it detects problems as the assembly to be emitted is defined, and when run in the debugger, will easily show the compiler code and execution state that caused the problem.)
Ok. I thought that you were considering assemblies that passed the standard .NET verification and you were trying to check for some stronger property (such as absence of runtime exceptions caused by downcasts). That would have been equivalent to arbitrary first-order logic inference. Since you are instead checking for decidable properties, your system is indeed not equivalent to arbitrary first-order logic inference.
But as jsteinhardt says, it is actually possible to write verifiers that attempt to check for undecidable properties, provided that they have the option to give up.
My mathematical logic is a bit rusty, but my impression is that the following are true:
Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem. A reflectively consistent theory may still be incomplete, but any complete theory is necessarily reflectively consistent.
The undecidability of the halting problem is basically Godel’s theorem stated in computational terms. If we could identify a subset L of Turing machines for whom the halting problem can be decided, as long as it was closed under operations such as inserting a (non-self-referential) sub-routine, then we would be able to verify any (non-self-referential) property of the program that was also expressible in L. This is a sketch of a claim rather than an actual claim that I’ve proved, though.
Finally, I think it’s worth pointing out an actual example of a program analysis tool since I think they are more powerful than you have in mind. The following slides are a good example of such a tool. At a high level, it gets around the problems you are worried about by constructing an over-approximation of the halting problem that is expressible in propositional logic (and thus decidable, in fact it is even in NP). More generally we can construct a sequence of approximations, each expressible in propositional logic, whose conjunction is no longer an approximation but in fact exactly the original statement.
Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem.
Why do you say that? My understanding is that Godel’s theorem says that a (sufficiently powerful) logical system has true statements it can’t prove, but these statements are excessively complicated and probably not important. Is there some way you envision an AGI being limited in its capacity to achieve its goals by Godel’s theorem, as we envision Lob’s theorem blocking an AGI from trusting its future self to make effective decisions? (Besides where the goals are tailored to be blocked by the theorem: “Prove all true statements in my formal system”)
Finally, I think it’s worth pointing out an actual example of a program analysis tool since I think they are more powerful than you have in mind. The following slides are a good example of such a tool.
As near as I can tell, this is more powerful than other static analysis tools that I have seen (though maybe not, the Google Web Toolkit optimizer is pretty impressive), but it is well within what I expect to be possible, and doesn’t get around Lob’s theorem. (I can’t be too confident in this assessment of its power because I don’t see a clear claim of what sort of statements it can prove or how it works except that it seems to detect when inputs to a statement may have invalid values and that it uses brute force techniques to analyze functions and then associates a summary of the analysis with that function (constraints on valid inputs and guarantees on outputs?) so that its analysis of call sites can use the summary.) (This sort of program analyzer is interesting in its own right, and I would like to see a more complete explanation of it, but I don’t think it’s existence says anything about the problems posed by Lob’s theorem.)
Do you agree or disagree that complete implies reflectively consistent? If you agree, then do you agree or disagree that this means avoidance of Godelian obstacles implies avoidance of Lobian obstacles? If you agree with both of those statements, I’m confused as to why:
Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem.
Ah, so by “Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem” you mean if you overcome Godelian obstacles you also overcome Lobian obstacles? I think I agree, but I am not sure that it is relevant, because the program analyzer examples don’t overcome Godelian obstacles, they just cope with the Godelian obstacles, which does not similarly imply coping with or overcoming Lobian obstacles.
I’m guessing that he is using the standard AI terminology of a “logical program” as one which reasons by means of logical inference over a knowledgebase. This is not how human minds work, nor do many AI researchers view it as a viable path forward.
The types of programs which have seen success in the AI revolution that is currently going on, often are not amenable to program analysis. They very often fall in the “undecidable” category, at least with respect to their anticipated behavior.
The types of programs which have seen success in the AI revolution that is currently going on, often are not amenable to program analysis. They very often fall in the “undecidable” category, at least with respect to their anticipated behavior.
This depends which properties you care about. I suspect there isn’t a small model of the AI program that duplicates its output behavior, but you can do a lot with off-the-shelf analysis.
It’s relatively easy to prove things like “this program will only ever use the following subset of available system calls,” or “this program is well typed”, or “the program cannot modify itself”, or “can only modify these fixed specific parts of its state.”
I can also imagine a useful AI program where you can prove a bound on the run-time, of the form “this program will always terminate in at most X steps”, for some actual fixed constant X. (Obviously you cannot do this for programs in general, but if the program only loops over its inputs for a fixed number of iterations or somesuch, you can do it.)
These sorts of properties are far from a general proof “this program is Safe”, but they are non-trivial and useful properties to verify.
The FAI-related things you would want to prove are of the form: “when given a world state characterized by X, input percepts characterized by Y, the program always generates outputs characterized by Z.” For many existing and popular AI architectures, we haven’t the foggiest idea how to do this. (It’s no surprise that Eliezer Yudkowsky favors Pearl’s causal probabilistic graph models, where such analysis is at least known to be possible.)
To the extent that X, Y, and Z can be written formally within the programming language, program analysis at least in principle is fully capable of proving such a statement. I apologize if this comes off as rude, but your statement that “we haven’t the foggiest idea how to do this” is flat-out false. While there are certainly unique challenges to reasoning about the sort of code that gets written in machine learning, it seems to me that the main reason we don’t have well-developed analysis tools is that most code doesn’t look like machine learning code, and so there has been little pressure to develop such tools.
Program analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.
Can you give an example? The main reason they are not amenable to program analysis is because the sorts of guarantees they are supposed to satisfy are probabilistic / statistical in nature, and we don’t yet have good techniques for verifying such properties. I am pretty sure that the issue is not undecidability.
I should also clarify that whether or not JGWeissman thought that I meant “logical program” instead of “computer program”, my comment was intended to mean “computer program” and the field of program analysis studies “computer programs”, not “logical programs”.
Ok, throw it back at you: how do you prove the behavior of a deep belief network, or any other type of neural network currently in vogue, short of actually running it? If you do have some way of doing it, can I be coauthor? I didn’t mean to imply that it was proved impossible, just that no one has the faintest idea how to do it—and not for lack of trying.
What does “prove the behavior” mean? If you mean “prove that it will classify this next image correctly” then of course the answer is no, although that is because it’s not even necessarily true. If you mean “prove that the program’s behavior always falls within some well-defined regime”, then yes we can absolutely prove that sort of statement. Things that are at the boundary of what we can prove (i.e. I don’t know of any existing formal tools that do it, but I’m pretty sure we could make one) would be something like “the weights of the network don’t change too much if we change the input a small amount” or “the prediction error on the training set will decrease monotonically” or “the generalization error is bounded by assuming that the test data ends up being drawn from the same distribution as the training data”.
Presumably for friendliness you want something like “P and P’ satisfy some contract relative to each other” (such as sufficiently similar behavior) which is the sort of thing we are already relatively good at proving. Here I’m imagining that P is the “old” version of a program and P’ is a “modified” version of the program that we are considering using.
These examples involve predictions generated by processes which are not purely logical systems, and which we don’t understand enough to code into an AI. So it seems like Paul’s idea could be progress towards having a process that makes such predictions about itself that we can understand at that level.
Can you clarify in what sense you think a computer program is not a purely logical system? Or am I misunderstanding you?
ETA: In particular, I’m referring to the point where I said:
I didn’t meant that the computer program is not a purely logical system, but that the people proving facts about its behavior aren’t purely logical systems.
Program analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.
Ok, then I did not understand exactly what you meant, but I still don’t think this is a counterexample to the problem Paul’s idea tries to get around.
The problem is that logical systems have problems reasoning about their own behavior, not a claim that there is no other logical system that can reason about them. In particular, we are interested in if an optimization process can model itself as an optimization process, accurately predicting that its future decisions are likely to achieve outcomes that score well on its optimization criteria and the score will be better if it has more resources, and will become much worse if its representation of its criteria gets corrupted, (all using abstract reasoning in much less time than fully simulating its future decisions). Can program analysis do that?
ETA: Also, I should note that this is a good question and I’m glad you asked it!
If your question is whether a program analyzer can, when given itself as input, produce sensible results, the answer is yes. Program analyzers are meant to run on arbitrary code, so in particular they can be run on themselves as a special instance. (Actually, nothing particularly special happens in this case as far as I can tell.)
Now, a key point is the formalism we are working in: a program analyzer takes in a program P and some specification S, and checks whether P obeys specification S (for instance, checking that it never accesses memory before it allocates it). Importantly, P is allowed to do any 3 of the following:
report that P satisfies S (with a proof)
report that P does not satisfy S (with a counterexample)
give up (i.e. report “unsure”)
So a key question is how often it gives up! But while there are many instances where modern program analysis tools do give up, there are also many where they don’t. Furthermore if you are an AI and you propose a modification to your code, and your program analysis subroutine reports “unsure”, you are free (and would be wise) to try a different modification instead. Researchers in the field of program analysis are extremely cognizant of the halting problem (which is closely related to Lob’s theorem) and typically deal with it by over-approximating, e.g. by identifying conditions that would be sufficient for halting, although not necessarily necessary. As a result one obtains solubility at the cost of precision (although note that the approximation is always sound: if we report that P satisfies S, then P does indeed always satisfy S).
This is a good approach for dealing with the halting problem, but I think that Lob’s theorem is not so closely related that getting around the halting problem means you get around Lob’s theorem.
The theoretical AI that would run into Lob’s theorem would need more general proof producing capability than these relatively simple program analyzers.
It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing a structure of a proof which can be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .Net assemblies, with byte code that targets a stack based virtual machine, and I want to verify that instructions to read elements off the stack get elements of the type is expecting. So I wrote my library to keep track of types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack trace, and determine if the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure, but I did not, and don’t know how to, write my program to come up with the proof structure. It is therefor easy to be confident that proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs.
The theoretical AI that would run into Lob’s theorem would be able to come up with proof structures, as an AI that could only use proof structures prepackaged by human programmers would have huge gaps in its capabilities. This may introduce difficulties not seen in simple program analyzers.
In the example I asked about earlier, the optimizer that needs to prove things about its own future decisions, it runs into an issue that Lob’s theorem applies to. In order prove that its own future decisions are good, it would rely on the fact that it’s future decision are based on its own sound proof system, so it needs to assert that its own proof system is sound, that if its proof system says “X”, then X. Lob’s theorem says that, for all statements X, if a system P (at least as powerful as Peano arithmetic) says “P says ‘X’ implies X” then P says X. So, if the system asserts its own soundness, it asserts anything.
So, in summary:
Lob’s theorem is a problem for generally powerful formal logic based proof systems that assert their own soundness.
Program analyzers are formal logic based proof systems that do not run into Lob’s theorem, because they are not generally powerful, and do not assert their own soundness.
Humans are generally powerful proof generators than can have confidence in their own soundness, but are not purely based on formal logic, (we might intuitively dismiss a formal logical proof as “spurious”) and so can get around Lob’s theorem, but we don’t understand how humans do this well enough to write a program to do it.
Paul’s idea of a probabilistic proof system could lead to a generally powerful proof generator with confidence in its own soundness that is not based on formal logic, so that it gets around Lob’s theorem, that we can understand well enough to write a program for.
I’m not sure I’ve understood what you have in mind here, but in the general case complete type checking in .NET (that is, proving that an assembly not only is syntactically well-formed but also never throws type-related exceptions at runtime) is undecidable because of Rice’s theorem.
In the general case, complete type checking is as difficult as proving arbitrary claims in first-order logic.
I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement) with a particular proof strategy that covers a huge space of useful, nicely structured programs.
See also jsteinhardt’s comment I was responding to, which discussed getting around the halting problem by allowing the checker to say “I don’t know”.
I’m not an expert on .NET, but is there anything that can throw a type exception other than an explicit cast or an explicit throw (or the standard library, I suppose)?
There are sequences of .Net instructions that result in the runtime throwing type exceptions, because it tries to read a value of a certain type of the stack, and it gets an incompatible value. This is the situation that my verifier guards against.
The standard .Net runtime also includes a verifier that checks the same thing, and it will not run code that fails this validation unless it is explicitly trusted. So a verifiable .Net assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so. The compilers for the standard languages such as C# will generally only produce verifiable assemblies unless you explicitly mark parts of the source code as “unsafe”, and unsafe, or unverifiable assemblies need special permissions to run at all.
(There are other properties that both my verifier and the standard verifier check for. The reason I wrote my own is that it produces much more informative descriptions of problems it finds, and it is integrated into my assembly emitting libraries, so it detects problems as the assembly to be emitted is defined, and when run in the debugger, will easily show the compiler code and execution state that caused the problem.)
Thanks for the clarification.
IIUC, unverifiable code does not, or at least is not guaranteed to, politely throw an exception should a type error occur. It may crash the runtime or fail silently leaving the application in an incorrect state.
Ok. I thought that you were considering assemblies that passed the standard .NET verification and you were trying to check for some stronger property (such as absence of runtime exceptions caused by downcasts). That would have been equivalent to arbitrary first-order logic inference.
Since you are instead checking for decidable properties, your system is indeed not equivalent to arbitrary first-order logic inference.
But as jsteinhardt says, it is actually possible to write verifiers that attempt to check for undecidable properties, provided that they have the option to give up.
My mathematical logic is a bit rusty, but my impression is that the following are true:
Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem. A reflectively consistent theory may still be incomplete, but any complete theory is necessarily reflectively consistent.
The undecidability of the halting problem is basically Godel’s theorem stated in computational terms. If we could identify a subset L of Turing machines for whom the halting problem can be decided, as long as it was closed under operations such as inserting a (non-self-referential) sub-routine, then we would be able to verify any (non-self-referential) property of the program that was also expressible in L. This is a sketch of a claim rather than an actual claim that I’ve proved, though.
Finally, I think it’s worth pointing out an actual example of a program analysis tool since I think they are more powerful than you have in mind. The following slides are a good example of such a tool. At a high level, it gets around the problems you are worried about by constructing an over-approximation of the halting problem that is expressible in propositional logic (and thus decidable, in fact it is even in NP). More generally we can construct a sequence of approximations, each expressible in propositional logic, whose conjunction is no longer an approximation but in fact exactly the original statement.
Why do you say that? My understanding is that Godel’s theorem says that a (sufficiently powerful) logical system has true statements it can’t prove, but these statements are excessively complicated and probably not important. Is there some way you envision an AGI being limited in its capacity to achieve its goals by Godel’s theorem, as we envision Lob’s theorem blocking an AGI from trusting its future self to make effective decisions? (Besides where the goals are tailored to be blocked by the theorem: “Prove all true statements in my formal system”)
As near as I can tell, this is more powerful than other static analysis tools that I have seen (though maybe not, the Google Web Toolkit optimizer is pretty impressive), but it is well within what I expect to be possible, and doesn’t get around Lob’s theorem. (I can’t be too confident in this assessment of its power because I don’t see a clear claim of what sort of statements it can prove or how it works except that it seems to detect when inputs to a statement may have invalid values and that it uses brute force techniques to analyze functions and then associates a summary of the analysis with that function (constraints on valid inputs and guarantees on outputs?) so that its analysis of call sites can use the summary.) (This sort of program analyzer is interesting in its own right, and I would like to see a more complete explanation of it, but I don’t think it’s existence says anything about the problems posed by Lob’s theorem.)
Do you agree or disagree that complete implies reflectively consistent? If you agree, then do you agree or disagree that this means avoidance of Godelian obstacles implies avoidance of Lobian obstacles? If you agree with both of those statements, I’m confused as to why:
is a controversial statement.
Ah, so by “Godel’s theorem presents a strictly stronger obstacle than Lob’s theorem” you mean if you overcome Godelian obstacles you also overcome Lobian obstacles? I think I agree, but I am not sure that it is relevant, because the program analyzer examples don’t overcome Godelian obstacles, they just cope with the Godelian obstacles, which does not similarly imply coping with or overcoming Lobian obstacles.
I’m guessing that he is using the standard AI terminology of a “logical program” as one which reasons by means of logical inference over a knowledgebase. This is not how human minds work, nor do many AI researchers view it as a viable path forward.
The types of programs which have seen success in the AI revolution that is currently going on, often are not amenable to program analysis. They very often fall in the “undecidable” category, at least with respect to their anticipated behavior.
This depends which properties you care about. I suspect there isn’t a small model of the AI program that duplicates its output behavior, but you can do a lot with off-the-shelf analysis.
It’s relatively easy to prove things like “this program will only ever use the following subset of available system calls,” or “this program is well typed”, or “the program cannot modify itself”, or “can only modify these fixed specific parts of its state.”
I can also imagine a useful AI program where you can prove a bound on the run-time, of the form “this program will always terminate in at most X steps”, for some actual fixed constant X. (Obviously you cannot do this for programs in general, but if the program only loops over its inputs for a fixed number of iterations or somesuch, you can do it.)
These sorts of properties are far from a general proof “this program is Safe”, but they are non-trivial and useful properties to verify.
The FAI-related things you would want to prove are of the form: “when given a world state characterized by X, input percepts characterized by Y, the program always generates outputs characterized by Z.” For many existing and popular AI architectures, we haven’t the foggiest idea how to do this. (It’s no surprise that Eliezer Yudkowsky favors Pearl’s causal probabilistic graph models, where such analysis is at least known to be possible.)
To the extent that X, Y, and Z can be written formally within the programming language, program analysis at least in principle is fully capable of proving such a statement. I apologize if this comes off as rude, but your statement that “we haven’t the foggiest idea how to do this” is flat-out false. While there are certainly unique challenges to reasoning about the sort of code that gets written in machine learning, it seems to me that the main reason we don’t have well-developed analysis tools is that most code doesn’t look like machine learning code, and so there has been little pressure to develop such tools.
Program analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.
Edit: oops, replied to wrong comment.
Can you give an example? The main reason they are not amenable to program analysis is because the sorts of guarantees they are supposed to satisfy are probabilistic / statistical in nature, and we don’t yet have good techniques for verifying such properties. I am pretty sure that the issue is not undecidability.
I should also clarify that whether or not JGWeissman thought that I meant “logical program” instead of “computer program”, my comment was intended to mean “computer program” and the field of program analysis studies “computer programs”, not “logical programs”.
Ok, throw it back at you: how do you prove the behavior of a deep belief network, or any other type of neural network currently in vogue, short of actually running it? If you do have some way of doing it, can I be coauthor? I didn’t mean to imply that it was proved impossible, just that no one has the faintest idea how to do it—and not for lack of trying.
What does “prove the behavior” mean? If you mean “prove that it will classify this next image correctly” then of course the answer is no, although that is because it’s not even necessarily true. If you mean “prove that the program’s behavior always falls within some well-defined regime”, then yes we can absolutely prove that sort of statement. Things that are at the boundary of what we can prove (i.e. I don’t know of any existing formal tools that do it, but I’m pretty sure we could make one) would be something like “the weights of the network don’t change too much if we change the input a small amount” or “the prediction error on the training set will decrease monotonically” or “the generalization error is bounded by assuming that the test data ends up being drawn from the same distribution as the training data”.
Presumably for friendliness you want something like “P and P’ satisfy some contract relative to each other” (such as sufficiently similar behavior) which is the sort of thing we are already relatively good at proving. Here I’m imagining that P is the “old” version of a program and P’ is a “modified” version of the program that we are considering using.