# AI cooperation in practice

You know that automated proof verifiers exist, right? And also that programs can know their own source code? Well, here’s a puzzle for you:

*Consider a program A that knows its own source code. The algorithm of A is as follows: generate and check all possible proofs up to some huge size (3^^^^3). If A finds a proof that A returns 1, it returns 1. If the search runs to the end and fails, it returns 0. What will this program actually return?*
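To make the algorithm concrete, here is a toy Python sketch of program A. All names are hypothetical, and the proof verifier is a stub that always fails, so this toy returns 0; the post argues that with a genuine verifier the search would succeed.

```python
# Toy sketch of program A from the first puzzle (hypothetical names throughout).
# A real version needs a proof verifier for some formal theory and a quined
# copy of its own source; here check_proof is a stub so only the control flow
# is visible.

MAX_PROOF_SIZE = 2 ** 16  # stand-in for 3^^^^3

def check_proof(proof, statement):
    """Stub verifier: True iff `proof` is a valid proof of `statement`.
    A real implementation checks each inference step against a fixed axiom set."""
    return False  # stub: the search always fails

def enumerate_proofs(max_size):
    """Enumerate candidate proofs up to max_size (truncated here for the toy)."""
    for n in range(1000):  # toy truncation; really: all proofs up to max_size
        yield ("proof", n)

def A():
    target = "A() == 1"  # statement about A's own (quined) source code
    for proof in enumerate_proofs(MAX_PROOF_SIZE):
        if check_proof(proof, target):
            return 1
    return 0
```

With the always-failing stub the search runs out and `A()` returns 0; the puzzle is about what happens with a real proof search.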

Wait, that was the easy version. Here’s a harder puzzle:

*Consider programs A and B that both know their own, and each other’s, source code. The algorithm of A is as follows: generate and check all proofs up to size 3^^^^3. If A finds a proof that A returns the same value as B, it returns 1. If the search fails, it returns 0. The algorithm of B is similar, but possibly with a different proof system and different length limit. What will A and B return?*
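A similarly hedged sketch of the two-player version, with stub proof searches standing in for the two (possibly different) proof systems and length limits:

```python
# Toy sketch of the second puzzle (hypothetical names; verifiers are stubs).
# Each player searches for a proof that the two outputs agree; here the
# stub searches "succeed", standing in for the Löbian argument in the post.

def make_player(find_agreement_proof):
    """Build a player from a bounded proof search for 'A() == B()'."""
    def player():
        if find_agreement_proof() is not None:
            return 1  # proof of agreement found: cooperate
        return 0      # search exhausted: defect
    return player

A = make_player(lambda: "stub proof in A's system")
B = make_player(lambda: "stub proof in B's system")
assert (A(), B()) == (1, 1)
```

The interesting question is whether the real, non-stub searches both terminate with a proof even when the two programs use different proof systems.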

This second puzzle is a formalization of a Prisoner’s Dilemma strategy proposed by Eliezer: “I cooperate if and only if I expect you to cooperate if and only if I cooperate”. So far we only knew how to make this strategy work by “reasoning from symmetry”, also known as quining. But programs A and B can be very different—a human-created AI versus an extraterrestrial crystalloid AI. Will they cooperate?

I may have a tentative proof that the answer to the first problem is 1, and that in the second problem they will cooperate. But: a) it requires you to understand some logic (the diagonal lemma and Löb’s Theorem), b) I’m not sure it’s correct because I’ve only studied the subject for the last four days, c) this margin is too small to contain it. So I put it up here. I submit this post with the hope that, even though the proof is probably wrong or incomplete, the ideas may still be useful to the community, and someone else will discover the correct answers.

**Edit:** by request from Vladimir Nesov, I reposted the proofs to our wiki under my user page. Many thanks to all those who took the time to parse and check them.


I notice a curious pattern in this post: technically illiterate critical comments get voted up to +1/+2 (even after getting debunked), while technically accurate arguments (replies) stay at 0/+1. Please vote up only things you know you understand, and do vote up comments that debunk incorrect arguments.

ETA: Mostly corrected now.

I wanted to say that yesterday, but chose instead to stay silent on the meta-issue, try as best I can to clarify the technical issues (though having to explain that there are actual technical issues was a nasty surprise for me), and refrain from downvoting illiterate critiques because I could be biased. So thanks for saying what I wanted to say—it sounds better when someone else says it.

Cousin_it, do you have an argument that n1 … n6 < nmax? It seems like one way for the proof to fail is if no matter what nmax is, step 11 forces n1 > nmax.

Only a very rough one, I’m afraid. The number n1 is chosen to “contain” the blow-up from the short proof of 10 to the proof of 11, which grows with the computational complexities of n1,...,nmax, not their values. These computational complexities can be made way smaller than n1.

Ok, I was checking if there was an obvious flaw that you might have missed. Now I’m trying to actually understand your proof, and I’m getting stuck on the first step, where you invoke the Diagonal Lemma. Now a typical statement of the Diagonal Lemma is:

Diagonal Lemma: If T is a theory in which diag is representable, then for any formula B(x) with exactly one free variable x there is a formula G such that G ⇔ B(g(G)) is a theorem in T. (g(G) is the Gödel number of G. This was copied from page 2 of http://www.cs.cornell.edu/courses/cs4860/2009sp/lec-23.pdf)

If you look at page 3 of that PDF, it gives an explicit formula for G, which happens to contain an existential quantifier. Now my question is, how do I translate that G into a code string (the one named “box” in your proof)? Also, is this supposed to be obvious, or are you assuming some background knowledge beyond the standard expositions of the Diagonal Lemma and Löb’s Theorem?

You can build “box” by quining, I think. Like this:

Where s contains the source code of the entire snippet. I’d kinda assumed that quining and the diagonal lemma were the same thing, I guess that was wrong.
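Since the snippet itself is missing above, here is a hypothetical Python reconstruction of the quining trick: a template formatted with its own text, so the resulting string contains its own source.

```python
# Hedged reconstruction of the quining construction (names hypothetical).
# The template refers to itself via {!r}, so formatting it with itself
# yields a program text that rebuilds the very same string when executed.
template = 'template = {!r}\nbox = template.format(template)'
box = template.format(template)

scope = {}
exec(box, scope)
assert scope["box"] == box  # the constructed text is a fixed point
```

The same idea, applied to a statement rather than a program, is what the diagonal lemma delivers.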

Quining is implied by the diagonal lemma, but you can also use the diagonal lemma directly, as you’ve written. See this webpage on quines to understand the distinction. Wei Dai, it’s easy to convert between code strings and Gödel numbers, at least in theory. I don’t know about your particular example’s numbering, because that link isn’t working for me. You may want to find something specific to complexity theory, instead of math, which I suspect you (personally) would find more understandable. The quine page has a brief intro, which includes a proof of the fixed-point theorem.

In this case I think we have B(x) = implies(proves(x, n1), eval(outputs(1))), which is fine, but doesn’t bound the size of BOX. However, cousin_it specifies a way to keep BOX small using quining, so this proof seems legit to me. Its conclusion is just as unintuitive as Löb’s Theorem, but the steps all look right.

Edit: For an explicit way to compute fixed points, see the Y combinator.
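For the curious, a strict-language rendering of that fixed-point idea (the Z combinator, the call-by-value variant of Y), sketched in Python:

```python
# The Z combinator computes fixed points of higher-order functions
# explicitly; it is the programming analogue of the diagonal lemma.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Example: factorial defined without naming itself.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
assert fact(5) == 120
```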

You don’t need that to be a program for the proof to go through.

Why is this true? I am trying to understand. An obvious way to prove that eval(box) returns true is to go through the execution step by step. But when I write down what box is, it seems to call proves(box, n1) and eval(outputs(1)), which take much more than n1 steps.

n1 is chosen so as to contain the size of the “box” statement, not its execution time.

Actually, someone shot down my proposed my.C = (my.C == your.C) algorithm, because if two prisoners both defect, that is in a mathematical sense “cooperating if and only if you cooperate”. So if the other prisoner is Defection Rock, then my.C = (my.C == your.C) has no consistent solution.

My algorithm defects against the Defection Rock just fine :-)

Does your proof of that take more than 3^^^^3 steps? If not, then doesn’t your algorithm see that it defects against Defection Rock, and therefore it cooperates, which is inconsistent? If it does, when did you do that proof?

In other words, it seems to me that played against Defection Rock, your algorithm freezes—it doesn’t output either 1 or 0.

Here’s my proof that A defects against Defection Rock: by assumption, A’s proof checker is correct. If A cooperates against the Defection Rock, A must have arrived at a proof that A’s choice is equal to B’s choice. But A’s choice is not equal to Rock’s choice. Therefore A’s proof checker is incorrect, contradiction, QED.

This proof doesn’t take more than 3^^^^3 steps, but it doesn’t matter. A can’t use it anyway because it can’t assume (or prove) that its proof checker is correct. Gödel’s second theorem: if a formal system asserts or proves its own consistency, it is inconsistent. That’s why all proof systems that people actually use cannot prove their own consistency.

This comment, and another one (“No, A doesn’t know that because it doesn’t know B’s proof checker is correct.”), are buried deep down in the comment thread, but I think they deserve their own mentions in the main post, because they are extremely helpful for the inexperienced reader to understand what the real issues are here.

Nice separation of levels in this comment. It’s essential to point out the difference between what can be done by the automated proof-checker and what can be shown ‘from the outside’ about the operation of the proof-checker.

I second DanielVarga: you should incorporate this discussion into the post itself, since it’s exactly the part that people get stuck on.

That proves that A can’t cooperate, but it doesn’t prove that it defects, since there remains the possibility that the program doesn’t output anything.

The program is guaranteed to finish because there’s a finite number of proofs to check, and checking every proof takes finite time (a proof checker cannot loop forever on a given proof).
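The termination argument amounts to nothing more than enumerating a finite set; a minimal sketch with a toy two-symbol proof alphabet:

```python
import itertools

# Proof candidates over a finite alphabet with length <= max_len form a
# finite set, so a search that checks each one (each check halting) must halt.
ALPHABET = "01"  # toy stand-in for a real proof language's symbols

def all_proofs(max_len):
    for length in range(1, max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            yield "".join(chars)

# 2 + 4 + 8 + 16 = 30 candidates of length <= 4, then the search stops.
assert sum(1 for _ in all_proofs(4)) == 30
```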

So with an argument like the one that A defects, I can prove that your mind is inconsistent:

“Cousin_it will never prove this statement to be true.”

I know that this is true, and presumably you know it too… which is a contradiction.

I don’t know that it’s true.

And I don’t see how this makes the proof wrong.

It doesn’t make the proof wrong.

Oh, sorry then :-)

“Lucas cannot consistently assert this sentence” is an old philosophers’ joke, actually.

Yes, I can.

The informal principle still can find a formalization, and this post gives witness to that.

In the first version, there are different ways to go about programming algorithm A, and some will output 1, and some will output 0. In other words, “if A finds a proof that A returns 1” is ambiguous, because it depends on the exact value of A, not the generic description given.

In the second version, both programs output 1, since this is the only possible consistent result. If either program might output a 0, then both programs would have to, which implies that both should have output 1.

In the first version, are you sure? I gave an argument that A will output 1, have you looked at it?

In the second version, you confuse truth and provability. If both programs output 0, that doesn’t imply they can prove that their output will be the same.

I’ve looked at the argument, and it seems to me that Tim Tyler is right: the first case depends on the implementation of “proves(code, maxsteps)”.

Regarding the second case, suppose A runs through checking for proofs, and doesn’t find any. It then is about to output a 0. But in this case, it is impossible that B should output a 1, because then B would have a proof that A and B output the same thing, and this would be false. Therefore B also must output a 0. Therefore A and B output the same thing. So we should add a line of code to A to express this: “If A is about to output 0, then output 1, because it is proved (by the above reasoning) that A and B output the same thing.” The same code should be included in B, and so both will output 1.

Your proposed change will make A always output 1, and thus become a rather poor player in the Prisoner’s Dilemma. You don’t want to cooperate with everybody.

However, B will as well, and you do want to cooperate with everybody who cooperates with you, so my proposed change works very well.

Not if they will cooperate regardless of what you do. Then you can win more by defecting.

A knows B’s source code, and so it knows that B will only output a 1 if A and B output the same thing, i.e. only if A and B both output 1.

No, A doesn’t know that because it doesn’t know B’s proof checker is correct.

You really have to be triple extra careful when talking about this stuff. I’m trying to.

Yes, I see that.

Nope, this is another case of confusing truth with provability. A doesn’t know B’s proof checker is correct, so I don’t think it can make that inference.

Actually, no. You confuse truth and provability for the second (third?) time. A doesn’t have a proof that B’s proof checker is correct, so it cannot deduce what you just deduced. Please please please, just try to parse any technical material associated with the topic. People have learned to avoid this particular pitfall decades ago. We have tools for that. I am using them heavily here.

I replied here.

I changed my mind anyway.

Nothing depends on the details of the proof verifier, since it completely covers all proofs up to some length, a set of proofs which is completely defined by the logical language. The arguments of proves(-,-) determine its value (and it’s even a primitive recursive function).

You don’t specify the implementation of “proves(code, maxsteps)”.

The output would appear to depend on that.

Of course, there’s the whole business of universal heat death once again...

Nothing depends on the details of the proof verifier, since it completely covers all proofs up to some length, a set of proofs which is completely defined by the logical language. The arguments of proves(-,-) determine its value (and it’s even a primitive recursive function).

On the other hand, a pathological formal language can be constructed that makes the proof of “main()==1” arbitrarily large (e.g. one where 1 is represented by 3^^^3 symbols). Thus it had better be proven that the formal language isn’t pathological, i.e. that there is no prover proves’ that satisfies the condition.

Enumeration order of proofs is not specified, so we can construct a pathological prover that reorders proofs such that all proofs of “main()==1” require more steps than maxsteps.

You can’t influence the length of a proof by placing it at a different position in some list of proofs. Parameter ‘maxsteps’ specifies the maximum length (or Gödel number) of the proofs that get checked, not the maximum number of steps performed by the proof checker algorithm.

Ouch. maxsteps means maxprooflength. My inattention, sorry.

...but the “logical language” is not specified! That is one of the implementation details that is missing from the problem specification.

You know, there are always unstated assumptions, like sanity of the reader.

Usually, when posing puzzles, it is best to try to minimise the number of unstated assumptions the solution depends upon—or there won’t be a unique answer.

The argument is admittedly sketchy, but I don’t think it depends much on the implementation of “proves”. Of course I’d want the proof system to have the same properties that are used in the proof of Löb’s Theorem, because my argument is almost an exact replica of that; but pretty much all proof systems actually used by people (apart from some extremely weak ones) satisfy these properties. And you’re guaranteed not to become worse off if you make your proof system stronger while your opponent stays the same. That’s a really big deal, if my argument is indeed correct.

About heat death: even though the two programs say they can iterate up to 3^^^3 or something, the actual time to cooperation will be much less, because the reasoning steps that I outlined are exactly the reasoning steps they’d need to stumble upon. Yes, that would make a proof of length 10000 or so (which would mean 2^10000 proofs tried), but that’s a long way from 3^^^3. If the programs use some kind of heuristics to search for working proofs (but still guarantee that they will do the brute-force search if heuristics fail), the proofs can be found quicker still.

Re: “apart from some extremely weak ones”

It does depend on the implementation—because it is relatively easy to imagine poor provers that result in a failure to understand and prove anything much. So, you would need to specify some constraints on the prover before you can make statements about what it outputs.

timtyler: I think you’re confused about how proof verifiers work. A proof verifier takes a proof as input, and verifies that the given proof is correct. It doesn’t have to generate the proof itself, and it gets to assume that each step is some small, easily-checkable application from a finite set of axioms.

Put another way, if the proof you give a verifier makes a large leap in reasoning, then the verifier will just tell you that the proof is wrong. Moreover, by the formalization that we demand of those proofs, the proof is wrong.

So, the problem that proof verifiers solve is computable. We need only assume that the proof verifier that these programs have is correctly implemented; we don’t really need to know its internal details.

(If you already understand this, though, then I’m confused about your confusion. Could happen.)

[edit:] Actually, I think I might see the point you’re making; we have to assume the theory the prover understands is rich enough to model stepwise program semantics. This probably should have been specified, but we can just assume it.

An automated proof-checker can identify some correct proofs—but not all of them.

That’s a consequence of “Tarski’s Undefinability of Truth Theorem”:

http://en.wikipedia.org/wiki/Tarski%27s_undefinability_theorem

You can make a Gödel sentence for any supplied finite proof checker that essentially says: “This proof checker can never prove this statement to be true”. Sure enough, the proof checker won’t approve any proofs that the statement is correct—but the statement is true—and that can be proven.

Checking a proof in a recursively axiomatizable theory for correctness is decidable. The programs in the post only check proofs up to some length in, say, Peano arithmetic, which is recursively axiomatizable. Tarski’s undefinability theorem is about complete theories of arithmetic.

There is no such thing as a “complete theory of arithmetic”—see Gödel’s incompleteness theorems.

The post says it is about “programs”—a fairly general class. Where is Peano arithmetic coming into the issue?

Please give up.

The above statement isn’t true. What Gödel’s theorem says is that any effectively generatable theory which is strong enough to model PA is either incomplete or inconsistent. Effectively generatable for our purposes can be taken to mean that valid proofs in the system are enumerable and that there’s a primitive recursive procedure to determine whether a given proof is valid.

If one removes the effectively generatable condition then one can easily produce a complete, consistent system. Consider, for example, the following system: take some lexicographic ordering of well-formed statements in the language of PA, and let S(i) be the ith statement in this list. Let P(0) be PA, and generate P(n) by asking whether S(n) or ~S(n) is a theorem of P(n-1). If so, then set P(n) = P(n-1). If not, then let P(n) be P(n-1) with the additional axiom S(n). Now, consider P(infinity), defined as the union of all P(n). This theory is complete and consistent, but it is very much not effectively generatable, since we can’t even recursively enumerate the system’s axioms.
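The recursion described above, written out (same S(n) and P(n) as in the text):

```latex
P_0 = \mathrm{PA}, \qquad
P_n = \begin{cases}
  P_{n-1} & \text{if } P_{n-1} \vdash S_n \text{ or } P_{n-1} \vdash \lnot S_n, \\
  P_{n-1} \cup \{S_n\} & \text{otherwise,}
\end{cases}
\qquad
P_\infty = \bigcup_{n \ge 0} P_n .
```

The case split is exactly where effectiveness fails: deciding whether P(n-1) proves S(n) or its negation requires a theoremhood oracle.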

It might help for you to take a model theory or logic course since there are a lot of subtle issues here that you seem to be missing.

I suppose there are complete inconsistent theories of arithmetic. However, here the discussion concerns whether a machine returns 1 or not. Systems of math that are useless, or that have infinitely many axioms and so can’t be implemented, don’t really enter into the picture.

Tim, you’re making false statements. You’ve now misapplied both Gödel’s theorem and Tarski’s theorem. You also seem to be under the impression that proof checking is not generally doable in an axiomatic system, which is also wrong. Let me suggest that you simply don’t have the background expertise to discuss these issues. These are subtle issues. Indeed, even your reply above runs into new ones (for example, having infinitely many axioms doesn’t prevent a system of axioms from being effectively generatable; indeed, many well-behaved axiomatic systems have axiom schemas which do just this). You may be running into a Dunning-Kruger type problem. I’m a professional mathematician (ok, a grad student, but close enough for this purpose), and I find these issues difficult. Vladimir, I, and others have pointed out what is wrong with your reasoning, but you don’t seem to be following the explanations. Please take a logic course or read a textbook.

Seconded.

Re: “You also seem to be under the impression that proof checking is not generally doable in an axiomatic system, which is also wrong.”

You are claiming that every possible set of axioms allows for easy automation of proof checking?!?

If so, then we do have a genuine disagreement—and I think you are mistaken.

Not every possible set. All you need is a primitive recursive procedure to decide whether a given statement is an axiom, and primitive recursive procedures to check whether each rule of inference is correctly followed in going from one step to another.

Right: automated proof checking is not really “generally doable” in an axiomatic system—how hard it is depends entirely on the axiomatic system under discussion.

It is possible that I wasn’t clear enough (my phrasing could certainly use work). I was thinking of statements such as your remark that “An automated proof-checker can identify some correct proofs—but not all of them”, whereas we generally don’t call something an automated proof checker unless it terminates on any input and correctly returns whether or not the input is a valid proof in the axiomatic system. Sometimes we insist that this be a primitive recursive process. Sometimes this is also generalizable in a natural way such that the proof checker can do this given some specific oracle (there are results about how these sorts of things behave if they have access to a Halting oracle, for example; I don’t fully understand these except in vague generality).

FWIW, I was thinking of unprovable truths (wrt some chosen axioms) that are provable with stronger axioms. There, a proof checker based on a finite axiom set would return “false” for true and provable statements.

Many statements aren’t simply “true”; if a statement isn’t logically valid, it’s only true in a specific theory, or about a specific structure. A given satisfiable non-valid statement is true about one structure and false about another; true in one theory, false in another, independent in a third.

Sure. In this problem, we at least have a specified problem domain—whether a machine will return “1” or “0”.

That’s irrelevant to a proof-checker. Proof-checkers reside in a specific axiomatic system. It doesn’t matter if there’s an axiomatic system that would give stronger results for the purpose that was under discussion, it just matters if there’s one of a given length in whatever formalism we want.

You keep talking with confidence about something you clearly don’t understand at all.

No. That’s not what Tarski’s theorem says. Tarski’s theorem is a statement about first-order arithmetic descriptions of truth (it can be generalized slightly beyond that, but this is the easiest way of thinking about it). It asserts that there is no first-order definition, within the system itself, of which of the system’s statements are true.

Proof checking is a very different issue. Proof checking is straightforward. Indeed, most of the time we insist that axiomatic systems have proofs checkable as a primitive recursive function. We can do this in Peano Arithmetic for example, and in PA every proof can be verified by a straightforward algorithm.

You’d need to be careful about what “system” or “description” you are talking about, since you can weakly represent, say, the set of statements true in Robinson arithmetic (which is recursively enumerable) in Robinson arithmetic. There is a formula s(-) with one free variable such that

Q |- s(gn(f)) iff Q |- f

where gn(f) is the Gödel number of (arbitrary) statement f and Q is Robinson arithmetic.

Yes, I know that. I was being deliberately imprecise because it is very easy in this case to get lost in the technical details. My intention was not to convey the theorem’s precise statement but rather convey to Tim why it doesn’t apply in the context he’s trying to use it.

I know you know that, but I couldn’t quite naturally interpret your informal statement as saying the formally correct thing, so had to make the clarification.

Here we are not talking about Peano Arithmetic. We are talking about whether a computer program returns 1 or not. If you want to talk about how easy it is to verify proofs based on the axioms of Peano Arithmetic, I think you first need to explain why you think that is relevant to the problem—which says it is talking about “all possible proofs” and doesn’t refer to a specified axiom set.

Tim, please reread what I wrote. PA is an example. The relevant point is that Tarski’s theorem doesn’t say what you seem to think it says.

It’s an example—but one that doesn’t relate to the problem at hand, which concerns whether a machine returns 1 or not. If you ASSUME some particular axiom and proof system, the problem becomes easier. However, with no specified axiom system for generating proofs—and no specified proof checker—the problem becomes poorly specified.

Regarding Tarski’s theorem—you seem to be misinterpreting me. I am not saying that checking a proof with respect to an axiom system is necessarily difficult. I am saying that truth is not contained by any finite axiom system. So, there will be true statements that are unprovable within the system—but can still be proven in a larger system. That is also what Tarski’s theorem happens to say. Check it out:

“The undefinability theorem does not prevent truth in one theory from being defined in a stronger theory. For example, the set of (codes for) formulas of first-order Peano arithmetic that are true in N is definable by a formula in second order arithmetic. Similarly, the set of true formulas of the standard model of second order arithmetic (or n-th order arithmetic for any n) can be defined by a formula in first-order ZFC.”

http://en.wikipedia.org/wiki/Tarski%27s_undefinability_theorem

If that’s what you mean, then you seem to be even more confused than I realized. If you are just talking about the original context, then the statement about the set of proofs below some fixed length cannot be true and unprovable. The proof itself self-verifies. If the statement in question is true, there must exist a proof of the statement that is of the desired bounded size. This follows from consistency and the fact that there is only a finite collection of such proofs.

With a finite limit on proof length, the Godel sentence I mentioned might be longer than the limit—and so not fall within the set of proofs under consideration.

However, the point of mentioning the “Undefinability of Truth Theorem” was to illustrate how there might be truths that would be provably correct—under a more sophisticated set of axioms—but which are not captured by the axiom system being used. That can still be true if the limit on proof length is finite.

Incidentally, your rhetoric about me is getting out of control. Please calm down!

The proof checker checks proofs within some formal theory. The Gödel sentence for the checker is certainly true and provable by us, given the consistency of that formal theory. (If the theory were inconsistent, the checker would be able to prove any sentence.) But this doesn’t work as a proof within the theory! The theory cannot believe its own consistency (Gödel’s second incompleteness theorem), so the checker cannot assume it when checking proofs. So your argument doesn’t actually give an example of a valid proof rejected by the checker.

I think you mean that there would be some proof (that checks) for any sentence.

Yes. Thanks.

Worse, it’s only “true” that a consistent theory is consistent in some unclear sense, because you can extend the theory with a statement that asserts inconsistency of the original theory, and the resulting theory will remain consistent.

What you’re saying is certainly true (onlookers, see pages 5–6 of this pdf for an especially clear explanation), but I like to think that you can’t actually exhibit a proof string that shows the inconsistency of PA. If you could, we’d all be screwed!

Proof in what theory, “can’t” by what definition of truth? In the extension of PA with inconsistency-of-PA axiom, it’s both provable and true that PA is inconsistent.

A proof in PA that 1+1=3 would suffice. Or, if you will, the Gödel number of this proof: an integer that satisfies some equations expressible in ordinary arithmetic. I agree that there’s something Platonic about the belief that a system of equations either has or doesn’t have an integer solution, but I’m not willing to give up that small degree of Platonism, I guess.

You would demand that particular proof, but why? PA+~Con(PA) doesn’t need such eccentricities. You already believe Con(PA), so you can’t start from ~Con(PA) as an axiom. Something in your mind makes that choice.

For those curious what Nesov is talking about:

-- Scott Aaronson

Let’s say we are trying to prove statements within ZFC.

“ZFC can never prove this statement to be true”

...is one thing and...

“This proof checker can never prove this statement to be true”

...is another.

Neither can be proved by the specified proof checker—but the second statement can be proved by another, better proof checker—still working within ZFC—so it can be seen that it is true.

“proved by a proof checker”—huh?

“Asserted”, “approved”—whatever.

It’s a nice result, but I doubt it’ll ever be practically relevant.

You could implement the algorithm today, although with a limit of less than 3^^^3. Rough guesstimates suggest that a limit of about 50 would be workable at home, 100 for a worldwide distributed computing effort, and 500 for a universe-sized computer that does one operation per bit per Planck time. In all cases, nowhere near 10000 (let alone 3^^^3).

Heuristics could help a bit, but making the algorithm more complex is likely to also make the proof that it works longer and so harder to find. The fallback to brute-force search is unhelpful because no computer that could actually exist would be able to find the proof that way.

If program A uses heuristics, program B doesn’t need to prove that these heuristics are correct. It only needs to see that A’s use of heuristics makes A either output 1 or move on to the brute-force search.

Agreed with the rest of your comment. An easier way to reach the same conclusion: if program B is implemented as “always defect”, program A has to run the brute-force search to the end, there’s no shortcut (because having any shortcut in the code would make the proofs invalid if B were a copy of A).

Edit: The following is not obviously possible, see this comment.

This can be generalized to provide even better algorithms for more general games. Let the following be a simplified outline of the algorithm for the cooperation problem, with bounds omitted (players are A and B, our player is A):

The proof that the outcome is “Cooperate” basically hinges on the provability of the Löb statement

from which we conclude that A()==B() and so A()==”Cooperate”. The statement can be constructed from

and so

Now, we can get basically the same proof if we use a simpler algorithm for the players:

This gets rid of the more general relation, but achieves basically the same result. We can improve on this by defecting against cooperating rocks:

This suggests a general scheme for construction of decision-making algorithms: for all possible actions a and b of players A and B, sort the pairs by utility for A, starting from highest, and implement the Löb algorithm for each possibility in turn:
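The code that followed appears to have been lost in formatting. As an illustration only, here is a hedged reconstruction of the scheme just described (the names and the `proves` stub are my own, not Nesov's actual code): enumerate joint outcomes, sorted from best to worst for player A, and run the Löbian provability check on each in turn.

```python
# Sketch: `proves` stands in for a bounded proof search over statements
# about the two programs; it is passed in so the sketch stays testable.

def make_player(actions, utility_A, proves):
    def A():
        # All joint outcomes (a, b), sorted by A's utility, best first.
        outcomes = sorted(((a, b) for a in actions for b in actions),
                          key=utility_A, reverse=True)
        for a, b in outcomes:
            # Try to prove that this joint outcome is what actually happens.
            if proves("A()==%r and B()==%r" % (a, b)):
                return a
        return actions[-1]  # fallback if nothing is provable
    return A
```

With a Prisoner's Dilemma payoff for A and a stub `proves` that only accepts the mutual-cooperation statement, the resulting player cooperates.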

This plays against arbitrary opponents quite well.

Sorry if I’m being naive, but if we amend the program that way then doesn’t it lose the ability to prove the Löb statement implies(proves(P), P) where P = “A()==‘Cooperate’ && B()==‘Cooperate’”? (Due to the unprovability of unprovability.)

Can this problem be solved by adjusting the maximal length of checked proofs, such that the unprovability of proves("A()=='Defect' && B()=='Cooperate'", maxDC) is provable by proves(..., maxsteps)?

Yes, that occurred to me too, and I think it does the job.

I wonder if there are theorems of the form “There’s no significantly better way to show that P isn’t provable in N steps than by listing all proofs of length N and observing that P isn’t proved”. Then of course maxsteps would need to be exponential in maxDC.

On the other hand, the fact that proves("A()=='Cooperate' && B()=='Cooperate'") was called means that A()=='Defect' && B()=='Cooperate' is unprovable under the maxDC proof length. The question is whether this fact can be used in the proof. The proof checker can be programmed to add such self-evident propositions into the set of axioms, but I can’t clearly see the consequences of this. And I have a bad feeling of mixing meta- and object levels.

I’m not sure if your idea will work, but have some notation to help you avoid mixing the levels. Here’s how you say “try to prove this fact conditional on some other fact”:

OK, so first we try to prove P1 = "A()=='Defect' && B()=='Cooperate'" using some theory T, then on failing that, we try to prove P2 = "A()=='Cooperate' && B()=='Cooperate'" using the augmented theory T’ = T + “T cannot prove P1 in maxDC steps”.

And then T’ can prove that “If T’ proves P2 then P2 is true”.

And note that the number of steps doesn’t matter any more.
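A sketch of this two-stage procedure (my own stubs; `proves` stands in for a bounded proof search, and the theory names are illustrative): first try to prove P1 in the base theory T; on failure, record that failure as a new axiom and try to prove P2 in the augmented theory T'.

```python
# Sketch only: `proves(theory, statement, limit)` is a hypothetical
# bounded proof search, passed in so the sketch stays self-contained.

def decide(proves, maxDC):
    P1 = 'A()=="Defect" and B()=="Cooperate"'
    if proves("T", P1, maxDC):
        return "Defect"
    # T' = T plus the axiom "T cannot prove P1 in maxDC steps".
    T_prime = ("T", "unprovable(%s, %d)" % (P1, maxDC))
    P2 = 'A()=="Cooperate" and B()=="Cooperate"'
    if proves(T_prime, P2, maxDC):
        return "Cooperate"
    return "Defect"
```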

So perhaps Nesov’s idea was OK all along?

“P isn’t provable in N steps” is in Co-NP. Since in general it’s an open question whether P=Co-NP, there might be a polynomial time algorithm to solve it, or there might not, respectively.

On the other hand, if you 1) showed some large class of particular problems (ones that can be reduced to from [no typo there] nonequivalent SAT instances, which I suspect is most of them) weren’t provable in N=cP steps IF it’s unprovable, then 2) you would have shown a sufficiently large class of SAT problems weren’t easily solvable in N=kP steps, 3) which would imply P!=NP (and also P!=Co-NP).

If on the other hand your whole class of problems is unprovable, then you can prove P isn’t provable by your theorem, which is presumably smaller than N in this context.

So, in summary, my guess is that for interesting classes of P, your question is equivalent to “Does P=NP?”, which we know is a tough nut to crack.

This is true, and it’s a bit disturbing that it took so long for someone to point that out. I feel that there should be a way out, but don’t see what it is (and finding it might suggest interesting limitations).

According to my experience, the study of game-playing programs has two independent components: fairness/bargaining and enforcement. If you have a good method of enforcement (quining, löbbing, Wei Dai’s joint construction), you can implement almost any concept of fairness. Even though Freaky Fairness relied on quining, I feel you can port it to löbbing easily enough. But fairness in games with non-transferable utility is so difficult that it should really be studied separately, without thinking of a specific enforcement mechanism.

Bargaining is about logical uncertainty, something we avoid using sufficiently high upper bounds in this post.

Suppose we have two possible actions X,Y and A prefers to to to and B prefers to to to . What will be the outcome if both of them use main4() with their respective orderings and the same proves() function?

Probably , which is not quite optimal. This doesn’t solve the bargaining problem, which is exactly what your example is.

Most likely , but possibly or if one of the agents manages to “trick” the other.

Your idea is similar to preferential voting.

The generalized action idea, applied to this setting, works as follows. Agents A and B can surrender control over their output to procedures they don’t fully understand, but expect to produce good results. That they expect good results is reflected, as in the previous construction, in the order in which they check for the possibility:

Here, the programs might be unable to prove anything specific about the output of Foo() and Bar(), but can still use the trick to coordinate their simultaneous invocation. This is also one way to look at how to solve the coordination problem.

Note that Foo() and Bar() might as well implement algorithms similar to what we have in main(), so choosing them is a high-level decision, while the low-level decisions still remain to be made by them. Deciding hierarchically makes individual proofs, and individual decisions, easier to perform, so that you don’t need a ridiculously big upper bound on which proofs to consider. That we can forget how the hierarchical code is written (say, by inlining most functions) also shows the potential for finding similarly simplifying patterns in the more monolithic code.

This is awesome—great post!

I’ve scanned briefly through the comments, and didn’t see anyone ask this, but apologies if I missed it:

You use the diagonal lemma to establish that there is a piece of code called “box” such that the following is provable:

eval(box) == implies( proves(box, n1), eval(outputs(1)))

But couldn’t there likewise be a “box2” such that:

eval(box2) == implies( proves(box2, n1), eval(outputs(2)))

And then couldn’t you patch up the proof somehow to show that program has to output 2? Specifically, to get the analogue of step 2, can’t you prove that proves(outputs(2), n) implies the negation of proves(outputs(1), n), and hence prove that proves(outputs(2), n) implies outputs(2)?

Even if this is correct, it only applies to the program you use in your proof, where if proves(outputs(2), n) then 2 is returned; whereas the puzzle above doesn’t allow for 2 to be returned at all. However, if I’m right then presumably something must be going wrong in the proof?

To prove that, the proof system would need to know its own consistency, because an inconsistent proof system could prove both of these statements and indeed any statement at all. But a proof system cannot know its own consistency without actually being inconsistent (Goedel’s second theorem).

For a finite set of proofs, that a given statement is not provable by a proof from that set is itself provable, possibly by a proof from that set.

Technically true. You could always just evaluate the two statements and there you have your proof, with length exponential in n. I very much doubt the proof could be made shorter than n though, because the statement is equivalent to “you can’t find a contradiction in these axioms using less than n steps”, the difficulty of which should grow way faster than linearly in n.

Good catch! Yes, my argument only shows that there’s no short easy proof. You can always just evaluate the two statements and there you have your proof. But that has length exponential in nmax. I’d be amazed if “1=2 is unprovable with length < nmax” were provable with length < nmax, which is pretty much the same statement. This is a very very strong statement. It can also be reformulated as “you cannot find a contradiction in the theory using less than nmax steps”.

Aha—very cunning.

The most obvious implementation for the easy version will return 0 (or, if the length limit is lifted, fail to return at all).

It’s important, however, to be clear on just what constitutes a valid proof. Consider the following implementation:
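The implementation itself appears to have been lost in formatting; judging from the later remark about unrolling the search loop and deleting 3^^^^3 − 1 lines of unreachable code, it was presumably just the trivial:

```python
# Presumed reconstruction of the elided joke implementation:
# skip the proof search entirely and return 1 outright.
def a():
    return 1
```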

Not valid because it didn’t find a proof, you say? On the contrary, assuming we are dealing with idealized mathematical computers that are perfectly deterministic and have no outside inputs, the fact that a program returns a given value on this run does indeed constitute a proof that it will do so on every run.

The point being that, while you can use concepts like proof to draw rigorous conclusions, in the face of counterintuitive things like self-reference, you need some experience actually nailing things down before you can be sure you’ve closed all the loopholes.

I don’t understand this. Do you mean the proof checker fails to check anything? Wouldn’t that be a wrong implementation, rather than an obvious one?

No, I mean the obvious implementation would have the program basically asking “what does this program return? Well, it tries to find out what this program returns. What does this program return? Well, it tries to find out...” until it hits the length limit and comes back with no solution found (or, if the length limit is lifted, keeps going forever). This is true even if the proof checker always returns quickly.

This is absolutely not the obvious implementation of the algorithm in the post. My program does not “try to find out” something. It mechanically checks all proofs up to some maximum length. You can’t just replace a piece of code with another piece that has a vaguely similar intent, but fails to halt!
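For concreteness, a minimal sketch of that mechanical search (`enumerate_proofs` and `check` are hypothetical stand-ins for a proof enumerator and a proof verifier, not the post's actual code):

```python
# Sketch: loop over every candidate proof up to the length limit and ask
# the checker whether it proves the target statement. No self-simulation
# happens anywhere; the checker always terminates on each candidate.

def main(enumerate_proofs, check, limit):
    for proof in enumerate_proofs(limit):
        if check(proof, "main() == 1"):
            return 1
    return 0
```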

The ‘fails to halt’ part in either case would be only if the length limit were removed. But you are right that mechanically checking all proofs up to a maximum length is a mathematically valid implementation and can only be replaced with another piece of code that is guaranteed to give an identical result.

However, that implementation can still be reasonably described as trying to find out something. The intent can be seen not in the simple loop that iterates through all proofs up to a certain length, but in the choice of logic and axioms to encode the problem. The proofs generated from suitably chosen axioms are isomorphic to execution paths of an appropriately written program (as whpearson observed, a way to see this is to consider the Curry-Howard correspondence).

This isn’t the same as the calculator that asks itself what value it will return, because there’s a subtle separation of levels here.

The Curry-Howard correspondence is what you are referring to right?

That program does not prove its return value under the Curry-Howard Correspondence without a lot of footwork (see first caveat). Under the CH interpretation, programs are proofs that their types are inhabited, not that they can produce their return values. Thus, the canonical thing one could say the above program proves is “int”.

Some caveats. First, one could imagine a sufficiently complex type system where there is a type of all integers which are 1. Then, one could say that the above program is a proof of the type “type-of-ints-that-are-1”. However, when adding this much specificity to one’s type system, one has to be careful that the resulting system can decidably type terms; adding richness often leads to undecidability.

Second, there is a simple constructive proof that the above program produces 1. Just run it and see; if it returns 1 in finite time, that’s a proof.

That is certainly one way to look at it.

Another way to look at it is to consider purely computational proofs, such as the four color theorem, or the solutions to checkers and 5 x 5 Go.

I’m curious, can the downvoters give any reasons for disagreeing with me?

I did not downvote (you were at −3 and −2 when I wrote this), but I think there are many possible reasons to downvote you depending on the voter’s implicit or explicit theory of voting.

This might be seen as a pure “puzzle” post, and you aren’t offering a puzzle solution that takes the question at face value. Your program doesn’t do anything quine-like (like put its own source code into an input variable), nor does it have any logic engine (even by implication).

This is the kind of solution evolution might work out in certain specific environments where it happened to “just work”. But suppose you put this code into a place where it actually had to “work for a living” by analyzing other programs using roughly the same routines it uses to analyze itself, and where it gained or lost by modulating its behavior with them (some programs being suckers, some cheaters, and some open to implicit bargaining, so there were real opportunities to gain or lose). Then this code would not give you “optimal play ignoring computation costs”: it is either a sucker or a cheater, depending on what “1” means, and in some contexts either of these strategies will lose.

Maybe another way of putting the criticism is that your “clever trick” is in noticing that the problem can sort of be solved in practice by collapsing down to have no extra levels of self-awareness. But for theoretical robustness in other circumstances it would be preferable to have a system that scales to infinity, so that agents capable of modeling other agents’ models of their own attempts to model other agents can still be “trusted by bad modelers”. (Remember that part of the unspoken motivation here is to build skills towards building a verifiably friendly AGI.)

Also, the people here are all super smart, so telling them “just be dumber and people will trust you” isn’t an answer they want to hear, and saying this (even indirectly) could maybe be seen as an attempt to troll, or to drag people into a topic area that is too difficult to discuss, given LW’s current sanity waterline (which is the only reason I personally vote things below zero).

Another possibility is that people don’t see the reversed Tortoise and Achilles joke and just think (1) you said something dumb and (2) they should “punish dumbness” in order to maintain LW’s purity (i.e., the signal-to-noise ratio).

The more reasons to downvote you expose, the more chance you have to accumulate a really low score, and that comment exposed several.

Thanks for the analysis. I’ll try to clarify those points, then.

Of course, I’m not suggesting my answer as a solution to the prisoner’s dilemma. I actually don’t think this kind of mathematical analysis is even relevant to the prisoner’s dilemma in practice—I think the latter is best dealt with by pre-commitments, reputation tracking etc. to change the payoff matrix and in particular change the assumption that the interaction is one-off—which it typically isn’t anyway. (Note that in cases involving actual criminals, we need witness protection programs to make it even approximately one-off.)

As for the logic engine, hey, I just unrolled the search loop then deleted 3^^^^3 − 1 lines of unreachable code. No different than optimizing compilers do every day :-)

If we look at the logic puzzle aspect though, this is in a similar vein to “this statement is true” (the reverse of the classic “this statement is false” liar paradox—it has two consistent solutions instead of none—note that return 0 is just as valid a solution as return 1).

And if you ask what sort of proof system permits that, you start investigating higher-order logic, and self-reference, and the classic paradoxes thereof, and type theory and other possible solutions thereto, which are topics that are of very real significance to software verification and other branches of AI.

In summary, I think this puzzle goes places interesting enough that it’s actually worth taking apart and analyzing.

Actually, “this statement is provable”. This is the “Henkin sentence” and it is indeed provable in Peano arithmetic, so no need for stronger things like type theory etc. My proof in 1 is basically a translation of the proofs of the Henkin sentence and Löb’s theorem (which is used to prove the former).

Yes, in that sense you’re right, this is isomorphic to the Henkin sentence, which is provable in first-order Peano arithmetic, if you set everything up just right in terms of precisely chosen axioms and their correspondence to the original code. (Note that there is no guarantee of such correspondence—return 0 is, after all, also a correct solution to the problem.)

But the metaproof in which you showed this, and the sketch you gave of how you would go about implementing it, are not written in first-order PA. And by the time you have actually written out all the axioms, implemented the proof checker etc., it will be apparent that almost all the real work is not being done on the computer, but in your brain, using methods far more powerful than first-order PA.

If we don’t regard this as a problem, then we should be willing to accept return 1 as a solution. But if we do regard this as a problem (and I think we should; apart from anything else, it means nearly all problems that would benefit from formal solution don’t actually receive one, because there just isn’t that much skilled human labor to go around—this problem is itself an example of that; you didn’t have time to more than sketch the implementation) then we have to climb the ladder of more powerful formal techniques.

And once again I will ask the downvoters, which part of my comment do you disagree with?

I upvoted the grandparent and parent, because what you said seems right to me. I wish people wouldn’t downvote someone asking why e was downvoted.

Upvoted entire ancestor tree, for similar reasons.

Does the obvious Lobian proof for 1 still go through if there’s a bound on the proof length built into all the sentences?

In the linked text file I tried to do just that, carefully tracking proof lengths and making sure the bounds stay bounded. But this could really use some checking.

The second problem uses the same trick, of course: “if we both prove that we output the same thing, then we output the same thing”, then move ahead using “we both prove” instead of “I prove” everywhere.

Edit: Wow, I really am an idiot. I assumed the second statement was true about every statement, but I just realized (by reading one extra comment after posting) that by Löb’s Theorem that’s not true. But my original idea, that the first statement is all that’s required to prove anything, still seems to hold.

Okay, I can follow the first proof when I assume statement 1, but I don’t quite understand how cousin_it got to 1. The Diagonal Lemma requires a free variable inside the formula, but I can’t seem to find it.

In fact, I think I totally misunderstand the Diagonal Lemma, because it seems to me that you could use it to prove anything. If you replace “outputs(1)” by “2==3”, the proof still seems to hold. Statement 2 would still be true with “2==3” (it is true about any statement, after all), and all the logic that follows from those two statements would be true. In fact, by an unclearly written chain of reasoning which I originally intended to post before realizing that it would be much simpler to just say this, all you seem to require is the first statement to be able to prove anything. If I am mistaken, which is probable, then I expect my error lies in the assumption that “outputs(1)” could be replaced by any string of code.

For my original unclear explanation of why the first statement in the proof seems to allow anything to be proven, in case anyone cares for it:

Anyone care to point me to my mistake (or, to satisfy my wildest of dreams, say that I appear to be right ;) )?

I don’t understand how the first statement can be used to prove anything...

The second statement might be true for every statement, but it’s not necessarily provable for every statement, which is required in the proof. In fact, the second statement is provable for “outputs(1)” by inspection of the program (because the program searches for proofs of “outputs(1)”), but not provable for “2==3” (because then “2==3” would be true, by Löb’s theorem).

I’m sorry, my comment grew into a mess, I should have cleaned it up a bit before posting. Anyway, I agree fully about the second statement only applying to this program, that’s what I realized in the edit.

But for the first statement, I’ll try to be a bit more clear.

My first claim is that “eval(box) == implies(proves(box, n1), eval(‘2==3’))” is a true statement, proven by the Diagonal Lemma. I’ll refer to it as “statement 1”, or “the first statement”.

If “eval(box)” returns false, then for the first statement to be true, “implies(proves(box, n1), eval(‘2==3’))” must return false. “implies” only returns false if “proves(box, n1)” is true and “eval(‘2==3’)” is false. Therefore for statement 1 to be true when “eval(box)” is false, then “proves(box, n1)” has to be true, which is a contradiction: “eval(box)” can’t be false and also provably true. Therefore, “eval(box)” must be true.

So let’s say “eval(box)” is true, that means that “implies(proves(box, n1), eval(‘2==3’))” must also return true for statement 1 to be true. One way for the “implies” statement to return true is for “proves(box, n1)” to return false. Since I have proven above that “eval(box)” is true, any good definition of the “proves” function will also return true, because if it finds no other, it will at least find my proof. Therefore, “proves(box, n1)” will return true.

So there is only one other way for the “implies” statement to return true: “eval(‘2==3’)” must return true. Therefore, “eval(‘2==3’)” returns true, and it follows from this that 2=3.

So, where did I go wrong?

Your proof of eval(box) relied on the fact that eval(box) can’t be false and also provably true. But that fact is equivalent to consistency of the formal theory, so it can’t be proven within the formal theory itself, by Godel’s theorem. Of course, since eval(box) is a terminating computation that returns true (your proof of that is correct), the formal theory can eventually prove that by step-by-step simulation. But that proof will be much longer than your proof above, and much longer than n1. In fact, proves(box,n1) will return false, and your comment serves as a nice proof of that by reductio.

Don’t get discouraged, it’s a very common mistake =)

I think it’s possible to coordinate without the huge computational expense, if the programs would directly provide their proofs to each other. Then each of them would only need to check a proof, not find it.

The input to the programs would be a pair (opponent’s source, opponent’s proof), and the output would be the decision: cooperate or defect.

The algorithm for A would be: check B’s proof, which must prove that B would cooperate if the proof it gets in its input checks out. If the proof checks out—cooperate, otherwise—defect.

If A needs to give B the proof of some theorem that involves both A and B (as in the post), how does it obtain such a proof quickly without knowing the source code of B in advance? And if it’s a proof of some theorem involving only A, then what is that theorem and how does the whole setup work?

Let ChecksOut(code, proof) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns “C” if ChecksOut(PCode, PProof).

Def A(code, proof): if (ChecksOut(code, proof)) return “C”, otherwise return “D”.

The definition of ChecksOut is kinda circular, but it should be fixable with some kind of diagonalization. Like:

SeedChecksOut(code, proof, X) = true iff proof is a proof that code is a function of two parameters PCode and PProof, which returns “C” if eval(X(PCode, PProof)),

ChecksOut(code, proof) = SeedChecksOut(code, proof, #SeedChecksOut)

Hmm, that’s pretty cool, thanks! But it’s still not obvious to me that it solves the same problem as the algorithm in the post. There are probably many mutually incompatible implementations of ChecksOut, because there are many ways to diagonalize that differ only syntactically. Do you have an argument why they will end up cooperating with each other? It seems to me that they won’t, just like different implementations of quining cooperation.

Hmm, yes, you’re right, it’s quining cooperation all over again—as defined, the programs would only cooperate with opponents that are sufficiently similar syntactically. And they need compatible proof systems in any case. So it’s not a solution to the same problem, only a possible optimization.

I believe Program A in “the easy version” would return 0. Assuming zero run-time errors, its structure implements the logical structure:

If A can be proven to return 1 in under n steps, then A returns 1. If A cannot be proven to return 1 in under n steps, then A returns 0.

However n is defined (the post proposes n = 3^^^^3), it can be shown by the definition of the word “proof” that:

If A can be proven to return 1 in under n steps, then A returns 1.

and therefore the first proposition holds for every program, and cannot be used to show that A returns 1.

However, the second proposition also cannot be used to show that A returns 1. If the given condition holds, A does not return 1; if the given condition does not hold, the second proposition demonstrates nothing.

Therefore no property of the program can be used to demonstrate that the program must return 1. Therefore no proof can demonstrate that the program must return 1. Therefore the program will find no proof that the program returns 1. Therefore the program will return 0.

Q.E.D.

IIRC, the modification of Gödel’s statement which instead has the interpretation “I can be proved in this formal system” is called a Henkin sentence, and does in fact have a finite proof in that system. This seems weird in the intuitive sense you’re talking about, but it’s actually the case.

Yep. The Henkin sentence has already come up multiple times in the comments here. Understanding the proof of the Henkin sentence takes you about 95% of the way to understanding my original argument, I think.

...Nope.

This logic stuff is pretty tricky, but I’ll try to explain how exactly the distinction between truth and provability matters in this problem. It was quite an epiphany when it hit me.

The first proposition indeed holds for every program. But it is not what is used to show that A returns 1. What is really used is this: A can prove that the first proposition holds for A by looking at the source code of A. This is definitely not the case for every program! (Provability-of-provability does not imply provability-of-truth. This is the whole point of Löb’s Theorem.) A’s proof of the first proposition relies on how the code of A looks—specifically, that it says explicitly “if provable then return 1”. This proof would break if the structure of A were changed. For example, if A were searching for both proofs and disproofs that it would output 1, it wouldn’t be able to prove the first proposition quite so easily, and maybe not at all.

I’m still confused, sadly.

Can you follow Eliezer’s cartoon proof of Löb’s Theorem, linked from the post? It’s kind of a prerequisite to understand what’s happening here.

I believe I follow Eliezer’s proof—should I look over your proof again?

Well, my proof is basically a rewrite of Eliezer’s proof in another notation and with tracking of proof lengths. The exact same steps in the exact same sequence. Even the name of the “box” statement is taken from there. So you can try, yeah.

Yeah—I suspect the answer is “not at all” because this ‘searching for proofs or disproofs’ seems to be exactly what’s needed to unbreak the reasoning in my comment above.

And interestingly, posters here only started criticizing this game-theoretic reasoning when I used it. c=/

Why?

Your original formulation did not include the “I expect you to” part.

Without the very long limit you’ve added, the easy problem becomes formally undecidable as it leads to an infinite regress. In order to return 1, the program has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and in order to do that it has to determine whether it will return 1, and... You’ll use up your 3^^...^^3 iterations and return 0.

It’s a bit like this problem. http://en.wikipedia.org/wiki/Halting_problem

Without the limit, the first program would return 1. My proof keeps working almost without modifications. The only change will be that main() will use a while loop instead of an if, which is only used in deriving (2) and doesn’t change its validity. Your argument doesn’t show that the program will never find a proof—it only shows that it can never find the specific proof that you outlined, because it leads to infinite regress or whatever. It will just find another proof instead.

OK. Let’s suppose you successfully write an algorithm a() that can do everything that you said.

Now let’s make some changes. Suppose you were, in the course of your search of proof space, to come across a proof that a() returned 0? You could simply return 0 at that point rather than continuing to search for proofs that a() returned 1. So let’s make that change.

Now comes the fun bit. Change your algorithm a() such that it returns 1 when it hits the iteration limit instead of 0. Not that hard.

Now comes the really fun bit. Invert the output of this algorithm. Let’s call it b(). This function now has the interesting property that it returns 1 when a() returns 0, and vice versa—except in the case where no proofs can be found of a value for a() short of the termination condition, where both now return 0. It’s also interesting that it has exactly the same definition as a() does...

What Wei Dai said. The first change you propose breaks A’s ability to obtain step 2 by simple reasoning about itself. Originally I came up with a different problem: systematically search all proofs up to a large length, and if we find a proof that we output some number, then output that number. That problem is way more crazy than mine, I don’t know what the answer is, it might actually depend on the proof checker’s implementation.

If you make this change, then step 2 of cousin_it’s proof won’t go through anymore, I think.

So do you also think that this statement is true: “This statement is true if and only if it can be proven to be true”?

See this PDF, paragraph 44.3. It constructs exactly the statement you gave (Henkin’s statement) and proves it’s provable.

Ok, you’ve convinced me. Saying that this statement is true is basically the same as saying program A outputs 1.

Yes, exactly. The proof is a little fussy because it has to keep track of bounded proof lengths everywhere, but it works basically the same.

Does it help you understand the second problem better? It uses the same trick, but with a twist. (You need to get creative when you talk about two proof systems instead of one.)

I still agree with what I said originally about the second problem. You could compare it to these two statements:

A) A is true if and only if A and B are either both provable, or both non-provable.

B) B is true if and only if B and A are either both provable, or both non-provable.
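One compact way to write the two statements, using □ for provability (my notation, not the comment author’s):

```latex
A \leftrightarrow (\Box A \leftrightarrow \Box B),
\qquad
B \leftrightarrow (\Box B \leftrightarrow \Box A)
```

Since the right-hand sides are the same formula, the two sentences assert identical conditions, which is where the symmetry argument gets its force.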

It is much more obvious that both of these statements are necessarily true and provable (by symmetry), than it is that “this is true if and only if it is provable” is true or provable. This is why at first I only accepted the second case.

Actually, after thinking about it more, the statement is obviously provable, without the long derivation.

If “this statement is true iff it is provable” is false, then it naturally cannot be provable. But then it is both false and not provable, which logically implies that it is true iff provable. Therefore it is both true and provable.
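The informal argument above can be laid out step by step. Writing H for “this statement is true iff it is provable” and □H for “H is provable”, and noting that step (2) relies on soundness (□H → H), which a formal system cannot prove about itself:

```latex
\begin{align*}
&\text{(1)}\quad H \leftrightarrow (H \leftrightarrow \Box H)
  && \text{definition of } H\\
&\text{(2)}\quad \neg H \to \neg \Box H
  && \text{contrapositive of soundness, } \Box H \to H\\
&\text{(3)}\quad \neg H \to (H \leftrightarrow \Box H)
  && \text{both sides false, by (2)}\\
&\text{(4)}\quad \neg H \to H
  && \text{from (1) and (3)}\\
&\text{(5)}\quad H
  && \text{from (4), so } H \text{ is true, and provable by L\"ob}
\end{align*}
```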

So the first program has to output 1 for the same reason.

When you have a self-referential sentence, it does no good to prove it in the “real world”, because the commonsense logic we use in the “real world” becomes inconsistent when applied to self-referential statements. (“This statement is false.”) You really need a proof of your statement within the formal system. And within the system, the proof you gave doesn’t work: the system cannot prove that falsity implies non-provability, because that would be the system assuming its own consistency all over again. So the long derivation is needed after all.

I agree that the system can’t use this argument, but that doesn’t mean the argument doesn’t work, just like your argument for why your algorithm must defect against a program that always defects.

The difference between my argument and “this statement is false” is that by directly depending on its own truth, “this statement is false” leaves its truth or falsity insufficiently defined. But “this statement is true iff it is provable” depends not directly on its truth but on provability, which is already defined. So I think that my “real world” argument does establish that the statement will be true also within a formal system, even though a formal system cannot establish it with this argument.

Um, “this statement is true iff it is provable” directly depends on both truth and provability. It does refer to its own “platonic” truth value in the “real world”.

It refers to its truth value, but not in a paradox-generating way.