AI cooperation in practice

You know that au­to­mated proof ver­ifiers ex­ist, right? And also that pro­grams can know their own source code? Well, here’s a puz­zle for you:

Con­sider a pro­gram A that knows its own source code. The al­gorithm of A is as fol­lows: gen­er­ate and check all pos­si­ble proofs up to some huge size (3^^^^3). If A finds a proof that A re­turns 1, it re­turns 1. If the search runs to the end and fails, it re­turns 0. What will this pro­gram ac­tu­ally re­turn?

Wait, that was the easy ver­sion. Here’s a harder puz­zle:

Con­sider pro­grams A and B that both know their own, and each other’s, source code. The al­gorithm of A is as fol­lows: gen­er­ate and check all proofs up to size 3^^^^3. If A finds a proof that A re­turns the same value as B, it re­turns 1. If the search fails, it re­turns 0. The al­gorithm of B is similar, but pos­si­bly with a differ­ent proof sys­tem and differ­ent length limit. What will A and B re­turn?

This sec­ond puz­zle is a for­mal­iza­tion of a Pri­soner’s Dilemma strat­egy pro­posed by Eliezer: “I co­op­er­ate if and only I ex­pect you to co­op­er­ate if and only if I co­op­er­ate”. So far we only knew how to make this strat­egy work by “rea­son­ing from sym­me­try”, also known as quin­ing. But pro­grams A and B can be very differ­ent—a hu­man-cre­ated AI ver­sus an ex­trater­res­trial crys­tal­loid AI. Will they co­op­er­ate?

I may have a ten­ta­tive proof that the an­swer to the first prob­lem is 1, and that in the sec­ond prob­lem they will co­op­er­ate. But: a) it re­quires you to un­der­stand some logic (the di­ag­o­nal lemma and Löb’s The­o­rem), b) I’m not sure it’s cor­rect be­cause I’ve only stud­ied the sub­ject for the last four days, c) this mar­gin is too small to con­tain it. So I put it up here. I sub­mit this post with the hope that, even though the proof is prob­a­bly wrong or in­com­plete, the ideas may still be use­ful to the com­mu­nity, and some­one else will dis­cover the cor­rect an­swers.

Edit: by re­quest from Vladimir Nesov, I re­posted the proofs to our wiki un­der my user page. Many thanks to all those who took the time to parse and check them.