programjames comments on Payorian cooperation is easy with Kripke frames

programjames 11 Mar 2026 18:11 UTC
3 points
0
I was talking with Joseph, and I think I like his SharkBot more because it fails more gracefully. Suppose (1) “proof” has an upper bound in computation cycles, and (2) people occasionally make mistakes in their logic. A good prover might spend more computation cycles in error correction. What happens if they do not have enough time to prove cooperation leads to cooperation, or defection leads to defection?

If Joseph’s bot stalls out on the second half of its computation, it concludes, “I couldn’t prove they would cooperate if I’m caught (provably) defecting,” and cooperates. If your bot stalls out on the second half of its computation, it concludes, “I couldn’t prove being caught (provably) defecting would lead to them also defecting,” and thinks it can get away with defection.

Another way of putting it: dumber bots are more likely to think they can get away with defection when they really can’t, and defect against bots smarter than them. If a bot is going to try to take advantage of rocks ^[1] , it better well make sure it is actually playing against a rock, and not just making a stupid mistake that hurts everyone.

Also, as an aside, I think making mistakes (logical bit flips some percent of the time) naturally penalizes high-complexity policies. This is why you might expect societies to begin with mostly cooperate/defect bots, then transition to citizen/police bots, and slowly build complexity where each individual’s policy is relatively simple, but society as a whole gets more complex interactions. I think this would be an interesting area of research.
1. ↩︎
  From the phrase, “how do you play a Prisoner’s Dilemma against a rock?” Rocks are bots that cooperate even if it is proven you are going to defect.
- James Payor 12 Mar 2026 9:25 UTC
  2 points
  0
  Parent
  Thanks, this is a solid point that choosing the defect-cooperate outcome should really be based on constructive knowledge, and the fallback shouldn’t be “definitely defect!” if you can’t obtain that.
  So that makes me more think that is the sort of term that is legit in a PrudentBot, since “we can’t find a proof that if we provably defect then they cooperate” is what we actually wanted. And sure the naive translation into provability logic won’t like this, and I think this means we should look for a nicer translation, probably starting with getting a bounded proof-search bot correct.
  I’m wondering about what the SharkBot term is doing, compared to my (probably broken) idea of using . If I converting this one to diamonds I get:
  - (“it’s possible that I necessarily defect while my opponent defects”)
  - (“it’s possible that my opponent defects while it is not possible that I cooperate”)
  This seems potentially more restrictive than it needs to be? Well, restrictive can be good here, since what we are targeting is defecting as often as we can while guaranteeing a defect-cooperate outcome. But also as pointed out elsethread this PrudentBot approach may have trouble cooperating with itself.
  Anyway this leaves me more ready to unpack the SharkBot condition, which is . This is a bit different than what I landed at, reading “it’s possible that my opponent defects or it is possible that I cooperate”.
  Converting the SharkBot condition back to squares I get:
  Ah okay, so I think this is caching out the version that’s like “I defect if it is stable that my opponent cooperates while I provably defect”. (I think this may be equivalent in strength to my condition under Lob’s theorem?)
  Btw on the theory that “” was a bit of a mistake relative to “” as a basic approach, we could simplify a bunch of the language above to “it’s possible that my opponent defects while I defect” (my one) or “it’s possible that my opponent defects or I defect” (SharkBot). I’m taking this as some evidence that it would be better to go with the forms.
  - transhumanist_atom_understander 12 Mar 2026 14:33 UTC
    1 point
    1
    Parent
    I don’t think was a mistake.
    
    This gets at the deeper issue: how do we represent an action having a consequence? I don’t think it’s .
    
    In the Prisoner’s Dilemma tournament, you don’t respond to my action, you respond to what you can prove about my action, so it’s really that’s important.
    
    But I think that even in single player games, we should think of the environment as containing an implementation of us which takes an action when it can prove it’s the “right” action, so even then it’s provability that has consequences.
    
    This may seem unnatural, but that’s just because considered decisions are unnatural, and mostly we just do stuff. But I think it’s a good model of thinking with a notebook open and plenty of time. That is, our model is that the consequences in the environment trace back to the existence of a proof in Peano Arithmetic.
    
    My only reservation really is that I think maybe it should be , where means provable with a consistency assumption. Otherwise, how do I know that my implementation in the environment (for a single-player game) or my opponent’s simulation of me (in the Prisoner’s Dilemma) will actually find a proof of before a proof of ? Even if is provable, the implementation or simulation could get to a contradiction first.