(EDIT: focusing here on PrudentBot. Fwiw I like the idea you have in SharkBot to use the weaker diamond when evaluating whether to defect! I’m less well equipped to analyze it at this time, still just grappling with how to handle diamonds at all in my usual proof search ontology.)
I’m less familiar with the diamond modality; but if I can correctly translate “there exists some world that satisfies X” as “not every world satisfies not-X”, we get the following:
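To sanity-check that translation, here’s a throwaway finite-Kripke-model evaluator (all names are mine): computing diamond directly as “some accessible world satisfies X” agrees at every world with the “not box not” encoding.

```python
# A tiny finite Kripke frame: worlds 0..3, accessibility as successor sets.
R = {0: {1, 2}, 1: {3}, 2: set(), 3: set()}

def box(R, w, pred):
    """Box-X at w: X holds in every world accessible from w."""
    return all(pred(v) for v in R[w])

def dia(R, w, pred):
    """Diamond-X at w, defined directly: some accessible world satisfies X."""
    return any(pred(v) for v in R[w])

def dia_via_box(R, w, pred):
    """Diamond-X encoded as not-box-not-X."""
    return not box(R, w, lambda v: not pred(v))

odd = lambda v: v % 2 == 1

# The direct and encoded diamonds agree at every world.
for w in R:
    assert dia(R, w, odd) == dia_via_box(R, w, odd)

# Note the dead-end world 2: box is vacuously true there, diamond false.
print(box(R, 2, odd), dia(R, 2, odd))  # True False
```

(The dead-end world is where the two modalities visibly come apart, which is also where the provability reading gets delicate.)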
This will run into some trouble in the unbounded-proof-length provability logic model, since generally you can’t prove in some fixed length that there will be no proofs of any length, on pain of unsoundness via a Löbian fixed point. So the version in the paper runs (I believe) as follows:
This is handling the part where the opponent might be FairBot and require “proving that there are no proofs”, by checking the defection part under the assumption of one level of soundness. (I remember this gotcha because when I try to write down the PrudentBot that lives in my heart I end up with something closer to your version, but then the paper was doing this carefully different thing...)
Insofar as it’s really the proof lengths fighting that is the problem, I think something like the following might just work instead:
Anyway this is all to say that:
I agree that the original PrudentBot definition seems hacky, both in the DefectBot aspect (we can replace this with “doesn’t cooperate with every bot” if that helps), and also in how it assumes exactly one level of soundness. It is ofc a nice POC given that it does manage to cooperate with itself with this machinery.
In terms of cashing the diamond definitions out into provability logic, some care seems to be required, and I’m interested in a well-behaved translation. The “not box not” translation doesn’t actually stay very faithful to the intent.
It might be that messing around with proof bounds is sufficient to get there, so you don’t end up with an inner proof length being able to diagonalize an outer proof length.
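As a toy illustration of the proof lengths fighting (my own sketch, not the construction from the paper): if we model “bounded proof search” as a fuel budget where every nested simulation runs with strictly less fuel, then an inner search can indeed never diagonalize against the outer one, but the naive FairBot built this way bottoms out and fails to cooperate with itself.

```python
# "Fuel" as a stand-in for proof-length bound: each nested simulation
# burns budget, so inner searches are strictly weaker than outer ones.

def fairbot(opponent, fuel):
    """Cooperate iff a bounded simulation shows the opponent cooperating."""
    if fuel <= 0:
        return "D"  # budget exhausted: nothing established, so defect
    return "C" if opponent(fairbot, fuel - 1) == "C" else "D"

def cooperatebot(opponent, fuel):
    return "C"

def defectbot(opponent, fuel):
    return "D"

print(fairbot(cooperatebot, 5))  # C
print(fairbot(defectbot, 5))     # D
# Against itself the recursion bottoms out at fuel 0 and defection
# propagates all the way back up:
print(fairbot(fairbot, 5))       # D
```

That final mutual defection is exactly the failure the Löbian machinery is there to avoid, so just “messing around with proof bounds” needs something cleverer than plain nesting.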
If I’m the Payorian FairBot and you’re my opponent, I simulate you under the assumption that there’s a proof I cooperate; this is .
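For reference, my best guess at the shape being invoked here, in modal notation (symbols are mine; the exact form in the post may differ):

```latex
% Payor's lemma: if  \vdash \Box(\Box x \to x) \to x,  then  \vdash x.
% Guessed Payorian FairBot condition, writing C for "I cooperate"
% and B for "my opponent cooperates":
C \;\leftrightarrow\; \Box\,(\Box C \to B)
```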
So, to stay in the Payorian spirit, I could also simulate you under the assumption that there’s a proof I defect.
That makes me think that a Payorian PrudentBot would satisfy
Since there’s no third program involved, I can just define and to clean this up a bit:
The reason this looks right to me is because when we start drawing Kripke frames it does look like I (PrudentBot) am doing two simulations of you (my opponent), to see what you do assuming my provable cooperation and defection. Let’s assume .
Then (and yes, is ),
I’ll cooperate if this can’t be satisfied, meaning we need to show contradictions in two different sets of requirements on the Kripke model. The first is the one from my post:
That’s the one where I simulate you assuming there’s a proof that I cooperate. We rule this out if, in that world, you cooperate (contradicting the in ).
The other:
This is the one where I simulate you assuming there’s a proof I defect. We rule this one out if, in that world, you defect (contradicting the in ).
So, if you cooperate in the simulation where there’s a proof I cooperate, and defect in the simulation where there’s a proof I defect, we rule these both out. That means we can’t satisfy the conditions imposed by my defection, meaning I cooperate.
That sounds prudent.
As for the Löbian PrudentBot testing its opponent on DefectBot, and how that seems arbitrary: I mentioned in the FairBot vs FairBot section of my post that looks suspiciously like I’m simulating you facing CooperateBot.
This was troubling to me because I shouldn’t care whether my opponent exploits CooperateBot.
Similarly, in this proposed Payorian PrudentBot, looks suspiciously like simulating my opponent against DefectBot.
However, I think it’s not really? Although that brings to mind something missing from my post. In my post, when I’m simulating you, I never actually give you the full description of myself. I think a general version of this method requires an additional premise, repeating the description of myself but wrapped in a provability modality so it’s accessible in the simulations. Then my simulated opponent in doesn’t see a CooperateBot; rather, it sees whatever bot I am, but we assume there’s a proof I cooperate.
Well… I hope this is different than just running my opponent against a CooperateBot, but I’m worried.
I’m not really sure how any of this will turn out, and I don’t have time to try it on paper right now or learn to use any of the automated tools. But in any case I think it’s good to record what seems like the obvious Payorian PrudentBot, at least to me, even if, as before, the solution ends up non-obvious.
I was talking with Joseph, and I think I like his SharkBot more because it fails more gracefully. Suppose (1) “proof” has an upper bound in computation cycles, and (2) people occasionally make mistakes in their logic. A good prover might spend more computation cycles in error correction. What happens if they do not have enough time to prove cooperation leads to cooperation, or defection leads to defection?
If Joseph’s bot stalls out on the second half of its computation, it concludes, “I couldn’t prove they would cooperate if I’m caught (provably) defecting,” and cooperates. If your bot stalls out on the second half of its computation, it concludes, “I couldn’t prove being caught (provably) defecting would lead to them also defecting,” and thinks it can get away with defection.
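A minimal sketch of that asymmetry (the stand-in prover and all names here are hypothetical): let a bounded prover return True, False, or None when it runs out of cycles, and the two bots read a timeout in opposite directions.

```python
# Stand-in bounded prover results: True (proved), False (refuted),
# None (ran out of computation cycles).

def sharkbot_like(prove):
    # Defect only given a constructive proof of exploitability;
    # any failure, including a timeout, falls back to cooperation.
    return "D" if prove("they cooperate even if I provably defect") else "C"

def fragile_prudent(prove):
    # Cooperate only given a proof that defection gets punished;
    # here a timeout is misread as a licence to defect.
    return "C" if prove("they defect if I provably defect") else "D"

stalled_prover = lambda claim: None  # the proof search timed out

print(sharkbot_like(stalled_prover))    # C  (graceful)
print(fragile_prudent(stalled_prover))  # D  (thinks it can get away)
```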
Another way of putting it: dumber bots are more likely to think they can get away with defection when they really can’t, and defect against bots smarter than them. If a bot is going to try to take advantage of rocks[1], it had better make sure it is actually playing against a rock, and not just making a stupid mistake that hurts everyone.
Also, as an aside, I think making mistakes (logical bit flips some percent of the time) naturally penalizes high-complexity policies. This is why you might expect societies to begin with mostly cooperate/defect bots, then transition to citizen/police bots, and slowly build complexity where each individual’s policy is relatively simple, but society as a whole gets more complex interactions. I think this would be an interesting area of research.
[1] From the phrase “how do you play a Prisoner’s Dilemma against a rock?” Rocks are bots that cooperate even if it is proven you are going to defect.
Thanks, this is a solid point that choosing the defect-cooperate outcome should really be based on constructive knowledge, and the fallback shouldn’t be “definitely defect!” if you can’t obtain that.
So that makes me think more that this is the sort of term that is legit in a PrudentBot, since “we can’t find a proof that if we provably defect then they cooperate” is what we actually wanted. And sure, the naive translation into provability logic won’t like this, and I think this means we should look for a nicer translation, probably starting with getting a bounded proof-search bot correct.
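As a first stab at what a bounded proof-search bot could even mean operationally, here’s a toy forward-chaining prover over made-up Horn clauses with an explicit step bound (everything below is my own miniature, not a real proof system):

```python
# Toy "provable within n steps": forward-chain over Horn clauses,
# giving up after max_steps rounds of derivation.

def provable(goal, axioms, rules, max_steps):
    known = set(axioms)
    for _ in range(max_steps):
        new = {head for body, head in rules
               if all(b in known for b in body) and head not in known}
        if not new:
            break  # nothing left to derive
        known |= new
    return goal in known

# Invented example facts/rules, just to exercise the bound.
axioms = {"opponent_cooperates_if_I_provably_cooperate"}
rules = [(("opponent_cooperates_if_I_provably_cooperate",),
          "safe_to_cooperate")]

print(provable("safe_to_cooperate", axioms, rules, max_steps=2))  # True
print(provable("safe_to_cooperate", axioms, rules, max_steps=0))  # False
```

The interesting design question is then which bound the defection check gets: giving the inner (defection) search a strictly smaller budget than the outer (cooperation) search is one way to stop the two from diagonalizing each other.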
I’m wondering about what the SharkBot term is doing, compared to my (probably broken) idea of using . If I convert this one to diamonds I get:
(“it’s possible that I necessarily defect while my opponent defects”)
(“it’s possible that my opponent defects while it is not possible that I cooperate”)
This seems potentially more restrictive than it needs to be? Well, restrictive can be good here, since what we are targeting is defecting as often as we can while guaranteeing a defect-cooperate outcome. But also as pointed out elsethread this PrudentBot approach may have trouble cooperating with itself.
Anyway this leaves me more ready to unpack the SharkBot condition, which is . This is a bit different than what I landed at, reading “it’s possible that my opponent defects or it is possible that I cooperate”.
Converting the SharkBot condition back to squares I get:
Ah okay, so I think this is cashing out the version that’s like “I defect if it is stable that my opponent cooperates while I provably defect”. (I think this may be equivalent in strength to my condition under Löb’s theorem?)
Btw on the theory that “ ” was a bit of a mistake relative to “ ” as a basic approach, we could simplify a bunch of the language above to “it’s possible that my opponent defects while I defect” (my one) or “it’s possible that my opponent defects or I defect” (SharkBot). I’m taking this as some evidence that it would be better to go with the forms.
I don’t think was a mistake.
This gets at the deeper issue: how do we represent an action having a consequence? I don’t think it’s .
In the Prisoner’s Dilemma tournament, you don’t respond to my action, you respond to what you can prove about my action, so it’s really that’s important.
But I think that even in single player games, we should think of the environment as containing an implementation of us which takes an action when it can prove it’s the “right” action, so even then it’s provability that has consequences.
This may seem unnatural, but that’s just because considered decisions are unnatural, and mostly we just do stuff. But I think it’s a good model of thinking with a notebook open and plenty of time. That is, our model is that the consequences in the environment trace back to the existence of a proof in Peano Arithmetic.
My only reservation really is that I think maybe it should be , where means provable with a consistency assumption.
Otherwise, how do I know that my implementation in the environment (for a single-player game) or my opponent’s simulation of me (in the Prisoner’s Dilemma) will actually find a proof of before a proof of ?
Even if is provable, the implementation or simulation could get to a contradiction first.
(Thanks for the post btw! My comments on this comment below.)
This PrudentBot def does feel in-spirit to me. I also agree with your analysis that it doesn’t fall for the pitfall of “I assumed in my hypothetical that you thought I would unconditionally cooperate with you, but then no fair you defected on me in my hypothetical! I defect!”
For instance this happens if you try, as I was inclined to try, . This quickly becomes “false” i.e. “defect” if you have a lying around.
With your definition, we have , and assuming we have a around I think that simplifies to . (I’m not sure I have that right!)
That’s… better behaved! It will run into the same Löbian difficulties as usual if we directly translate this as a provability bot. So imo we still need some better answer as to how we should interpret these as programs that more faithfully represent what we want in terms of their reasoning.
Also fwiw, I run with the model that the simulated opponent has access to both a description of your full behaviour and a shortcut proof that you cooperate with just them. (In the cooperation branch hypothetical.) So when your simulations have the cooperation proof in scope, you’re just saying that the behaviour is equivalent to CooperateBot in this very matchup. And it happens that this is often a sufficient condition for cooperation. (And if your opponent runs you on a different bot like DefectBot, the cooperation proof doesn’t apply there.)
So there’s some question here like “well what’s the point of simulating under the assumption of the narrow cooperation proof if we’re keeping the full description as a fallback?”
Part of what your post points at is an answer which is that, well, it works out more cleanly if you do! Specifically that reasoning is easier if, for the purposes of deciding to cooperate, you assume the decision is made and check that it’s stable/good.
As for why that is the case, I have a more philosophical take that it’s about encoding choice, and the better mechanics we see are downstream of encoding this better. I’ll go into more detail in this collapsed section (you may well wish to gather your own thoughts first).
on encoding choice
We can think of the bot as containing some mechanisms that lead to certain actions, with conditions for firing.
The FairBot variant described in this post is a simple instance, with a default action of defection and a single mechanism that leads to cooperation when it fires.
I have some idea that the choice-y mechanisms are of the form “if I activate, will the outcome satisfy some property? if so I activate”.
So for a given mechanism we can have a widget with a goal target . We want a way of saying “if were to fire this would lead to ” as our condition for . And there’s a question of how to encode this in a program or in our modal logic setting.
Straight up trying won’t work, since and you won’t be able to form a statement like this. This was omitting the part where is using some model of consequences to make its choice.
So next up is , in which fires if our modelling shows that if fires this will lead to . You can in fact show that here!
The other option is ; this doesn’t assume “ fires” but rather “someone has a model that  fires”, and is actually a bit more brittle. The reason this form can be preferable in the modal logic bots is that it sidesteps some complexity: under the first form you will need to check that your opponent can tell the widget has fired. But I think this may be a bit of a hack relative to the  version.
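In modal notation (my own symbols, since I’m not sure they match yours): writing W for “the widget fires” and G for the goal, the two candidate conditions look like:

```latex
% Option 1: assume the act itself inside the hypothetical
W \;\leftrightarrow\; \Box\,(W \to G)
% Option 2: assume only that there is a proof/model of the act
W \;\leftrightarrow\; \Box\,(\Box W \to G)
```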
My reasoning here suggests that if you have multiple conditions for action, you may factor out widgets that are each self-referential, and build your high-level agent as trying each in sequence. I haven’t fully cashed out yet what this looks like, and whether it e.g. gives you a meaningfully different PrudentBot.
I note that this picture doesn’t provide a convenient answer for the problem where in provability land your action “defect” may cause someone to need to prove . But as I guess I keep mentioning I view this as a bug in the provability encoding.