What about the agent represented by the formula “P iff provable(P iff Q)”? It clearly cooperates with itself, and its bounded version should be able to do so with much less computing power than PrudentBot would need. What does it do against PrudentBot and FairBot? Is there anything wrong with it?
This agent is unexploitable, and mutually cooperates with PrudentBot and FairBot, but it cooperates with CooperateBot (since the fixed point reduces to “P iff Box(P)”, which makes the Löbian sentence valid).
Notably, this agent is exactly the same as FairBot. Clearly it satisfies P ⇒ Provable (Q). We can also check Provable(Q) =>P, since if provable(Q), then P iff Provable(P), so P. Thus P <==> Provable(Q), so it’s FairBot.
Notions of sameness need to be handled with care here in general. There are lots: literally identical code is the most obvious one, then provably identical behavior in PA (against some class of reasonable agents), then provably identical behavior in PA+1 (against some class of reasonable agents)… it’s sometimes necessary to use the first notion of sameness because opponents may not behave fairly, e.g. they may punish exactly one program, so in general the set of opponents you want to prove sameness with respect to also matters.
What about the agent represented by the formula “P iff provable(P iff Q)”? It clearly cooperates with itself, and its bounded version should be able to do so with much less computing power than PrudentBot would need. What does it do against PrudentBot and FairBot? Is there anything wrong with it?
This agent is unexploitable, and mutually cooperates with PrudentBot and FairBot, but it cooperates with CooperateBot (since the fixed point reduces to “P iff Box(P)”, which makes the Löbian sentence valid).
Notably, this agent is exactly the same as FairBot. Clearly it satisfies P ⇒ Provable (Q). We can also check Provable(Q) =>P, since if provable(Q), then P iff Provable(P), so P. Thus P <==> Provable(Q), so it’s FairBot.
Notions of sameness need to be handled with care here in general. There are lots: literally identical code is the most obvious one, then provably identical behavior in PA (against some class of reasonable agents), then provably identical behavior in PA+1 (against some class of reasonable agents)… it’s sometimes necessary to use the first notion of sameness because opponents may not behave fairly, e.g. they may punish exactly one program, so in general the set of opponents you want to prove sameness with respect to also matters.
I believe this argument proves identical behavior against all modal agents, and identical behavior of modal agents against.