Thoughts on the 5-10 Problem
5 dollars is better than 10 dollars
The 5-10 Problem is a strange issue in which an agent reasoning about itself makes an obviously wrong choice.
Our agent faces a truly harrowing choice: it must decide between taking $5 (utility 5) and taking $10 (utility 10).
How will our agent solve this dilemma? First, it will spend some time looking for a proof that taking $5 is better than taking $10. If it can find one, it will take the $5. Otherwise, it will take the $10.
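A minimal Python sketch of this decision rule, showing only its control flow. The proof search is replaced by a hypothetical `find_proof` callable (an assumption, not part of the original description) that reports whether a proof of the given claim was found.

```python
from typing import Callable

# Hypothetical stand-in for the agent's bounded proof search: given a claim,
# it returns True if a proof of that claim was found in the time allowed.
ProofSearch = Callable[[str], bool]

CLAIM = "taking $5 is better than taking $10"


def naive_agent(find_proof: ProofSearch) -> int:
    """Return how many dollars the agent takes."""
    # First, look for a proof that taking $5 is better.
    if find_proof(CLAIM):
        return 5
    # No proof found: take the $10.
    return 10


# A search that never succeeds gives the "sensible" outcome.
print(naive_agent(lambda claim: False))  # 10
```

The trouble is that when the search is over proofs about the agent's own code, it can succeed, and then the same rule returns 5.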
Fair enough, you think. Surely the agent will concede that it can’t prove taking $5 is better than taking $10. Then, it will do the sensible thing and take the $10, right?
Wrong.
Our agent finds a proof that taking $5 is better.
Let’s go over the proof.
Line 1: Taking $5 gives you $5.
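Line 2: If F is true, then ¬F → x is true for any x. In particular, if you take the $5, then “taking $10 gives you $0” is vacuously true.
Line 3: If you find a proof that taking $5 gives you $5 and taking $10 gives you $0, you’d take the $5.
Line 4: Combining the three previous lines: if that statement is provable, then it is true.
Line 5: Löb’s Theorem: since the statement’s provability implies its truth, the statement is provable.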
Line 6: Knowing that taking $5 gives you $5 and taking $10 gives you $0, you happily take the $5.
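The six lines can also be written symbolically. The notation here is an assumption for illustration: A is the action taken, U the resulting utility, □X means “X is provable”, and P abbreviates the statement the agent searches for. Line 2 also assumes the agent takes exactly one action, so A = 5 rules out A = 10.

```latex
\begin{align*}
& P := (A = 5 \rightarrow U = 5) \land (A = 10 \rightarrow U = 0) \\[4pt]
\text{1.}\quad & \vdash A = 5 \rightarrow U = 5 \\
\text{2.}\quad & \vdash A = 5 \rightarrow (A = 10 \rightarrow U = 0) \\
\text{3.}\quad & \vdash \Box P \rightarrow A = 5 \\
\text{4.}\quad & \vdash \Box P \rightarrow P \\
\text{5.}\quad & \vdash P \\
\text{6.}\quad & \vdash \Box P \text{, so by line 3, } A = 5
\end{align*}
```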
Simplified Example
To understand what went wrong, we’ll consider a simpler example. Suppose you have a choice between drinking coffee (utility 1) and killing yourself (utility −100).
You decide to use the following algorithm: “if I can prove that I will kill myself, then I’ll kill myself. Otherwise, I’ll drink coffee”.
And because a proof that you’ll kill yourself implies that you’ll kill yourself, Löb’s Theorem says you can prove that you’ll kill yourself, so you will.
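Written out, this is a direct application of Löb’s Theorem. Here K is shorthand introduced for illustration, standing for “I kill myself”, and □K means “K is provable”; the algorithm makes □K → K provable about the agent itself.

```latex
\begin{align*}
& \text{L\"ob's Theorem: if } \vdash \Box K \rightarrow K \text{, then } \vdash K. \\
& \vdash \Box K \rightarrow K \quad \text{(the algorithm: a proof of } K \text{ makes } K \text{ true)} \\
& \vdash K \quad \text{(L\"ob's Theorem)} \\
& \text{The proof search finds this proof, so the agent kills itself.}
\end{align*}
```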
Here, it is easier to see what went wrong: proving that you’ll kill yourself is not a good reason to kill yourself.
The same flaw is hidden in the original 5-10 problem. The first branch of the agent’s algorithm is equivalent to “if I can prove I will take $5, then I’ll take $5”, because once the agent has proved that it takes the $5, “taking $10 gives you $0” follows vacuously.
Hopefully, it’s now clearer what went wrong. How can we fix it?
Solution?
I once saw a comment suggesting that the agent instead reason about how a similar agent would act (I can’t find it anymore, sorry). However, this notion was not formalized. I propose the following formalization:
We construct an agent $A$. Each time $A$ makes a decision, it increments an internal counter $n$, giving each decision a unique identity. $A$ uses the following procedure to make decisions: for each action $a$, it considers the agent $A_{a,n}$. $A_{a,n}$ is a copy of $A$ (from when it was created), except that if $A_{a,n}$ would make a decision with id $n$, it instead immediately takes action $a$. Then, if $A$ can prove that any of these agents has the maximum expected utility, it chooses the action corresponding to that agent.
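A minimal Python sketch of the bookkeeping in this construction, under assumptions that are not part of the proposal: the environment is a deterministic, simulable function of the agent, and $A$ evaluates each $A_{a,n}$ by running it rather than by searching for a proof. The names `Agent`, `five_ten` and `coffee_or_death` are hypothetical. Because it simulates instead of proving, it only illustrates the counter $n$ and the forced copies, not any proof-theoretic behavior; and since a copy that reaches a decision other than $n$ runs the full procedure again, this toy version only terminates for single-decision environments like the two below.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical toy environment: runs an agent and returns the utility it earns.
# Deterministic, so "expected utility" reduces to plain utility here.
Environment = Callable[["Agent"], float]


class Agent:
    """Toy agent A; an instance with `forced` set plays the role of A_{a,n}."""

    def __init__(
        self,
        actions: Sequence[object],
        environment: Environment,
        forced: Optional[Tuple[int, object]] = None,
    ) -> None:
        self.actions = actions
        self.environment = environment
        self.forced = forced  # (n, a) for a copy A_{a,n}; None for A itself
        self.counter = 0      # gives each decision a unique id n

    def decide(self) -> object:
        n = self.counter
        self.counter += 1
        # A copy A_{a,n} immediately takes action a at decision n.
        if self.forced is not None and self.forced[0] == n:
            return self.forced[1]
        best_action: object = None
        best_utility = float("-inf")
        for a in self.actions:
            # Fresh copy of A as it was when created (counter back at 0),
            # except that decision n is forced to action a.
            a_copy = Agent(self.actions, self.environment, forced=(n, a))
            # ASSUMPTION: evaluate the copy by simulating it, not by proof search.
            utility = self.environment(a_copy)
            if utility > best_utility:
                best_action, best_utility = a, utility
        return best_action


def five_ten(agent: Agent) -> float:
    # A single decision whose utility equals the dollars taken.
    return agent.decide()


def coffee_or_death(agent: Agent) -> float:
    return {"coffee": 1, "kill": -100}[agent.decide()]


print(Agent([5, 10], five_ten).decide())                    # 10
print(Agent(["coffee", "kill"], coffee_or_death).decide())  # coffee
```

Because $A_{5,0}$ really ends up with $5 and $A_{10,0}$ with $10, this toy $A$ takes the $10. Whether a proof-based $A$ behaves the same way is the interesting question, and this simulation does not address it.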