Thoughts on the 5-10 Problem

5 dollars is better than 10 dollars

The 5-10 Problem is a strange issue in which an agent reasoning about itself makes an obviously wrong choice.

Our agent faces a truly harrowing choice: it must decide between taking $5 (utility 5) and taking $10 (utility 10).

How will our agent solve this dilemma? First, it will spend some time looking for a proof that taking $5 is better than taking $10. If it finds one, it will take the $5. Otherwise, it will take the $10.
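As a rough sketch in code (the prove() oracle is a hypothetical stand-in for the agent's proof search, not something from the original problem statement), the agent looks something like this:

```python
def prove(statement: str) -> bool:
    """Hypothetical proof-search oracle: returns True iff a proof of `statement`
    is found in the agent's theory within its search budget. A stand-in only."""
    raise NotImplementedError


def agent() -> int:
    # Look for a proof that taking $5 is better than taking $10, i.e. that
    # "agent() == 5 implies utility 5, and agent() == 10 implies utility 0".
    if prove("(agent() == 5 -> utility == 5) and (agent() == 10 -> utility == 0)"):
        return 5   # proof found: take the $5
    return 10      # no proof found: take the $10
```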

Fair enough, you think. Surely the agent will concede that it can’t prove taking $5 is better than taking $10. Then, it will do the sensible thing and take the $10, right?

Wrong.

Our agent finds the following proof that taking $5 is better:
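Written out as a sketch (with A standing for the agent's choice, U for the utility it receives, and □X for "X is provable in the agent's theory"; the notation is mine), the proof runs roughly as follows:

```latex
% Requires amsmath and amssymb. A: the agent's choice, U: the utility received,
% \Box X: "X is provable in the agent's theory".
\begin{align*}
&(1)\; A{=}5 \to U{=}5\\
&(2)\; A{=}5 \to (A{=}10 \to U{=}0)\\
&(3)\; \Box\bigl[(A{=}5 \to U{=}5) \land (A{=}10 \to U{=}0)\bigr] \to A{=}5\\
&(4)\; \Box\bigl[(A{=}5 \to U{=}5) \land (A{=}10 \to U{=}0)\bigr] \to \bigl[(A{=}5 \to U{=}5) \land (A{=}10 \to U{=}0)\bigr]\\
&(5)\; (A{=}5 \to U{=}5) \land (A{=}10 \to U{=}0)\\
&(6)\; A{=}5
\end{align*}
```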

Let’s go over the proof.


Line 1: Taking $5 gives you $5.

Line 2: If F is true, then ~F->x is true for any x.

Line 3: If you find a proof that taking $5 gives you $5 and taking $10 gives you $0, you'd take the $5.

Line 4: Combine the three previous lines: if such a proof exists, then by line 3 you take the $5; taking $5 really does give you $5 (line 1), and since you took the $5, "taking $10 gives you $0" holds vacuously (line 2). So a proof of the statement implies the statement is true.

Line 5: Löb’s Theorem: line 4 says that if the statement is provable then it is true, and Löb’s Theorem turns that into the statement itself.

Line 6: Knowing that taking $5 gives you $5 and taking $10 gives you $0, you happily take the $5.

Simplified Example

To understand what went wrong, we’ll consider a simpler example. Suppose you have a choice between drinking coffee (utility 1) and killing yourself (utility −100).

You decide to use the following algorithm: “if I can prove that I will kill myself, then I’ll kill myself. Otherwise, I’ll drink coffee”.

And because a proof that you’ll kill yourself implies that you’ll kill yourself, Löb’s Theorem tells you that you will kill yourself.
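In the same notation as before (K for "I will kill myself", □ for provability; again my own shorthand), the algorithm makes the hypothesis of Löb’s Theorem provable, and the conclusion follows:

```latex
% K: "I will kill myself"; \Box K: "I can prove I will kill myself".
\begin{align*}
&(1)\; \Box K \to K &&\text{(the algorithm: if I can prove it, I do it)}\\
&(2)\; K            &&\text{(L\"ob's Theorem applied to (1))}
\end{align*}
```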

Here, it is easier to see what went wrong: proving that you’ll kill yourself is not a good reason to kill yourself.

The same problem is hidden in the original 5-10 problem: the first conditional in the agent’s procedure is equivalent to “if I can prove I will take $5, then I’ll take $5”.

Hopefully, it’s now clearer what went wrong. How can we fix it?

Solution?

I once saw a comment suggesting that the agent instead reason about how a similar agent would act (I can’t find it anymore, sorry). However, this notion was not formalized. I propose the following formalization:

We construct an agent A. Each time A makes a decision, it increments an internal counter n, giving each decision a unique identity. A uses the following procedure to make decisions: for each action a, it considers the agent A_{n,a}. A_{n,a} is a copy of A (from when it was created), except that if A_{n,a} would make a decision with id n, it instead immediately takes action a. Then, if A can prove any of these agents has the maximum expected utility, it chooses the action corresponding to that agent.
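A rough sketch of this construction in code (the prove() oracle, the action set, and the fallback when no proof is found are all my own stand-ins, not part of the proposal):

```python
from typing import Dict, List

# Hypothetical stand-ins: a proof-search oracle over statements about agents,
# and the set of available actions.
def prove(statement: str) -> bool:
    raise NotImplementedError


ACTIONS: List[int] = [5, 10]


class Agent:
    """Sketch: each decision gets a unique id from an internal counter, and the
    agent chooses by reasoning about modified copies of itself that hard-code an
    action for that decision id, rather than about its own behaviour directly."""

    def __init__(self) -> None:
        self.counter: int = 0                # gives each decision a unique id
        self.overrides: Dict[int, int] = {}  # decision id -> hard-coded action

    def copy_with_override(self, decision_id: int, action: int) -> "Agent":
        clone = Agent()                      # a copy "from when it was created"
        clone.overrides = dict(self.overrides)
        clone.overrides[decision_id] = action
        return clone

    def decide(self) -> int:
        decision_id = self.counter
        self.counter += 1
        if decision_id in self.overrides:    # the modified copies act here
            return self.overrides[decision_id]
        for action in ACTIONS:
            variant = self.copy_with_override(decision_id, action)
            # The proof is about the variant agent, not about this agent itself.
            if prove(f"agent {variant!r} obtains the maximum expected utility"):
                return action
        return ACTIONS[-1]                   # fallback when nothing is provable (left open in the text)
```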