I think I missed the indirection required to use Löb’s theorem (which I thought was about not being able to prove that a statement is unprovable, not about proving false things—that is, for our formal systems, we accept incompleteness but not incorrectness).
Mainly I don’t see where you have the actual proposition setup in your “proof”—the knowledge that you’re choosing between $5 and $10, not between $5 and “something else, which we are allowed to assume is $0”. If you don’t know (or your program ignores) that you will be offered $10 iff you reject the $5, you’ll OF COURSE take the $5. You can test this in real humans: go offer someone $5 and don’t say anything else. If they turn it down, congratulate them on their decision-making and give them $10. If they take the $5, tell them they fell victim to the 5-10 problem and laugh like a donkey.
FactorialCode’s comment _is_ an application that shows the problem—it’s clearly “among things I can prove, pick the best”, not “evaluate the decision of passing up the $5”. I’d argue that it’s _ALSO_ missing the important part of the setup—you can prove that taking $10 gives you $10 as easily as that $5 gives you $5, and the “look for proof” is handwaved (better than absent) without showing why the proof search would land on a spurious proof rather than the correct one.
My (possibly very incorrect) takeaway from the post, as someone with very little background in mathematical logic, was that “If I can prove x has higher utility than y, then I will do x” (statement 1) is a bad heuristic for an EDT agent that can reason about its own source code, because outputting x will be a fixed point of this decision process* even when x does not actually have higher utility. Specifically, an EDT agent will choose action x iff the utility of choosing x is higher than that of choosing y (assuming the utilities are different). Thus, assuming statement 1 is equivalent to assuming “if I can prove x has higher utility than y, x has higher utility than y” (statement 2) for the EDT agent. Because assuming statement 2 leads to absurd conclusions (like the agent taking the 5 dollar bill), assuming it is a bad heuristic.
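That fixed point can be sketched in a few lines of Python. This is a toy illustration, not the post’s actual formalism: the action names and the proof “oracle” are my own stand-ins, and “provable” is modeled as a black-box predicate. The point is that the agent’s output depends only on what its proof system claims, so a spurious proof is never contradicted by the agent’s own behavior.

```python
def agent(provable):
    # The heuristic from statement 1: act on x whenever the system
    # proves U(x) > U(y). "provable" stands in for the proof system.
    if provable("U(take_5) > U(take_10)"):
        return "take_5"
    return "take_10"

# Fixed point: if the system proves the spurious comparison, the agent
# takes the $5, and nothing it does contradicts that proof -- the payoff
# of the untaken action is unconstrained, so "take_10 yields $0" stays
# consistent with everything the agent observes.
spurious = lambda s: s == "U(take_5) > U(take_10)"
sound = lambda s: s == "U(take_10) > U(take_5)"

print(agent(spurious))  # take_5
print(agent(sound))     # take_10
```

The agent is identical in both runs; only the proof system’s claims differ, which is why assuming statement 1 smuggles in statement 2.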
This use of Löb’s theorem seems to do exactly what you want: show that we can’t prove a statement is unprovable. If we prove a statement of the form “if a is provable then a is true”, then the contrapositive “if a is not true then it is not provable” follows. However, I thought the point of the post is that we can’t actually prove a statement of this form, namely the statement “if x does not have higher utility than y, then I cannot prove that x has higher utility than y” (statement 3). Statement 3 is necessary for the heuristic in statement 1 to be useful, but the post shows that it is in fact false.
The point of the post isn’t to prove something false, it’s to show that we can’t prove a statement is unprovable.
*I’m not sure if I’m using these terms correctly and precisely :/
Yes, you could make the code more robust by allowing the agent to act once it’s found a proof that any action is superior. Then, it might find a proof like
U(F) = 5
U(~F) = 10
10 > 5
U(~F) > U(F)
However, there’s no guarantee that this will be the first proof it finds.
When I say “look for a proof”, I mean something like “for each of the first 10^(10^100) Gödel numbers, see if it encodes a proof. If so, return that action.”
In simple cases like the one above, it likely will find the correct proof first. However, as the universe gets more complicated (as our universe is), there is a greater chance that a spurious proof will be found first.
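A minimal sketch of that kind of proof search, with an ordered list of candidate proofs standing in for the Gödel-number enumeration (all names here are hypothetical): the returned action depends entirely on which valid proof the enumeration reaches first.

```python
def first_proved_action(candidates, verifies):
    """Scan candidates in order; act on the first claim that verifies."""
    for claim, action in candidates:
        if verifies(claim):
            return action
    return None

# Two enumeration orders over the same candidate set.
good_order = [
    ("U(take_10)=10 and U(take_5)=5, so take_10 is better", "take_10"),
    ("spurious Löbian comparison favoring take_5", "take_5"),
]
bad_order = list(reversed(good_order))

# Stand-in verifier that accepts both claims, modeling a system in which
# the spurious proof really does go through; enumeration order is then
# the only thing that decides the agent's action.
accepts = lambda claim: True

print(first_proved_action(good_order, accepts))  # take_10
print(first_proved_action(bad_order, accepts))   # take_5
```

In a simple universe the correct proof is short and likely enumerated early; in a complicated universe there are vastly more encodings to get through, so a spurious-but-verifiable proof has more room to show up first.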