There was a math paper which tried to study logical causation, and claimed “we can imbue the impossible worlds with a sufficiently rich structure so that there are all kinds of inconsistent mathematical structures (which are more or less inconsistent, depending on how many contradictions they feature).”
In the end, they didn’t find a way to formalize logical causality, and I suspect it cannot be formalized.
Logical counterfactuals behave badly because “deductive explosion” allows a single contradiction to prove and disprove every possible statement!
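(To spell out the explosion step in classical logic: from a single contradiction P, ¬P you can derive any statement Q whatsoever. A small LaTeX note of the standard derivation:)

```latex
% Ex falso quodlibet: one contradiction proves everything.
\begin{align*}
  &1.\ P            && \text{(the contradictory claim)}\\
  &2.\ \neg P       && \text{(its negation, also provable)}\\
  &3.\ P \lor Q     && \text{(from 1, disjunction introduction)}\\
  &4.\ Q            && \text{(from 3 and 2, disjunctive syllogism)}
\end{align*}
```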
However, “deductive explosion” does not occur for a UDT agent trying to reason about logical counterfactuals in which it outputs something different from what it actually outputs.
This is because a computation cannot prove its own output.
Why a computation cannot prove its own output
If a computation could prove its own output, it could be programmed to output the opposite of what it proves it will output, which is paradoxical.
This paradox doesn’t occur because a computation trying to prove its own output (and give the opposite output) will have to simulate itself. That simulation starts another nested simulation of itself, and so on, creating an infinite recursion that never ends (the computation crashes before it can give any output).
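A minimal toy sketch of that failure mode in Python (hypothetical names; “simulate” here just means “run a copy of the program”):

```python
def simulate(program):
    # In this toy model, "simulating" a program just means running a copy of it.
    return program()

def contrarian():
    # Tries to learn its own output by simulating itself, then does the opposite.
    predicted = simulate(contrarian)  # the copy immediately simulates itself again...
    return "B" if predicted == "A" else "A"

if __name__ == "__main__":
    try:
        print(contrarian())
    except RecursionError:
        # The nested simulations never bottom out: the computation crashes
        # before it can give any output, so no paradoxical output appears.
        print("crashed before producing any output")
```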
A computation’s output is logically downstream of it. The computation cannot prove logical facts downstream of itself, but it can decide them.
Therefore, very conveniently (and elegantly?), it avoids the “deductive explosion” problem.
It’s almost as if… logic… deliberately conspired to make UDT feasible...?!
Yeah, from the claim that pi starts with two you can easily prove anything. But I think:
(1) something like logical induction should somewhat help: maybe the agent doesn’t know whether some statement is true and isn’t going to run for long enough to start encountering contradictions.
(2) Omega can also maybe intervene on the agent’s experience/knowledge of more accessible logical statements while leaving other things intact, sort of like giving you the kind of experience Eliezer describes as what would convince him that 2+2=3: https://www.lesswrong.com/posts/6FmqiAgS8h4EJm86s/how-to-convince-me-that-2-2-3. If that’s what Omega is doing, we should basically ignore our knowledge of maths for the purpose of thinking about logical counterfactuals.
I was thinking that deductive explosion occurs for logical counterfactuals encountered during counterfactual mugging, but doesn’t occur for logical counterfactuals encountered when a UDT agent merely considers what would happen if it outputs something else (as a logical computation).
I agree that logical counterfactual mugging can work; I just think it probably can’t be formalized, and may have an inevitable degree of subjectivity to it.
Coincidentally, just a few days ago I wrote a post on how we can use logical counterfactual mugging to convince a misaligned superintelligence to give humans just a little, even if it observes the logical fact that humans lose control every time (and therefore have nothing to trade with it), unless math and logic themselves had been different. :) Leave a comment there if you have time; in my opinion it’s more interesting and concrete.
This paradox doesn’t occur because a computation trying to prove its own output (and give the opposite output) will have to simulate itself
Due to Löb’s theorem, if a computation knows that whenever it finds a proof that it outputs A it will in fact output A, then it can prove that it outputs A, without any need for recursion. This is why you really shouldn’t output something just because you’ve proved that you will.
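(Sketching the schema being used, with □ the provability operator of the agent’s proof system and P the statement “this computation outputs A”; the application assumes the system can verify the relevant fact about its own source code:)

```latex
% Löb's theorem: if the system proves "□P → P", then it proves P.
%   If  ⊢ □P → P,  then  ⊢ P      (equivalently:  ⊢ □(□P → P) → □P)
%
% A computation whose code guarantees "if I find a proof that I output A,
% then I output A" makes □P → P provable, so by Löb it can prove P,
% i.e. that it outputs A, with no nested self-simulation at all.
\[
  \vdash \Box P \to P \quad\Longrightarrow\quad \vdash P
\]
```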
(MIRI did some work on logical induction.)
I’ll give the post a read!