Yeah, from the claim that pi starts with 2 you can easily prove anything (see the sketch after this list). But I think:
(1) something like logical induction should somewhat help: maybe the agent doesn’t know whether some statement is true and isn’t going to run for long enough to start encountering contradictions.
(2) Omega can also maybe intervene on the agent’s experience/knowledge of more accessible logical statements while leaving other things intact, sort of like giving you the experiences Eliezer describes as what would convince him that 2+2=3: https://www.lesswrong.com/posts/6FmqiAgS8h4EJm86s/how-to-convince-me-that-2-2-3. If that’s what Omega is doing, we should basically ignore our knowledge of maths for the purpose of thinking about logical counterfactuals.
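To make the explosion concrete, here is a minimal sketch in Lean 4 (my own toy example, not anyone’s formalization of the mugging): the claim that pi starts with 2 is stood in for by the false arithmetic statement 3 = 2, and from that hypothesis any proposition follows.

```lean
-- Deductive explosion (ex falso quodlibet), toy version:
-- "pi starts with 2" is modeled as the false claim (3 : Nat) = 2.
-- Given that hypothesis, we can prove an arbitrary proposition P.
example (P : Prop) (h : (3 : Nat) = 2) : P :=
  absurd h (by decide)  -- `decide` discharges ¬((3 : Nat) = 2)
```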
I was thinking that deductive explosion occurs for logical counterfactuals encountered during counterfactual mugging, but doesn’t occur for logical counterfactuals encountered when a UDT agent merely considers what would happen if it were to output something else (viewed as a logical computation).
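For concreteness, here is a toy expected-value calculation for the mugging itself, in Python. This is a sketch under assumptions of my own (a 50/50 prior credence in the logical fact, in the spirit of logical induction, and the usual $100/$10,000 payoffs); none of the specifics are from the thread.

```python
# Toy logical counterfactual mugging (illustrative assumptions only).
# Omega inspects a logical fact, e.g. the parity of a far-out digit of pi.
# If the digit is even, Omega pays $10,000 iff the agent's policy would
# have paid $100 in the odd branch; if odd, Omega asks the agent for $100.
# An updateless agent scores whole policies by expected value under its
# prior logical uncertainty, before conditioning on which branch it sees.

P_EVEN = 0.5  # prior credence in the logical fact (logical-induction-style)

def expected_value(pays_when_asked: bool) -> float:
    reward_if_even = 10_000.0 if pays_when_asked else 0.0
    cost_if_odd = -100.0 if pays_when_asked else 0.0
    return P_EVEN * reward_if_even + (1 - P_EVEN) * cost_if_odd

print(expected_value(True))   # 4950.0 -> the updateless policy pays
print(expected_value(False))  # 0.0
```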
I agree that logical counterfactual mugging can work; I just think it probably can’t be fully formalized, and may have an inevitable degree of subjectivity to it.
Coincidentally, just a few days ago I wrote a post on how we can use logical counterfactual mugging to convince a misaligned superintelligence to give humans just a little, even if it observes the logical fact that humans lose control every time (and that humanity therefore has nothing to trade with it), unless math and logic themselves were different. :) Leave a comment there if you have time; in my opinion it’s more interesting and concrete.
(MIRI did some work on logical induction.)
I’ll give the post a read!