I was thinking that deductive explosion occurs for logical counterfactuals encountered during counterfactual mugging, but doesn’t occur for logical counterfactuals encountered when a UDT agent merely considers what would happen if it were to output something else (as a logical computation).
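For concreteness, here’s a minimal sketch of the standard counterfactual mugging payoff structure (the $100/$10,000 figures are the usual illustrative numbers, not something from this thread): a UDT-style agent evaluates its policy across both coin outcomes rather than updating on the one it observes.

```python
# Minimal sketch of counterfactual mugging, assuming the usual setup:
# Omega flips a fair coin; on heads it asks the agent to pay $100,
# on tails it pays $10,000 iff the agent *would have* paid on heads.

PAY_COST = 100      # what Omega asks for on heads
REWARD = 10_000     # what Omega would pay on tails

def expected_value(policy_pays: bool) -> float:
    """Expected value of committing to this policy before the coin flip."""
    heads_branch = -PAY_COST if policy_pays else 0
    tails_branch = REWARD if policy_pays else 0
    return 0.5 * heads_branch + 0.5 * tails_branch

# Paying wins in expectation (4950 > 0), so UDT pays even after seeing heads.
assert expected_value(True) > expected_value(False)
```

In the logical variant, the coin flip is replaced by a logical fact (say, a digit of pi), so the 0.5 above becomes a subjective logical credence rather than a physical chance; that is arguably where the formalization worry in the next paragraph bites.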
I agree that logical counterfactual mugging can work; my point is just that it probably can’t be formalized, and may have an inevitable degree of subjectivity to it.
Coincidentally, just a few days ago I wrote a post on how we can use logical counterfactual mugging to convince a misaligned superintelligence to give humans just a little, even if it observes the logical information that humans lose control every time (and therefore have nothing to trade with it), unless math and logic themselves were different. :) Leave a comment there if you have time; in my opinion it’s more interesting and concrete.
(MIRI did some work on logical induction.)
I’ll give the post a read!