I was thinking that deductive explosion occurs for logical counterfactuals encountered during counterfactual mugging, but doesn’t occur for logical counterfactuals encountered when a UDT agent merely considers what would happen if it were to output something else (as a logical computation).
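For concreteness, here’s a minimal sketch of the standard counterfactual mugging payoff structure (the $100/$10,000 figures are the usual illustrative numbers, not something from this thread): a UDT-style agent evaluates its policy across both coin outcomes rather than updating on the one it observes.

```python
# Minimal sketch of counterfactual mugging, assuming the usual setup:
# Omega flips a fair coin; on heads it asks the agent to pay $100,
# on tails it pays $10,000 iff the agent *would have* paid on heads.

PAY_COST = 100      # what Omega asks for on heads
REWARD = 10_000     # what Omega would pay on tails

def expected_value(policy_pays: bool) -> float:
    """Expected value of committing to this policy before the coin flip."""
    heads_branch = -PAY_COST if policy_pays else 0
    tails_branch = REWARD if policy_pays else 0
    return 0.5 * heads_branch + 0.5 * tails_branch

# Paying wins in expectation (4950 > 0), so UDT pays even after seeing heads.
assert expected_value(True) > expected_value(False)
```

In the logical variant, the coin flip is replaced by a logical fact (say, a digit of pi), so the 0.5 above becomes a subjective logical credence rather than a physical chance; that is arguably where the formalization worry in the next paragraph bites.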
I agree that logical counterfactual mugging can work; my point is just that it probably can’t be formalized, and may have an inevitable degree of subjectivity to it.
Coincidentally, just a few days ago I wrote a post on how we can use logical counterfactual mugging to convince a misaligned superintelligence to give humans just a little, even if it observes the logical information that humans lose control every time (and therefore have nothing to trade with it), unless math and logic themselves were different. :) Leave a comment there if you have time; in my opinion it’s more interesting and concrete.
(MIRI did some work on logical induction.)
I’ll give the post a read!