I posted about it before here: “Logical Counterfactuals are low-res”. I think you are saying the same thing here. And yes, analyzing one’s own decision-making algorithms and adjusting them can be very useful. However, Abram’s statement, as I understand it, does not have the explicit qualifier of incomplete knowledge of self. Quite the opposite, it says “Suppose you know that you take the $10”, not “You start with a first approximation that you take $10 and then explore further”.
You’re right—I didn’t see my confusion before, but Demski’s views don’t actually make much sense to me. The agent knows for certain that it will take $X? How can it know that without simulating its decision process? But if “simulate what my decision process is, then use that as the basis for counterfactuals” is part of the decision process, you’d get infinite regress. (Possible connection to fixed points?)
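To make the regress worry concrete, here is a minimal sketch, not anyone's actual proposal. The function names, the budget parameter, and the fallback value of 10 are all hypothetical, chosen only for illustration. It shows that a decision procedure whose first step is "simulate my own decision procedure" never bottoms out, while a bounded version has to fall back on a first approximation rather than certain self-knowledge.

```python
def decide():
    # Step 1: learn my own choice by simulating my decision process.
    # That process is this very function, so the call never bottoms out.
    predicted = decide()
    # Step 2 (never reached): build counterfactuals around `predicted`.
    return predicted


def decide_bounded(budget, first_guess=10):
    # Assumed escape hatch: stop simulating once the budget runs out and
    # fall back on a first approximation instead of certain knowledge.
    if budget == 0:
        return first_guess
    return decide_bounded(budget - 1, first_guess)


def regress_detected():
    # Python reports the unbounded self-simulation as a RecursionError.
    try:
        decide()
    except RecursionError:
        return True
    return False
```

Calling `regress_detected()` exercises the unbounded version, and `decide_bounded(3)` just returns the initial guess, which is the "first approximation" reading rather than the "you know that you take the $10" reading.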
I don’t think Demski is saying that the agent would magically jump from taking $X to taking $Y. I think he’s saying that agents which fully understand their own behavior would be trapped by this knowledge because they can no longer form “reasonable” counterfactuals. I don’t think he’d claim that Agenthood can override fundamental physics, and I don’t see how you’re arguing that his beliefs, unbeknownst to him, are based on the assumption that Agenthood can override fundamental physics.
I cannot read his mind, and odds are I misinterpreted what he meant. But if MIRI doesn’t think that counterfactuals as they appear to be (“I could have made a different decision but didn’t, by choice”) are fundamental, then I would expect a careful analysis of that issue somewhere. Maybe I missed it. I posted on a related topic some five months ago, and had some interesting feedback from jessicata (Jessica Taylor of MIRI) in the comments.