That’s about right. The key point is, “applying the counterfactual belief that the predictor is always right” is not really well-defined (that’s why people have been struggling with TDT/UDT/FDT for so long) while the thing I’m doing is perfectly well-defined. I describe agents that are able to learn which predictors exist in their environment and respond rationally (“rationally” according to the FDT philosophy).
TRL is for many things that have to do with the rational use of computational resources, such as (i) doing multi-level modelling in order to make optimal use of “thinking time” and “interacting with environment time” (i.e. simultaneously optimizing sample and computational complexity), (ii) recursive self-improvement, (iii) defending against non-Cartesian daemons, and (iv) preventing thought crimes. But, yes, it also provides a solution to ASP. TRL agents can learn whether it’s better to be predictable or predicting.
“The key point is, ‘applying the counterfactual belief that the predictor is always right’ is not really well-defined.” What do you mean here?
I’m curious whether you’re referring to the same issue as, or one similar to, the one I was referencing in Counterfactuals for Perfect Predictors. The TLDR is that I was worried it would be inconsistent for an agent that never pays in Parfit’s Hitchhiker to end up in town if the predictor is perfect, so it wouldn’t actually be well-defined what the predictor was predicting. The way I ended up resolving this was by modelling the agent as something that takes input, and asking what it would output if given that inconsistent input. But I’m not sure whether you were referencing this kind of concern or something else.
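To make the “agent that takes input” framing concrete, here is a minimal sketch of Parfit’s Hitchhiker with a perfect predictor. It assumes the agent is a policy (a function from observation to action) and that the driver predicts by evaluating that policy on the input “in town”. The utilities (U_DIE, PAYMENT) and all function names are hypothetical, chosen only for illustration. The point is that the query `policy("in_town")` is well-defined even for a policy that refuses, although such an agent never actually reaches town.

```python
from typing import Callable

# A policy maps an observation to an action. Here the only decision point
# is arriving in town, where the agent chooses "pay" or "refuse".
Policy = Callable[[str], str]

# Hypothetical utilities, assumed for illustration only.
U_DIE = -1_000_000  # left to die in the desert
U_TOWN = 0          # reaching town, before paying
PAYMENT = 100       # cost of paying the driver

def outcome(policy: Policy) -> int:
    """Utility a policy receives against a perfect predictor (driver)."""
    # The driver predicts by running the policy on the input "in_town".
    # This is well-defined even for a refusing policy, which never
    # actually observes "in_town" in its own history.
    predicted = policy("in_town")
    if predicted != "pay":
        return U_DIE  # the driver leaves the agent in the desert
    # The agent is driven to town and then acts on that same input.
    paid = policy("in_town") == "pay"
    return U_TOWN - (PAYMENT if paid else 0)

def always_pay(obs: str) -> str:
    return "pay"

def never_pay(obs: str) -> str:
    return "refuse"

print(outcome(always_pay))  # -100
print(outcome(never_pay))   # -1000000
```

Under this framing the “inconsistent input” is simply an argument passed to a function, so the predictor’s prediction is defined for every policy.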
It is not a mere “concern”; it’s really the crux of the problem. What people in the AI alignment community have been trying to do is to start with some factual and “objective” description of the universe (such as a program or a mathematical formula) and derive counterfactuals from it. The way it’s supposed to work is that the agent locates all copies of itself, or things “logically correlated” with itself (whatever that means), in the program, and imagines that it is controlling those parts. But a rigorous definition of this that handles all the standard decision-theoretic scenarios was never found.
Instead of doing that, I suggest a solution of a different nature. In quasi-Bayesian RL, the agent never arrives at a factual and objective description of the universe. Instead, it arrives at a subjective description that already includes counterfactuals. I then proceed to show that, in Newcomb-like scenarios, such agents receive optimal expected utility (i.e. the same expected utility promised by UDT).
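For reference, the “expected utility promised by UDT” in Newcomb’s problem is the value of the best policy when the predictor’s prediction tracks the policy. The sketch below computes it using the standard Newcomb payoffs ($1,000,000 in the opaque box iff one-boxing is predicted, $1,000 always in the transparent box). The `accuracy` parameter and the function names are assumptions added for illustration. This is only the policy-level benchmark, not an implementation of quasi-Bayesian RL or of how such an agent learns.

```python
BIG = 1_000_000  # opaque box, filled iff the predictor predicts one-boxing
SMALL = 1_000    # transparent box, always filled

def expected_utility(action: str, accuracy: float) -> float:
    """Expected payoff of committing to `action` when the predictor
    predicts the agent's action correctly with probability `accuracy`."""
    # Probability that one-boxing was predicted, given the chosen action.
    p_pred_one = accuracy if action == "one-box" else 1.0 - accuracy
    opaque = p_pred_one * BIG  # expected contents of the opaque box
    return opaque + (SMALL if action == "two-box" else 0)

def best_policy(accuracy: float) -> str:
    """Policy with the highest expected utility (UDT-style evaluation)."""
    return max(["one-box", "two-box"],
               key=lambda a: expected_utility(a, accuracy))

print(best_policy(1.0))                  # one-box
print(expected_utility("one-box", 1.0))  # 1000000.0
print(expected_utility("two-box", 1.0))  # 1000.0
```

With a perfect predictor, one-boxing is worth $1,000,000 and two-boxing $1,000; with these payoffs, two-boxing only wins once accuracy drops to about 0.5005 or below.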
Yeah, I agree that the objective descriptions can leave out vital information, such as how the information you know was acquired, which seems important for determining the counterfactuals.