Yep, that’s better. There’s still the risk of subagents being created: when the agent thinks that $f^{-1}(y) \neq 0$ almost certainly, but not completely certainly, it might create a $u$-maximising subagent and then self-terminate.
That means that this design, like most indifference designs, is reflectively consistent but not reflectively stable.
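To make the worry concrete, here is a toy expected-utility comparison (a minimal sketch; the 0.99 credence and all payoff numbers are made up for illustration, not taken from the design above):

```python
# Toy expected-utility comparison (all numbers hypothetical) showing why
# "spawn a u-maximising subagent, then self-terminate" can beat acting
# directly when the agent is almost, but not completely, certain that
# f^{-1}(y) != 0.

P_NONZERO = 0.99    # assumed credence that f^{-1}(y) != 0

U_SUBAGENT = 10.0   # u the subagent secures when f^{-1}(y) != 0
U_DIRECT = 8.0      # u the agent secures by acting itself
U_ZERO_CASE = 0.0   # payoff in the unlikely f^{-1}(y) = 0 branch

ev_direct = U_DIRECT
ev_jump_ship = P_NONZERO * U_SUBAGENT + (1 - P_NONZERO) * U_ZERO_CASE

print(f"E[u | act directly]              = {ev_direct:.2f}")     # 8.00
print(f"E[u | subagent + self-terminate] = {ev_jump_ship:.2f}")  # 9.90
```

With these (assumed) numbers the jump-ship plan wins, which is exactly the reflective-stability failure: nothing in the design penalises delegating to a subagent and exiting.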
Wow, I agree!
Let us modify the utility for the case $f^{-1}(y) = 0$ to:
$$u^*(h) = \begin{cases} 0 & h \text{ contains the "self-terminate" action} \\ u(h) & \text{otherwise} \end{cases}$$
Meaning: no utility can be gained via subagents if the agent “jumps ship” (i.e. self-terminates to gain utility in the case $f^{-1}(y) \neq 0$).
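For concreteness, here is how $u^*$ could look in code (a minimal sketch; modelling the history as a list of action labels and the placeholder $u$ are assumptions for illustration, not part of the proposal):

```python
# A minimal sketch of the modified utility u* for the f^{-1}(y) = 0 case.
# Histories are modelled as lists of action labels; the base utility u
# is a made-up placeholder, not anything from the discussion above.

def u(history):
    """Stand-in for the original utility u(h) (assumed toy value)."""
    return float(len(history))

def u_star(history):
    """u*(h): 0 if the history contains a self-terminate action, else u(h)."""
    if "self-terminate" in history:
        return 0.0
    return u(history)

# Jumping ship now earns nothing in the f^{-1}(y) = 0 branch:
print(u_star(["act", "spawn-subagent", "self-terminate"]))  # 0.0
print(u_star(["act", "act"]))                               # 2.0
```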
Interesting. I’ll think about whether this works and can be generalised (it doesn’t make the design reflectively stable, since creating $u$-maximising subagents is still allowed and doesn’t directly hurt the agent, but it might improve the situation).