Wow, I agree!
Let us modify the utility for the case $f^{-1}(y)=0$ to:
$$u^*(h)=\begin{cases}0 & \text{if } h \text{ contains the ``self-terminate'' action}\\ u(h) & \text{otherwise}\end{cases}$$
Meaning: no utility can be gained via subagents if the agent “jumps ship” (i.e. self-terminates to gain utility in case $f^{-1}(y)\neq 0$).
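A minimal sketch of the modified utility for the $f^{-1}(y)=0$ branch only. It assumes, for illustration, that a history $h$ is a sequence of action labels and that $u$ is any function from histories to numbers. The paperclip-counting base utility and the action names other than "self-terminate" are hypothetical.

```python
from typing import Callable, Sequence

# Assumption: a history is modelled as a sequence of action labels.
History = Sequence[str]
SELF_TERMINATE = "self-terminate"


def make_u_star(u: Callable[[History], float]) -> Callable[[History], float]:
    """Wrap a base utility u into u*: zero on any history containing
    the self-terminate action, otherwise unchanged."""
    def u_star(h: History) -> float:
        if SELF_TERMINATE in h:
            return 0.0
        return u(h)
    return u_star


# Hypothetical base utility: counts "make-paperclip" actions in the history.
def u(h: History) -> float:
    return float(sum(1 for a in h if a == "make-paperclip"))


u_star = make_u_star(u)
print(u_star(["make-paperclip", "make-paperclip"]))  # 2.0
# Utility produced after building a subagent is zeroed once the agent self-terminates.
print(u_star(["build-subagent", "make-paperclip", "self-terminate"]))  # 0.0
```

The second call illustrates the point above: whatever a subagent goes on to achieve, a history that includes the agent's own self-termination scores 0 under $u^*$.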
Interesting. I’ll think about whether this works and whether it can be generalised. It doesn’t make the agent reflectively stable (creating u-maximising subagents is still allowed, and doesn’t directly hurt the agent), but it might improve the situation.