Regarding:
Observation 1: an agent in that epistemic+instrumental state has a strong instrumental incentive to figure out which branch it’s in. Agree/disagree?
I should add that this is a property that one of the other ways of training I’ve alluded to doesn’t have. As mentioned I will write about that later.
Regarding:
I should add that this is a property that one of the other ways of training I’ve alluded to doesn’t have. As mentioned I will write about that later.