One way to think of it is that reward-seeking is the hypothesis in which the learned policy inherits its generalization propensities most directly from the RL algorithm (where “reward is most the optimization target”), so it also inherits CDT behavior from the RL algorithm.
The way I’d say this, which maybe you disagree with, is that reward-seeking is the hypothesis where we take the speed prior argument against scheming most seriously: we hypothesize that the AI will pursue the goal that requires the least instrumental reasoning while still using all its knowledge to training-game.