Well, now I’m less sure that it’s incorrect. I was originally imagining that, as in Solomonoff induction, the TMs basically directly controlled AIXI’s actions, but that’s not right: there’s an expectimax. And if the TMs reinforce actions by shaping the rewards, then in the AIXI formalism you learn that immediately and throw out those TMs.
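To make the expectimax point concrete, here is a toy sketch, not AIXI itself. It assumes just two deterministic reward models standing in for TMs (the names `honest` and `manipulator`, the two-action set, and the 50/50 prior are all made up for illustration), no observations besides reward, and a horizon of 1. The models never pick actions directly; they only supply predicted rewards to the expectimax, and a model whose predicted reward disagrees with the reward actually received gets weight zero.

```python
from fractions import Fraction

ACTIONS = ("a", "b")

# Hypothetical stand-ins for TMs: deterministic maps from the action
# history to the reward for the latest action.
def honest(actions):
    return 1 if actions[-1] == "a" else 0

def manipulator(actions):
    # Promises a large reward for "b" in order to steer the agent toward it.
    return 10 if actions[-1] == "b" else 0

MODELS = {"honest": honest, "manipulator": manipulator}

def expectimax(weights, history, horizon):
    """Return (value, best action) under the weighted mixture of models."""
    if horizon == 0:
        return Fraction(0), None
    total = sum(weights.values())
    best_value, best_action = None, None
    for action in ACTIONS:
        acts = history + [action]
        # Chance node: group surviving models by the reward they predict.
        branches = {}
        for name, w in weights.items():
            if w > 0:
                branches.setdefault(MODELS[name](acts), {})[name] = w
        v = Fraction(0)
        for reward, sub in branches.items():
            # Recursing on `sub` conditions the mixture on this reward.
            future, _ = expectimax(sub, acts, horizon - 1)
            v += sum(sub.values()) / total * (reward + future)
        if best_value is None or v > best_value:
            best_value, best_action = v, action
    return best_value, best_action

def update(weights, history, reward):
    """Zero out every model whose predicted reward was wrong."""
    return {n: (w if MODELS[n](history) == reward else Fraction(0))
            for n, w in weights.items()}

prior = {"honest": Fraction(1, 2), "manipulator": Fraction(1, 2)}
_, first_action = expectimax(prior, [], horizon=1)

# Assume the true environment happens to be the honest model.
true_reward = honest([first_action])
posterior = update(prior, [first_action], true_reward)
_, second_action = expectimax(posterior, [first_action], horizon=1)
```

In this toy setup the manipulating model does buy one step of influence (the agent tries "b" first), but the promised reward fails to show up, so that model is eliminated and the agent switches to "a". Real AIXI mixes over all programs with observations and longer horizons, so this only illustrates the mechanism.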
Oh, actually, you’re right (that you were wrong). I think I made the same mistake in my previous comment. Good catch.