the reward is a function of the agent’s actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems.
should be reasonably easy to formalize, because it does not depend on a full [T]DT algorithm. After that, evaluate the performance of [a]DT on Newcomb’s problems posed by a [b]DT-aware Omega, as described in the OP, where ‘a’ and ‘b’ are particular DTs, e.g. a=b=T.
Which issue/problem? fairness?
The fairness concept:
should be reasonably easy to formalize, because it does not depend on a full [T]DT algorithm. After that, evaluate the performance of [a]DT on Newcomb’s problems posed by a [b]DT-aware Omega, as described in the OP, where ‘a’ and ‘b’ are particular DTs, e.g. a=b=T.
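
As a rough illustration of what such a formalization might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the agents are trivial stand-ins rather than real decision theories, the names (`newcomb`, `method_sensitive`, `is_fair`, `performance_table`) are hypothetical, and the payouts are the usual $1,000,000 / $1,000 Newcomb amounts. The check only covers the first part of the quoted definition (same choice implies same reward) over a finite sample of agents, so it is a sanity test, not a proof of fairness, and it ignores the clause about choices on other problems.

```python
ONE_BOX, TWO_BOX = "one-box", "two-box"

# Toy stand-ins for decision theories (assumptions, not real TDT/CDT).
def one_boxer():
    return ONE_BOX

def one_boxer_variant():
    # Same choice as one_boxer, reached by a different "method".
    return ONE_BOX if len("newcomb") > 0 else TWO_BOX

def two_boxer():
    return TWO_BOX

def newcomb(reference_agent):
    """Newcomb's problem where Omega fills box B by modelling reference_agent."""
    def problem(agent):
        box_b = 1_000_000 if reference_agent() == ONE_BOX else 0
        choice = agent()
        payout = box_b if choice == ONE_BOX else 1_000 + box_b
        return choice, payout
    return problem

def method_sensitive(agent):
    """Deliberately unfair: pays by which procedure ran, not by what it chose."""
    choice = agent()
    return choice, (0 if agent is one_boxer else 1_000)

def is_fair(problem, agents):
    """True if, across the sampled agents, equal choices get equal rewards."""
    reward_for = {}
    for agent in agents:
        choice, payout = problem(agent)
        if reward_for.setdefault(choice, payout) != payout:
            return False
    return True

AGENTS = [one_boxer, one_boxer_variant, two_boxer]

def performance_table(agents):
    """Payout of each agent a against an Omega that models each agent b."""
    return {(a.__name__, b.__name__): newcomb(b)(a)[1]
            for a in agents for b in agents}
```

Under these assumptions, `is_fair(newcomb(b), AGENTS)` holds for any fixed reference agent `b`, while `is_fair(method_sensitive, AGENTS)` does not, and `performance_table(AGENTS)` gives the [a]DT versus [b]DT-aware Omega comparison as a dictionary keyed by `(a, b)`.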