I did say “suppose you are deterministic”. That said, can you spell out how CDT ratifies the optimal policy if randomization is allowed?
I believe it follows from this proof: https://www.alignmentforum.org/posts/5bd75cc58225bf06703751b2/in-memoryless-cartesian-environments-every-udt-policy-is-a