if I could figure out a way to think about why randomization substitutes for memory.
Let A and B be actions leading to deterministic outcomes, and let C be some lottery between A and B. A rational agent will never prefer both C>A and C>B.
When you repeat the scenario without memory, the lottery is no longer exactly over choices the agent could deterministically make: the randomness is re-rolled in places where the agent doesn’t get another decision. Despite what the options are labeled, you’re really choosing between 2xA, 2xB, and a lottery over {2xA, 2xB, A+B}. Since the lottery contains an outcome that isn’t available to the deterministic decision, it may be preferred.
I think this is equivalent to the role played by observational evidence in UDT1: Observations allow a constant strategy to take different actions in different places, whereas without any observations to distinguish agent instances you have to pick one action to optimize both situations. Of course good evidence is reliably correlated with the environment whereas randomness doesn’t tell you which is which, but it’s better than nothing.
Let A and B be actions leading to deterministic outcomes, and let C be some lottery between A and B. A rational agent will never prefer both C>A and C>B.
When you repeat the scenario without memory, the lottery is no longer exactly over choices the agent could deterministically make: the randomness is re-rolled in places where the agent doesn’t get another decision. Despite what the options are labeled, you’re really choosing between 2xA, 2xB, and a lottery over {2xA, 2xB, A+B}. Since the lottery contains an outcome that isn’t available to the deterministic decision, it may be preferred.
I think this is equivalent to the role played by observational evidence in UDT1: Observations allow a constant strategy to take different actions in different places, whereas without any observations to distinguish agent instances you have to pick one action to optimize both situations. Of course good evidence is reliably correlated with the environment whereas randomness doesn’t tell you which is which, but it’s better than nothing.