Instead of postulating access to a portion of the history or some kind of limited access to the opponent’s source code, we can consider agents with full access to history / source code but finite memory. The problem is, an agent with fixed memory size usually cannot have regret going to zero, since it cannot store probabilities with arbitrary precision. However, it seems plausible that we can usually get learning with memory of size O(log11−γ). This is because something like “counting pieces of evidence” should be sufficient. For example, if consider finite MDPs, then it is enough to remember how many transitions of each type occurred to encode the belief state. There question is, does assuming O(log11−γ) memory (or whatever is needed for learning) is enough to reach superrationality.
Instead of postulating access to a portion of the history or some kind of limited access to the opponent’s source code, we can consider agents with full access to history / source code but finite memory. The problem is, an agent with fixed memory size usually cannot have regret going to zero, since it cannot store probabilities with arbitrary precision. However, it seems plausible that we can usually get learning with memory of size O(log11−γ). This is because something like “counting pieces of evidence” should be sufficient. For example, if consider finite MDPs, then it is enough to remember how many transitions of each type occurred to encode the belief state. There question is, does assuming O(log11−γ) memory (or whatever is needed for learning) is enough to reach superrationality.