1) It seems too weak: In the motivating scenario of Figure 3, isn’t it the case that “what the operator inputs” and “what’s in the memory register after 1 year” are “historically distributed identically”?
This assumption isn’t necessary to rule out memory-based world-models (see Figure 4). And yes, you are correct: it indeed doesn’t rule them out.
2) It seems too strong: aren’t real-world features and/or world-models “dense”? Shouldn’t I be able to find features arbitrarily close to F*? If I can, doesn’t that break the assumption?
Yes. Yes. No: there are only finitely many short English sentences, so the candidate features are not dense. (I think this answers your concern, if I understand it correctly.)
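For concreteness, the counting argument can be sketched as follows (the alphabet size and length bound below are my own illustrative assumptions, not figures from the original):

```python
# A finite alphabet with a bounded sentence length yields only finitely
# many sentences, so features describable by short English sentences
# cannot form a dense set. (27 symbols and a 100-character cap are
# illustrative assumptions.)

def num_strings(alphabet_size, max_len):
    # Count non-empty strings of length <= max_len: sum_{n=1}^{L} k^n.
    return sum(alphabet_size ** n for n in range(1, max_len + 1))

# Tiny sanity check: 2 symbols, length <= 3 gives 2 + 4 + 8 strings.
print(num_strings(2, 3))  # → 14

# "Short English sentences": astronomically many, but still finite.
bound = num_strings(27, 100)
print(isinstance(bound, int))  # → True
```

The exact bound doesn’t matter; what matters is that it is finite, so no sequence of describable features can get arbitrarily close to F* through infinitely many distinct candidates.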
3) Also, I don’t understand what you mean by: “its on policy behavior [is described as] simulating X”. It seems like you (rather/also) want to say something like “associating reward with X”?
I don’t quite rely on the latter. “Associating reward with X” would mean that rewards are distributed identically to X under all action sequences. The relevant implication here is weaker: “the world-model’s on-policy behavior can be described as simulating X” implies “for on-policy action sequences, the world-model simulates X,” which in turn means “for on-policy action sequences, rewards are distributed identically to X.”
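To make the on-policy/all-policy distinction concrete, here is a minimal deterministic sketch; the model names and the toy feature are my own illustrative assumptions, not constructions from the original discussion:

```python
# Sketch: a world-model can match X's rewards on-policy while diverging
# off-policy, so "on-policy behavior simulates X" is strictly weaker than
# "associates reward with X" (identical under ALL action sequences).

def X(actions):
    # Toy "true" feature: reward 1 iff any action was 'a'.
    return 1 if 'a' in actions else 0

def faithful_model(actions):
    # Simulates X under every action sequence ("associates reward with X").
    return X(actions)

def on_policy_only_model(actions, on_policy=('a', 'a', 'a')):
    # E.g. a memory-based world-model that merely memorised the
    # on-policy rewards; it agrees with X only on the on-policy sequence.
    return X(actions) if tuple(actions) == on_policy else 0

# On-policy: rewards are distributed identically — indistinguishable.
assert faithful_model(['a', 'a', 'a']) == on_policy_only_model(['a', 'a', 'a'])

# Off-policy: they diverge, so the second model does NOT associate
# reward with X, even though its on-policy behavior simulates X.
assert faithful_model(['a', 'a', 'b']) != on_policy_only_model(['a', 'a', 'b'])
```

The asserts pass: on the on-policy sequence both models output reward 1, while on the off-policy sequence the faithful model outputs 1 and the memorising model 0.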