I didn’t follow some parts of the new algorithm. Probably most centrally: what is Dist(S)? Is this the type of distributions over real states of the world, and if so how do we have access to the true map Camera: S --> video? Based on that I likely have some other confusions, e.g. where are the camera_sequences and action_sequences coming from in the definition of Recognizer_M, what is the prior being used to define Camera−1, and don’t Recognizer_M and Recognizer_H effectively advance time a lot under some kind of arbitrary sequences of actions (making them unsuitable for exactly matching up states)?
I didn’t follow some parts of the new algorithm. Probably most centrally: what is Dist(S)? Is this the type of distributions over real states of the world, and if so how do we have access to the true map Camera: S --> video? Based on that I likely have some other confusions, e.g. where are the camera_sequences and action_sequences coming from in the definition of Recognizer_M, what is the prior being used to define Camera−1, and don’t Recognizer_M and Recognizer_H effectively advance time a lot under some kind of arbitrary sequences of actions (making them unsuitable for exactly matching up states)?