Thanks for the comment! Note that we use the state-action visitation distribution, so we consider trajectories that contain actions as well. This makes it possible to invert η (as long as all states are visited); see the sketch below. Using state-only trajectories, it would indeed be impossible to recover the policy.
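For concreteness, here is a minimal numerical sketch of that inversion (the toy MDP, variable names, and constants are illustrative assumptions, not from the post): with η(s, a) = d(s)·π(a|s) the discounted state-action occupancy, normalizing η over actions at each visited state recovers π.

```python
import numpy as np

# Toy 3-state, 2-action MDP with random dynamics; everything here is
# an illustrative assumption, not taken from the post.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

P = rng.random((n_states, n_actions, n_states))   # transition kernel P[s, a, s']
P /= P.sum(axis=2, keepdims=True)
pi = rng.random((n_states, n_actions))            # stochastic policy pi[s, a]
pi /= pi.sum(axis=1, keepdims=True)
mu0 = np.full(n_states, 1.0 / n_states)           # uniform initial distribution

# Discounted state occupancy solves d = (1 - gamma) * mu0 + gamma * P_pi^T d,
# where P_pi[s, s'] = sum_a pi[s, a] * P[s, a, s'].
P_pi = np.einsum("sa,sat->st", pi, P)
d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, (1 - gamma) * mu0)

# State-action occupancy: eta(s, a) = d(s) * pi(a | s).
eta = d[:, None] * pi

# Inversion: pi(a | s) = eta(s, a) / sum_{a'} eta(s, a'),
# well defined exactly at states with positive occupancy.
pi_recovered = eta / eta.sum(axis=1, keepdims=True)
assert np.allclose(pi, pi_recovered)
```

If some state were never visited (d(s) = 0), the normalization above would be 0/0 and π(·|s) would be unrecoverable there, which is exactly the "all states are visited" caveat.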
Actually, this claim in the original post is false for state-based occupancy measures, but it might be true for state-action measures. From p163 of On Avoiding Power-Seeking by Artificial Intelligence:
(Edited to qualify the counterexample as only applying to the state-based case)
Thanks, this was an oversight on my part.