There are some minor differences; your approach learns the whole model, whereas mine assumes the model is given, and learns only the “acausalish” aspects of it. But they are pretty similar.
One problem you might have, is learning the acausal stuff in the mid-term. If the agent learns that causality exists, and then that in the Newcomb problem is seems to have a causal effect, then it may search a lot for the causal link. Eventually this won’t matter (see here), but in the mid-term it might be a problem.
Or not. We need to test more ^_^
Well, being surprised by Omega seems rational. If I found myself in a real life Newcomb problem I would also be very surprised and suspect a trick for a while.
Moreover, we need to unpack “learns that causality exists”. A quasi-Bayesian agent will eventually learn that it is part of a universe ruled by the laws of physics. The laws of physics are the ultimate “Omega”: they predict the agent and everything else. Given this understanding, it is not more difficult than it should be to understand Newcomb!Omega as a special case of Physics!Omega. (I don’t really have an understanding of quasi-Bayesian learning algorithms and how learning one hypothesis affects the learning of further hypotheses, but it seems plausible that things can work this way.)