True, they will fail to cooperate for some values of R, but such values have low probability. (But yeah, it's also required that uy and R are chosen independently; otherwise an adversary could choose either one so that the players end up taking different actions.)
The smoothness comes from marginalising over a random R. The coordination comes from making R and ε common knowledge, so the players cooperate using the correlation in their observations, which is an interesting phenomenon.
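To make this concrete, here is a minimal simulation sketch under assumptions of my own (the original game's payoffs aren't restated in this thread): both players observe a common value u corrupted by independent uniform noise of width ε, and each acts iff their observation exceeds the shared threshold R. With a fixed R an adversary can place u right at R and cause frequent miscoordination; with R drawn at random (independently of u) and made common knowledge, miscoordination is rare for every u, because the noise band around u straddles R only with probability on the order of ε.

```python
import random

def play(u, R, eps):
    # Each player sees u corrupted by independent noise in [-eps, eps];
    # both act iff their own observation exceeds the shared threshold R.
    obs_a = u + random.uniform(-eps, eps)
    obs_b = u + random.uniform(-eps, eps)
    return (obs_a > R), (obs_b > R)

def miscoordination_rate(u, eps, draw_R, trials=20000):
    # Fraction of rounds in which the two players take different actions.
    fails = 0
    for _ in range(trials):
        R = draw_R()           # R is common knowledge within a round
        a, b = play(u, R, eps)
        fails += (a != b)
    return fails / trials

eps = 0.01
u = 0.5
# Adversarial fixed R placed exactly at u: each player's observation is
# above R with probability 1/2 independently, so they disagree ~half the time.
fixed = miscoordination_rate(u, eps, lambda: u)
# R uniform on [0, 1], chosen independently of u: the players disagree only
# when R falls between their two observations, which happens with
# probability on the order of eps.
randomized = miscoordination_rate(u, eps, lambda: random.uniform(0, 1))
print(fixed, randomized)
```

Marginalising over R is also what delivers the smoothness: each player's expected action, as a function of u, changes by at most the density of R times the shift in u, instead of jumping discontinuously at a fixed threshold.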
(How can I write LaTeX in the comments?)
The game is indeed a clean example of Glomarization.
I might have misunderstood your main point, which I interpret as: "because of the counterfactual that I could have gotten a high card in this game, I shouldn't reveal a low card with probability 1." Are you sure it's because of the counterfactual in the current game, and not because of the possible consequences for my later interactions with B?
I would reveal my card if the game were truly one-off, in the sense that zero information leaks out. (Suppose B's memories of this game are erased afterwards.)
In real life, my decision would affect Player B's image of me, which affects how she will reason about similar games against me in the future. (And even how people close to her will reason about people like me.)
A multi-agent influence diagram of the iterated version shows how a player can screw themselves over in later games. If A hides first, then B cannot update her model of A:
If A first reveals a low card, then a pattern emerges that is almost, but not quite, like Newcomb's problem: