On possible way how it could go wrong:
M to H: “Run out of the room!”
H runs out.
Adv prints something, but H never reads it. So M reached stable output.
I think that M only prints something after converging with Adv, and that Adv does not print anything directly to H
Yes, but all what I said could be just a convergent prediction of M. Not the real human runs out of the room, but M predicted that its model human of H’ will leave the room.
On possible way how it could go wrong:
M to H: “Run out of the room!”
H runs out.
Adv prints something, but H never reads it. So M reached stable output.
I think that M only prints something after converging with Adv, and that Adv does not print anything directly to H
Yes, but all what I said could be just a convergent prediction of M. Not the real human runs out of the room, but M predicted that its model human of H’ will leave the room.