Jozdien comments on Human beats SOTA Go AI by learning an adversarial policy

Jozdien 19 Feb 2023 19:34 UTC
4 points
0
> I can believe that it’s possible to defeat a Go professional by some extremely weird strategy that causes them to have a seizure or something in that spirit. But, is there a way to do this that another human can learn to use fairly easily? This stretches credulity somewhat.
I’m a bit confused on this point. It doesn’t feel intuitive to me that you need a strategy so weird that it causes them to have a seizure (or something in that spirit). Chess preparation, for example, and especially world championship prep, often involves very deep lines calculated such that the moves chosen aren’t optimal under perfect play, but lead a human opponent into an unfavourable position. One of my favourite games, for example, involves a position where at one point black is up three pawns and a bishop, and is still in a losing position (analysis). (This comment is definitely not just a front to take an opportunity to gush over this game.)
> Notice also that (AFAIK) there’s no known way to inoculate an AI against an adversarial policy without letting it play many times against it (after which a different adversarial policy can be found). Whereas even if there’s some easy way to “trick” a Go professional, they probably wouldn’t fall for it twice.
The kind of idea I mention is also true of new styles. The hypermodern school of play or the post-AlphaZero style would have led to newer players being able to beat earlier players of greater relative strength, in a way that I think would be hard to recognize from a single game even for a GM.
- Vanessa Kosoy 19 Feb 2023 19:58 UTC
  5 points
  1
  My impression is that the adversarial policy used in this work is much stranger than the strategies you talk about. It’s not a “new style”, it’s something bizarre that makes no game-sense but confuses the ANN. The linked article shows that even a Go novice can easily defeat the adversarial policy.
  - Jozdien 19 Feb 2023 21:05 UTC
    3 points
    0
    Yeah, but I think I registered that bizarreness as being from the ANN having a different architecture and abstractions of the game than we do. Which is to say, my confusion is from the idea that qualitatively this feels in the same vein as playing a move that doesn’t improve your position in a game-theoretic sense, but which confuses your opponent and results in you getting an advantage when they make mistakes. And that definitely isn’t trained adversarially against a human mind, so I would expect that the limit of strategies like this would allow for otherwise objectively far weaker players to defeat opponents they’ve customised their strategy to.
    - Radford Neal 19 Feb 2023 22:00 UTC
      12 points
      5
      I’m not quite sure what you’re saying here, but the “confusion” the go-playing programs have here seems to be one that no human player beyond the beginner stage would have. They seem to be missing a fundamental aspect of the game.
      Perhaps the issue is that go is a game where intuitive judgements plus some tree search get you a long way, but there are occasional positions in which it’s necessary to use (maybe even devise and prove) what one might call a “theorem”. One is that “a group is unconditionally alive if it has two eyes”, with the correct definition of “eye”. For capture races, another theorem is that the group with more liberties is going to win. So if you’ve got 21 liberties and the other player has 20, you know you’ll win, even though this involves looking 40 moves ahead in a tree search. It may be that current go-playing programs are not capable of finding such theorems, in their fully-correct forms.
  - Radford Neal 19 Feb 2023 20:24 UTC
    3 points
    0
    These sorts of large-scale capturing races do arise in real human-human games. They are more common in games between beginners, but possible between more advanced players as well. The capturing race itself is not a “bizarre” thing. Of course it is not normal in a human-human game for a player to give away lots of points elsewhere on the board in order to set up such a capture race, since a reasonably good human player will be able to easily defend the targeted group before it’s too late.
    Qualifications: I’m somewhere around a 3 dan amateur Go player.
    - Taran 19 Feb 2023 21:43 UTC
      3 points
      0
      In the paper, David Wu hypothesized one other ingredient: the stones involved have to form a circle rather than a tree (that is, excluding races that involve the edge of the board). I don’t think I buy his proposed mechanism but it does seem to be true that the bait group has to be floating in order for the exploit to work.
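
For readers who want to check the arithmetic in Radford Neal's capture-race "theorem" above (21 liberties against 20, decided 40 moves deep), here is a minimal sketch in Python. It is not taken from the paper or from any Go engine. The function name `capture_race` and the model itself are assumptions made for illustration: no shared liberties, no eyes, neither side can gain liberties, and every move fills exactly one liberty of the opposing group.

```python
from __future__ import annotations


def capture_race(my_liberties: int, opp_liberties: int, i_move_first: bool) -> tuple[str, int]:
    """Play out an idealised capturing race (hypothetical toy model).

    Assumptions: no shared liberties, no eyes, no way to gain liberties;
    each move removes exactly one liberty from the other side's group.
    Returns (winner, number_of_moves_played).
    """
    if my_liberties < 1 or opp_liberties < 1:
        raise ValueError("both groups need at least one liberty")

    my_turn = i_move_first
    moves = 0
    while True:
        moves += 1
        if my_turn:
            # I fill one of the opponent's liberties.
            opp_liberties -= 1
            if opp_liberties == 0:
                return "me", moves
        else:
            # The opponent fills one of my liberties.
            my_liberties -= 1
            if my_liberties == 0:
                return "opponent", moves
        my_turn = not my_turn


if __name__ == "__main__":
    # 21 liberties against 20, with the opponent moving first.
    print(capture_race(21, 20, i_move_first=False))
    # Equal liberties: in this toy model the side that moves first wins.
    print(capture_race(20, 20, i_move_first=False))
```

In this toy model the side with more liberties wins regardless of who moves first, and with the opponent to play the capture lands on move 40, which matches the depth mentioned in the comment. Real positions with shared liberties or eyes need the fuller counting rules, which this sketch does not attempt.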