One of the authors of the paper here. Really glad to see so much discussion of our work! Just want to help clarify the Go rules situation (which in hindsight we could’ve done a better job explaining) and my own interpretation of our results.
We forked the KataGo source code (github.com/HumanCompatibleAI/KataGo-custom) and trained our adversary using the same rules that KataGo was trained on.[1] So while our current adversary wins via a technicality, it was a technicality that KataGo was trained to be aware of. Indeed, KataGo is able to recognize that passing would result in a forced win by our adversary, but given a low tree-search budget it does not have the foresight to avoid this. As evhub noted in another comment on this post, increasing the tree-search budget solves this issue.
So TL;DR I do believe we have a genuine exploit of the KataGo policy network, triggering a failure that it was trained to avoid.
Additionally, the project is still ongoing and we are working on attacks that are adversarial nature but win via other means (i.e. no weird rule technicalities). There are some promising preliminary results here which makes me think that the current exploit is not just a one-off exploit but evidence of something more general.[2]
Hi, one of the authors here speaking on behalf of the team. We’re excited to see that people are interested in our latest results. Just wanted to comment a bit on transferability.
The adversary trained in our paper has a 97% winrate against KataGo at superhuman strength, a 6.1% winrate against LeelaZero at superhuman strength, and a 3.5% winrate against ELF OpenGo at superhuman strength. Moreover, in the games that we do win, we win by carrying out the cyclic-exploit (see https://goattack.far.ai/transfer), which shows that LZ and ELF are definitely susceptible. In fact, Kellin was also able to beat LZ with 100k visits using the cyclic exploit.
And while it is true that our adversary has a significantly reduced winrate against LZ/ELF compared to KataGo, even a 3.5% winrate clearly demonstrates the existence of a flaw.[1] For example looking at goratings.org, a 3.5% win rate against the world #1 (3828 elo) is approximately 3245 elo, which is still in top 250 in the world. Considering that LZ/ELF are stronger than any human, the winrate we get against them should easily correspond to a top professional level of play, if not a superhuman level. But our adversary loses to a weak amateur (myself).
We haven’t confirmed this ourselves yet, but Golaxy and FineArt (two strong Chinese Go AIs) also seem to systematically misevaluate positions with cyclic groups. Our evidence is this bilbili video, which shows off various cyclic positions that KataGo, Golaxy, and FineArt all misevaluate. Golaxy and FineArt (绝艺) are shown at the end of the video.[2]
Now these are only misevaluated test-positions, which don’t necessarily imply that a from-scratch cyclic-exploit is possible to pull off. But given that a) LZ/ELF are both vulnerable; b) LZ was manually exploited by Kellin; and c) Golaxy and FineArt misevaluate cyclic-groups in the same way as KataGo; this does not bode well for Golaxy and FineArt. We are currently in the process of getting access to FineArt to test its robustness ourselves.
To our knowledge, this attack is the first exploit that consistently wins against top programs using substantial search, without repeating specific sequences (e.g., finding a particular game that a bot lost and replaying the key parts of it). Our adversary algorithm also learned from scratch, without using any existing knowledge. However, there are other known weaknesses of bots, such as a fairly specific, complex sequence called “Mi Yuting’s Flying Dagger joseki”, or the ladder tactic. While these weaknesses were previously widespread, targeted countermeasures for them have already been created, so they cannot be used to consistently win games against top programs like KataGo. Nonetheless, these weaknesses, along with the cyclic one our adversary targets, suggest that CNN-based MCTS Go AIs have a shared set of flaws. Perhaps similar learning algorithms / neural-net architectures learn similar circuits / heuristics and thus also share the same vulnerabilities?
One question that we have been thinking about is whether the cyclic-vulnerability lies with CNNs or with AlphaZero style training. For example, some folks in multiagent systems think that “the failure of naive self play to produce unexploitable policies is textbook level material”. On the other hand, David Wu’s tree vs. cycle theory seems to suggest that certain inductive biases of CNNs are also at play.
Our adversary was also run in a weird mode against LZ/ELF, because it modeled LZ/ELF as being KataGo. We ran our transfer evaluation this way because accurately modeling LZ/ELF would have required a lot of additional software engineering. It’s not entirely clear to me that accurate modeling would necessarily help though.
The same bilbili poster also appears to have replicated our manual cyclic-exploit against various versions of KataGo with a 9-stone handicap: https://space.bilibili.com/33337424/channel/seriesdetail?sid=2973285.