This latest thing is different from the one described there. I think the same people are behind it, but it’s a different exploit.
The old exploit was all about making use of a scoring technicality. The new one is about a genuine blind spot in (at least) KataGo and Leela Zero: what their networks learn about life and death through self-play is systematically wrong in a class of weird positions that scarcely ever occur in normal games. (They involve cyclic “chains” of stones. The creator of KataGo has a plausible-sounding explanation for what sort of algorithm the Leela Zero and KataGo networks may be using, and why it would give wrong results in these cases.)
I would be very cautious about applying this to other AI systems, but it does match a pattern that is arguably also common with LLMs: the AI has learned something that works well most of the time in practice, but what it’s learned falls short of genuine understanding, and as a result it’s exploitable. A human who was hit with this sort of thing once or twice would think “shit, I’ve been misunderstanding this” and try to reason through what went wrong. The AI doesn’t have that sort of metacognition, so it can just be exploited over and over and over.
Cool! Thanks.