I was pretty taken aback by the article claiming that KataGo apparently has something like a human-exploitable, distorted concept of "liberties".
If we could somehow ask KataGo how it defines "liberties", I suspect it would have become clear much sooner that its concept was messed up. But of course, a huge part of The Problem is that we have no idea what these neural nets are actually doing.
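For reference, "liberties" has a crisp definition that a few lines of code capture: the liberties of a group are the empty points adjacent to any stone in the group, found by flood fill. The Python below is a minimal sketch of that standard definition (board as a dict mapping `(x, y)` to a color string, with empty points absent), not a claim about KataGo's internal representation:

```python
def liberties(board, start, size=19):
    """Return the set of liberties of the group containing `start`.

    `board` maps (x, y) -> "b" or "w"; points not in the dict are empty.
    """
    color = board[start]
    group, frontier, libs = {start}, [start], set()
    while frontier:
        x, y = frontier.pop()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < size and 0 <= ny < size):
                continue
            if (nx, ny) not in board:
                libs.add((nx, ny))  # empty neighbor: a liberty
            elif board[(nx, ny)] == color and (nx, ny) not in group:
                group.add((nx, ny))  # same-color stone joins the group
                frontier.append((nx, ny))
    return libs
```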
So I propose the following challenge: build a hybrid KataGo/LLM AI that makes the same mistake and outputs reasoning text in which the mistake is recognizable.
It would be funny if the Go part continued making the same mistake, and the LLM part just made up bullshit explanations.
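For concreteness, here's a rough sketch of the kind of pipeline I have in mind. `go_engine.best_move` and `llm.complete` are hypothetical stand-ins, not real APIs; the actual wiring to KataGo's analysis engine and to whatever LLM you pick is the hard part of the challenge:

```python
def move_with_rationale(go_engine, llm, board_state):
    """Pick a move with a Go engine, then have an LLM narrate the reasoning."""
    move = go_engine.best_move(board_state)  # the part that blunders
    prompt = (
        "You are playing Go. Current position:\n"
        f"{board_state}\n"
        f"You chose the move {move}. Explain your reasoning, including "
        "a count of the liberties of every group involved."
    )
    rationale = llm.complete(prompt)  # the part that may confabulate
    return move, rationale
```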