There are plenty of incentives for people to find adversarial policies against other people. And, sure, there are stage magicians, but I don’t know any stage magician who can do the equivalent of winning at Go by playing a weird sequence of moves that confuses the senses (without exploiting any degrees of freedom other than the actual moves). AFAIK there is no weaponized stage magic either, which would be very useful if it were that efficient. (Camouflage, maybe? It still seems pretty different.) Of course, it is possible that we cannot find these adversarial policies only because we are not able to interface with a human brain as directly as the adversarial policy training algorithm. In other words, maybe if you could play a lot of games against a Go champion while resetting their memory after every game, you would find something eventually (even though they randomize their moves, so the games don’t repeat). But, I dunno.
Of course, it is possible that we cannot find these adversarial policies only because we are not able to interface with a human brain as directly as the adversarial policy training algorithm.
IMO this point is very underappreciated. It’s heavily load-bearing that the adversarial policy could train itself against a (very) high-fidelity simulation of the Go engine, do the ML equivalent of “reading its mind” while doing so, and train against successively stronger versions of the engine (more searches per turn) for arbitrarily long.
We can’t do any of these vs a human. Even though there are incentives to find adversarial exploits for human cognition, we can’t systematically run an adversarial optimiser over a human mind the way we can over an ML mind.
And con artists/scammers, abusers, etc. may perform the equivalent of adversarial exploits on the cognition of particular people.
In other words, maybe if you could play a lot of games against a Go champion while resetting their memory after every game, you would find something eventually (even though they randomize their moves, so the games don’t repeat). But, I dunno.
Not me, but a (very) strong Go amateur might be able to learn and adopt a policy that a compute-limited agent found to beat a Go champion given such a setup (notice that even in the KataGo case, it wasn’t humans who discovered the adversarial policy).
I don’t think they do the ML equivalent of “reading its mind”? AFAIU, they are just training an RL agent to play against a “frozen” policy. Granted, we still can’t do that against a human.
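To make the “RL agent vs. a frozen policy” framing concrete, here is a toy sketch under loudly stated assumptions. It is not the KataGo attack setup: the game is rock-paper-scissors, and `frozen_victim`, `reward`, and `train_adversary` are hypothetical names. The victim is stochastic (so games don’t repeat) but its policy never updates, and the adversary is a simple epsilon-greedy bandit that only sees game outcomes, with no access to the victim’s internals.

```python
import random

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def frozen_victim(rng):
    # Hypothetical frozen policy: randomized, but slightly biased toward "rock".
    # Its weights never change while the adversary trains.
    return rng.choices(MOVES, weights=(0.5, 0.25, 0.25))[0]


def reward(adv_move, victim_move):
    # +1 for an adversary win, -1 for a loss, 0 for a draw.
    if BEATS[adv_move] == victim_move:
        return 1.0
    if BEATS[victim_move] == adv_move:
        return -1.0
    return 0.0


def train_adversary(episodes=50000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit that learns only from outcomes (black-box access)."""
    rng = random.Random(seed)
    totals = {m: 0.0 for m in MOVES}
    counts = {m: 0 for m in MOVES}
    for _ in range(episodes):
        # Explore at random until every move has been tried, then with prob. epsilon.
        if rng.random() < epsilon or min(counts.values()) == 0:
            move = rng.choice(MOVES)
        else:
            move = max(MOVES, key=lambda m: totals[m] / counts[m])
        totals[move] += reward(move, frozen_victim(rng))
        counts[move] += 1
    # Return the move with the best average reward against the frozen victim.
    return max(MOVES, key=lambda m: totals[m] / counts[m])


print(train_adversary())
```

Against this biased victim the adversary should settle on "paper", exploiting the bias purely through repeated play. The point of the toy is that the victim being frozen and queryable an unlimited number of times is what makes the search possible, which is exactly the part we can’t replicate against a human.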
Hmm, I think I may have misunderstood/hallucinated the “reading its mind” analogy from an explainer of the exploit I read elsewhere.