This is pitched as unsupervised from scratch, but on page 18, it talks about training to mimic another engine’s board evaluation function.
I don’t follow how the various networks and training regimes relate to each other. If the supervised training only produced the features listed on pages 18-19, that doesn’t seem like a big deal, because those are very basic chess concepts that exhibit very little expertise. (It did jump out at me that the last item, the lowest-value piece among attackers of a square, is much more complicated than any of the other features.) And the point is to make a much better evaluation function and trade it off against brute force. But I’m not sure what’s going on.
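For concreteness, a minimal sketch of what “training to mimic another engine’s evaluation” could look like: regress a simple model’s output against the oracle’s scores over a set of positions. The feature vectors and the oracle here are synthetic stand-ins, not the paper’s actual features or Stockfish itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 "positions", each described by 8 numeric
# features (in the real setting: material counts, mobility, etc.).
X = rng.normal(size=(200, 8))

# Pretend oracle: a fixed linear evaluation plus a little noise, standing
# in for scores produced by a strong engine such as Stockfish.
true_w = rng.normal(size=8)
oracle_scores = X @ true_w + 0.01 * rng.normal(size=200)

# Fit a linear evaluator to mimic the oracle by plain gradient descent
# on mean squared error.
w = np.zeros(8)
for _ in range(500):
    pred = X @ w
    grad = X.T @ (pred - oracle_scores) / len(X)
    w -= 0.1 * grad

mse = float(np.mean((X @ w - oracle_scores) ** 2))
print(round(mse, 4))  # small residual error: the mimicry succeeded
```

In the paper the model is a network rather than a linear map, but the supervised objective is the same shape: minimize the gap to the other engine’s scores.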
The main training isn’t entirely from scratch either, but makes use of random positions from expert games, implicitly labeled as balanced. It would be interesting to know how important that was, and what would happen if the system bootstrapped purely on its own, generating typical positions by playing against itself.
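The pure-bootstrap alternative could be sketched generically: generate “typical” starting positions by short random playouts from the initial position, rather than sampling positions out of expert games. The toy pile-subtraction game below is just a placeholder for a real chess move generator.

```python
import random

def bootstrap_positions(initial, legal_moves, apply_move, n_positions, depth, seed=0):
    """Generate starting positions by short random playouts from the
    initial position -- the 'purely bootstrapped' alternative to
    sampling random positions out of expert games."""
    rng = random.Random(seed)
    positions = []
    for _ in range(n_positions):
        pos = initial
        for _ in range(depth):
            moves = legal_moves(pos)
            if not moves:          # terminal position: stop the playout
                break
            pos = apply_move(pos, rng.choice(moves))
        positions.append(pos)
    return positions

# Toy stand-in for chess: a pile of 30 stones; a move removes 1-3 stones.
initial = 30
legal = lambda p: [m for m in (1, 2, 3) if m <= p]
apply = lambda p, m: p - m

starts = bootstrap_positions(initial, legal, apply, n_positions=5, depth=4)
print(starts)
```

The open question raised above is whether positions generated this way are “typical” enough; random playouts from the opening tend to drift away from the distribution of positions a strong player would actually reach.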
It looks to me as if they did the following:

1. Design the features: manually try various combinations of features and, for each candidate feature set, attempt to learn Stockfish’s evaluation function.

2. Having chosen features, learn the weights:
   a. Initialize the weights via some kind of bootstrapping process using a manually designed (but deliberately rather stupid) evaluator.
   b. Optimize the weights by unsupervised TD-leaf learning, using a large database of positions (from computer-computer games) as starting points for self-play.
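The TD-leaf part of the second step could be sketched roughly as follows. TD-leaf backs up the evaluation of the search leaf reached from each position of a self-play game; the sketch below collapses the search away and just applies a TD(λ)-style weight update to a sequence of leaf feature vectors, which is the core of the idea. The features, game length, and outcome are all synthetic.

```python
import numpy as np

def td_leaf_update(w, leaf_features, outcome, alpha=0.01, lam=0.7):
    """One TD(lambda)-style weight update over one finished self-play game.

    leaf_features[t] is the feature vector of the search leaf reached
    from position t; the final 'evaluation' is the game outcome.
    For a linear evaluator v_t = w . x_t, the gradient of v_t is x_t.
    """
    values = [float(w @ x) for x in leaf_features] + [outcome]
    # Temporal-difference errors between successive leaf evaluations.
    deltas = [values[t + 1] - values[t] for t in range(len(leaf_features))]
    for t, x in enumerate(leaf_features):
        # Discounted sum of future TD errors, weighted by lambda.
        err = sum(lam ** (j - t) * deltas[j] for j in range(t, len(deltas)))
        w = w + alpha * err * x
    return w

rng = np.random.default_rng(1)
features = [rng.normal(size=4) for _ in range(10)]  # 10 positions in a game
w = np.zeros(4)
for _ in range(200):                                # replay the same game
    w = td_leaf_update(w, features, outcome=1.0)
print(float(w @ features[-1]))                      # drifts toward the outcome
```

In the actual system the evaluator is a network and each `leaf_features[t]` comes from a real alpha-beta search, but the update rule has this shape: nudge each position’s leaf evaluation toward the discounted evaluations that followed it.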
That sounds correct, but in the first step I think what they are optimizing is not (just) the features but representations of those features. The natural-language descriptions of the features are very simple ones that require no expertise, but some representations, and some ways of wiring them together, are more conducive to learning and to building the higher-level features that Stockfish uses. But, again, it doesn’t sound like they are applying much optimization power at this step.
Also, one thing that I think is just omitted is whether the network in the first step is the same network in the later steps, or whether the later steps introduce more layers to exploit the same features.
Features and representations: agreed. (I wasn’t trying to be precise.)
I assumed the same network in the first step as later, but agree that it isn’t made explicit in the paper.