In the paper they start with just material balance—then, via the learning process, their score on the evaluation test goes from “worse than all hand-written chess engines” to “better than all except the very best one” (and that best one, while more hand-crafted, also uses some ML/statistical tuning of its numeric parameters, and has had far more engineering effort put into it).
The NN evaluation currently does worse in real games because it’s slower to compute, so the engine can’t brute-force the search as deep.
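For context, “just material balance” means scoring a position purely by counting piece values—the crude baseline the network starts from before learning anything positional. A minimal sketch of such an evaluator, using conventional centipawn values (these are standard textbook numbers, not taken from the paper):

```python
# Conventional centipawn piece values (a common convention, not the paper's).
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}

def material_balance(placement: str) -> int:
    """Score a position from White's perspective using material only.

    `placement` is the piece-placement field of a FEN string, e.g.
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR".
    Uppercase letters are White pieces, lowercase are Black;
    digits (empty squares) and '/' separators are ignored.
    """
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.upper())
        if value is not None:
            score += value if ch.isupper() else -value
    return score

# Starting position: material is equal.
print(material_balance("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))  # 0
# Black missing the queen: White is up 900 centipawns.
print(material_balance("rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))  # 900
```

An evaluator this simple is blind to king safety, mobility, and pawn structure—which is exactly why the learned network's improvement over it on the evaluation test is notable.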