I used to think that it would be very difficult for an LLM to build a model of chess, because chess is not about words and sentences. But a discussion led me to realize that the board model underlying chess notation is not that different from long-distance referents in (programming) languages. Imagine the 2D chess grid not as a physical board but as a doubly nested array of fixed length (the fixed length might make it even easier). GPT can clearly do that. And once it has learned that representation, all higher layers can focus on the rules (though without the MCTS of AlphaZero).
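To make the array analogy concrete, here is a minimal sketch of what I have in mind. It is only an illustration, not a claim about what GPT does internally. The function names and the coordinate move format ("e2e4") are my own assumptions, and it skips legality checks, castling, en passant, and promotion.

```python
# Hypothetical sketch: the board as a fixed-size doubly nested array.
# Moves are coordinate strings like "e2e4" (an assumption for illustration).
# No legality checks, castling, en passant, or promotion.

FILES = "abcdefgh"


def initial_board():
    """Return an 8x8 nested list; board[0] is rank 1, board[7] is rank 8."""
    back_rank = list("RNBQKBNR")
    board = [[None] * 8 for _ in range(8)]
    board[0] = back_rank[:]                    # white pieces (uppercase)
    board[1] = ["P"] * 8                       # white pawns
    board[6] = ["p"] * 8                       # black pawns
    board[7] = [p.lower() for p in back_rank]  # black pieces (lowercase)
    return board


def square(name):
    """Map a square name such as "e4" to (rank_index, file_index)."""
    return int(name[1]) - 1, FILES.index(name[0])


def apply_move(board, move):
    """Move whatever sits on the source square to the target square."""
    (r0, f0), (r1, f1) = square(move[:2]), square(move[2:4])
    board[r1][f1] = board[r0][f0]
    board[r0][f0] = None


def piece_at(board, name):
    """Look up the current content of a named square."""
    r, f = square(name)
    return board[r][f]


board = initial_board()
for move in ["e2e4", "e7e5", "g1f3", "b8c6", "f1b5"]:
    apply_move(board, move)

print(piece_at(board, "b5"))  # B: the bishop that started on f1
```

Answering "what is on b5?" means following a chain of earlier assignments, much like resolving a variable that was set many lines back in a program.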