The hope was that this synthetic dataset of superhuman chess-bot games would provide higher-quality data than human games. Second, I grabbed 16 million games from Lichess’s public chess game database. I trained separate models on the individual datasets and on various mixes of the two.
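For concreteness, here's a minimal sketch of how such a dataset mix might be assembled. It assumes games are stored one PGN string per line; the file paths, mixing fraction, and sample size are all placeholders, not the actual values I used:

```python
import random


def load_games(path):
    """Read one PGN game string per line (assumed storage format)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def mix_datasets(stockfish_path, lichess_path,
                 stockfish_frac=0.5, n_games=1_000_000, seed=0):
    """Sample a training mix with a fixed fraction of Stockfish self-play games."""
    rng = random.Random(seed)
    stockfish_games = load_games(stockfish_path)
    lichess_games = load_games(lichess_path)
    n_sf = int(n_games * stockfish_frac)
    mix = (rng.sample(stockfish_games, n_sf)
           + rng.sample(lichess_games, n_games - n_sf))
    rng.shuffle(mix)  # interleave the two sources before training
    return mix
```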
Were the results basically the same in all cases? Or e.g. was the all-stockfish-data engine less robust to off-distribution inputs?
The all-Stockfish-data engine played at a level 100-200 Elo higher in my tests, with a couple of caveats. First, I benchmarked the LLMs against Stockfish, so an all-Stockfish dataset is presumably an advantage on that particular benchmark. Second, the Stockfish LLM probably had an edge in robustness because I included a small percentage of Stockfish vs. random-move-generator games in the Stockfish dataset, in the hopes that it would improve the model's handling of unusual positions.
Unfortunately, I haven’t done an in-depth qualitative assessment of their abilities, so I can’t give a more detailed answer.
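As a rough illustration of how those Stockfish vs. random-mover games can be generated, here's a sketch using the python-chess library; the Stockfish binary path and per-move time limit are placeholders, not the settings I actually used:

```python
import random

import chess
import chess.engine
import chess.pgn


def stockfish_vs_random_game(engine, limit, rng):
    """Play one game: Stockfish as White vs. a uniform-random mover as Black."""
    board = chess.Board()
    while not board.is_game_over():
        if board.turn == chess.WHITE:
            board.push(engine.play(board, limit).move)  # Stockfish picks a move
        else:
            board.push(rng.choice(list(board.legal_moves)))  # random legal move
    return chess.pgn.Game.from_board(board)  # convert move history to PGN


rng = random.Random(0)
engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")  # placeholder path
try:
    limit = chess.engine.Limit(time=0.01)  # placeholder per-move time budget
    for _ in range(3):
        print(stockfish_vs_random_game(engine, limit, rng))
finally:
    engine.quit()
```

The idea behind including a small slice of these games is that the model sees positions no sensible opponent would ever reach, which may help when a benchmark opponent plays off-distribution moves.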