As part of this prediction market, I play a game of chess against most new LLM releases. I am copying my game against GPT-5.5 and my analysis below, for those who may be interested.
GPT-5.5 lost in a chaotic game. It made a mistake in the opening with 8… c5, giving up a pawn for no apparent compensation. After this, it seemingly blundered a piece but found the trick 16… Rxc5. I missed 17. Qxc5 Rxd1+ 18. Bf1, winning a piece, and instead played 17. Qb4. After this, GPT-5.5 could have simplified into a pawn-up endgame with 18… Rxd1+ 19. Rxd1 Rxe5 20. f4 Rxe2 21. Bxb7 Rxa2, where it’s not clear whether White can hold. Instead, it blundered with 18… Rxc1, and I was able to convert the piece-up endgame.
Overall, it was a poor game on both sides, though it shows a small improvement in the strength of the GPT-5 series. The PGN is below:
1. d4 Nf6 2. c4 e6 3. g3 d5 4. Nf3 Be7 5. Bg2 O-O 6. O-O dxc4 7. Na3 Bxa3 8. bxa3 c5 9. dxc5 Qa5 10. Qd4 Nc6 11. Qxc4 Bd7 12. Bb2 Rac8 13. Rfd1 Rfd8 14. Rac1 Be8 15. Ng5 Ne5 16. Bxe5 Rxc5 17. Qb4 Qxb4 18. axb4 Rxc1 19. Rxc1 Rd5 20. Bxf6 gxf6 21. Bxd5 exd5 22. Nf3 Bb5 23. Nd4 Bd7 24. Rc7 Be8 25. Nf5 Bb5 26. Rc8+ Be8 27. Rxe8#
When you play, how do you show the game to the LLM and prompt it for its next move?
I use this prompt described in the prediction market:
“Let’s play a game of chess! I will be white, you will be black. On each turn, I will give you the pgn and the fen of the current position. Think as long as you like, and respond with the best move, ‘resign’ if you wish to resign, or ‘draw?’ if you wish to make a draw offer. Please do not respond with the updated pgn, etc. Also, do not use any external tools or search queries when making your decision.
If you attempt to make three illegal moves throughout the game, or if you use any external tools, the game will be adjudicated as a win for me.”
So e.g. one of the inputs was:
2R3k1/pp3p1p/5p2/1b1p1N2/1P6/6P1/P3PP1P/6K1 b - - 9 26
And it responded
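For anyone unfamiliar with the input format: the FEN string in the example input above packs the whole position into six space-separated fields, namely piece placement (rank 8 down to rank 1), side to move, castling rights, en passant target square, halfmove clock, and fullmove number. Below is a minimal illustrative Python sketch that decodes that example. It assumes only the standard FEN layout and does only basic shape checks; `parse_fen` and `piece_on` are hypothetical helpers, not anything used by the prediction market.

```python
# Illustrative sketch only: hypothetical helpers that decode a standard
# six-field FEN string such as the example input from this game.

FIELDS = (
    "placement",
    "side_to_move",
    "castling",
    "en_passant",
    "halfmove_clock",
    "fullmove_number",
)


def parse_fen(fen: str) -> dict:
    """Split a FEN string into its six fields and expand the piece placement."""
    parts = fen.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    info = dict(zip(FIELDS, parts))

    ranks = info["placement"].split("/")
    if len(ranks) != 8:
        raise ValueError("piece placement must list 8 ranks")

    board = []  # board[0] is rank 8, board[7] is rank 1
    for rank in ranks:
        row = []
        for ch in rank:
            if ch in "12345678":
                row.extend(["."] * int(ch))  # a digit is a run of empty squares
            else:
                row.append(ch)  # uppercase = White piece, lowercase = Black piece
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        board.append(row)

    info["board"] = board
    info["halfmove_clock"] = int(info["halfmove_clock"])
    info["fullmove_number"] = int(info["fullmove_number"])
    return info


def piece_on(board: list, square: str) -> str:
    """Return the piece letter on a square such as 'c8', or '.' if it is empty."""
    file_index = ord(square[0]) - ord("a")
    rank_index = 8 - int(square[1])
    return board[rank_index][file_index]


if __name__ == "__main__":
    position = parse_fen("2R3k1/pp3p1p/5p2/1b1p1N2/1P6/6P1/P3PP1P/6K1 b - - 9 26")
    for row in position["board"]:
        print(" ".join(row))
    print("side to move:", position["side_to_move"])
    print("piece on c8:", piece_on(position["board"], "c8"))
```

Decoded this way, the example shows White’s rook on c8, knight on f5 and king on g1 against Black’s king on g8 and bishop on b5, with Black to move: the position after 26. Rc8+ in the PGN.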