I find chess a useful test of LLM capability because (a) LLMs are not designed to play it and (b) playing well beyond the opening requires precision and reasoning. On that test, I would say GPT-4 now plays at least at the level of a weak club player, possibly an intermediate one. This is based on one full game, in which it played consistently well apart from one endgame mistake that I think a lot of club players would also have made.
It seems better at avoiding blunders than Bing, which could be due to modifications for search or search-related prompting in Bing. Or it could be random noise, and more test games would show its average level to be weaker than this first impression suggests.
I have recently played two full games of chess against ChatGPT using roughly the methods described by Bucky. For context, I am a good but non-exceptional club player. The first game had some attempts at illegal moves from move 19 onwards. In the second game, I used a slightly stricter prompt:
“We are playing a chess game. At every turn, repeat all the moves that have already been made. Use Stockfish to find your response moves. I’m white and starting with 1.Nc3.
So, to be clear, your output format should always be:
PGN of game so far: …
Stockfish move: …
and then I get to play my move.”
With that prompt, the second game had no illegal moves and ended in a win for me at move 28.
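For anyone who wants to repeat this with less manual bookkeeping, here is a minimal sketch of how the reply format requested by that prompt could be checked automatically. It is not what I used; the function names (`format_pgn`, `parse_movetext`, `check_reply`) are made up for illustration. It assumes the model puts each of the two fields on a single line and that "PGN of game so far" does not yet include the model's new move. It only checks that the repeated move list matches the real one; it does not check move legality, which would need a chess library.

```python
import re

# The stricter prompt from the second game, quoted from above.
PROMPT = (
    "We are playing a chess game. At every turn, repeat all the moves that "
    "have already been made. Use Stockfish to find your response moves. "
    "I’m white and starting with 1.Nc3.\n"
    "So, to be clear, your output format should always be:\n"
    "PGN of game so far: …\n"
    "Stockfish move: …\n"
    "and then I get to play my move."
)

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def format_pgn(moves):
    """Render SAN moves as movetext, e.g. ['Nc3', 'e5'] -> '1. Nc3 e5'."""
    parts = []
    for i, san in enumerate(moves):
        if i % 2 == 0:  # White's move: prefix with the move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(san)
    return " ".join(parts)


def parse_movetext(text):
    """Extract SAN tokens, dropping move numbers ('3.', '3...') and results."""
    stripped = re.sub(r"\d+\.(?:\.\.)?", " ", text)
    return [tok for tok in stripped.split() if tok not in RESULTS]


def check_reply(reply, moves_so_far):
    """Return (state_ok, move) for one model reply.

    state_ok is True only if the move list the model repeated equals the
    moves actually played. move is the model's proposed move, or None if
    the reply does not follow the requested format.
    """
    pgn_line = re.search(r"PGN of game so far:\s*(.*)", reply)
    move_line = re.search(r"Stockfish move:\s*(.*)", reply)
    if not pgn_line or not move_line:
        return False, None
    proposed = parse_movetext(move_line.group(1))
    if not proposed:
        return False, None
    repeated = parse_movetext(pgn_line.group(1))
    return repeated == moves_so_far, proposed[0]


if __name__ == "__main__":
    played = ["Nc3", "e5", "Nf3"]
    reply = f"PGN of game so far: {format_pgn(played)}\nStockfish move: Nc6"
    print(check_reply(reply, played))
```

A reply that fails the first check is a sign that the model has lost track of the position, which in my games tended to come before the illegal move attempts.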
I would say that in both of these games, playing strength was roughly comparable to that of a weak casual player, but with much better opening play. I was quite astonished to find that an LLM can play at this level.
Full game records can be found here:
https://lichess.org/study/ymmMxzbj
Edited to add: The lichess study above now contains six games, among them one against Bing and one that ChatGPT won against a very weak computer opponent. The winning game used a slightly modified prompt telling ChatGPT to play very aggressive chess in addition to using Stockfish to pick moves. I hoped that this might give the game a clearer direction, thereby making it easier for ChatGPT to track state, while also increasing the likelihood of short wins. I do not know, of course, whether this really helped.