Stockfish is not a chess superintelligence (and it doesn’t need to be)
TLDR: I demonstrate that Stockfish is not a chess superintelligence in the sense of understanding the game better than all humans in all situations. It still kicks our ass. In the same way, AI may end up not dominating us in every field while still kicking our ass in a fight for control of the future.
Stockfish is good at chess. Like, really good. It has an Elo rating of around 3700, over 800 points higher than the human record of 2882, giving it theoretical 100-1 odds against Magnus Carlsen at his peak.[1] It comes in ahead of engines like Komodo, which in November 2020 beat top human Grandmaster Hikaru Nakamura while starting two pawns down.
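For the curious, the 100-1 figure falls straight out of the standard Elo expected-score formula. A quick sanity check, plugging in the two ratings above:

```python
# Standard Elo model: a player's expected score against an opponent
# depends only on the rating difference.
def elo_expected_score(rating_a: int, rating_b: int) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

human = elo_expected_score(2882, 3700)  # Carlsen's peak vs Stockfish
print(round(1 / human))                 # ~112, i.e. roughly 100-1 against
```

An 818-point gap gives the human an expected score of just under 0.9%, so "100-1" is, if anything, slightly generous to the human.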
So it may surprise some to learn that it is not, by current definitions of "general" or "superintelligent", a chess superintelligence: it does not understand the game better than all humans in all situations. In fact, it is not hard to construct positions which humans can easily evaluate better than it can. Take the following:
This game is an obvious draw. White has extra pieces, sure, but they cannot actually do anything. White can shuffle his rooks around all he wants, but Black is not forced to take them and can survive the rest of the game simply by moving his king back and forth.
Stockfish, relying heavily on deep search rather than subtle long-term evaluation, does not see this. It thinks that White is clearly winning, and doesn't realise otherwise until you play the moves out to the point where its search sees the fifty-move-rule draw arriving.
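For readers who don't play: the fifty-move rule declares a draw after fifty full moves (100 halfmoves) pass with no capture or pawn move. A minimal sketch of the bookkeeping involved (hypothetical helper names for illustration, not Stockfish's actual code):

```python
def update_halfmove_clock(clock: int, is_capture: bool, is_pawn_move: bool) -> int:
    """The halfmove clock resets on any capture or pawn move, else ticks up."""
    return 0 if (is_capture or is_pawn_move) else clock + 1

def is_fifty_move_draw(clock: int) -> bool:
    """The fifty-move rule triggers at 100 halfmoves (50 full moves)."""
    return clock >= 100

# In the fortress above, Black never has to capture: every quiet rook
# shuffle by White just ticks the clock toward the draw.
clock = 0
for _ in range(100):  # 100 quiet halfmoves with no capture or pawn move
    clock = update_halfmove_clock(clock, is_capture=False, is_pawn_move=False)
print(is_fifty_move_draw(clock))  # True
```

The point is that the draw sits a fixed horizon away no matter what White does, which a human can see at a glance but a search-driven engine only finds once the horizon comes within its search depth.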
Indeed, one might say it suffers from having too short a time horizon. This is not restricted to constructed positions either, as I have had (one, singular) position in the past where I outcalculated the engine. Although the exact position has sadly been lost to time, it came up in a survival puzzle rush I was solving with a friend. We spent ages looking at this puzzle, convinced it was a draw. Ten moves deep into the calculation, we reached a position where we were up a couple of pawns, but our opponent had a fortress, making it impossible to extricate his king.
After hours of search, we submitted our answer and, checking with the engine, found it had been marked incorrect. But as we played its suggested moves out one by one, the engine soon realised that its own line was in fact a draw. Much as in the position above, it had not seen that the material advantage came to nothing and did not resolve to a win over a long enough time horizon.
I mention this because it shows that this limitation also occurs in real games, and has therefore almost certainly been optimised against by Stockfish's thousands of contributors. For those of you wondering: no, the neural-network-based Leela Chess Zero does no better on these positions.
So, what’s my point? Well, I think this shows rather critically that an AI does not need to dominate us at literally every task in order to take over the world or kill everyone. Stockfish can obliterate any human at chess while, in some situations, having limitations that people who have just learnt the game can see. In the same way, I would expect an AI which dominates humans at everything except the ARC-AGI 2 tasks to be quite capable of taking over the world.
Don’t be distracted by the shiny AGI noise.
- ^
This is particularly notable given the extreme tendency for draws at the top level of chess: the single point Carlsen would be expected to score out of roughly a hundred games would almost certainly consist of two draws.
My understanding is that there are multiple methods for detecting fortresses, but these positions are fairly rare and this hasn’t been a focus of the developers.[1] One reason is that all Stockfish commits must pass through fishtest, a battery of self-play games designed to prevent regressions, and the added strength from fortress detection has historically not been worth the additional complexity. I don’t think it would be very difficult to make a version of Stockfish which was slightly weaker overall but fixed this issue.
Though there has been some recent progress!
Yeah, so it’s probably possible to fix the specific fortress issue I mentioned, but I think it points to a deeper flaw: the evaluation is piecemeal rather than holistic in the way human evaluation is. For example, there are studies which are a couple of hundred moves deep, and because every move is “forced”, humans can see the solution while computers are killed by the branching. In any case, I think the overall point of the post is probably orthogonal to this.
Yeah ok but that does not justify the juicy title.
Yeah I wasn’t quite sure what to put. Would you rather something like “Stockfish isn’t a chess superintelligence”?
Yes but you can surely do better. There’s a real point here and it’s pretty straightforward and it’s neither of those.
“Dominant doesn’t mean perfect”