I do think there is some fun and interesting detail in defining “optimal” here. Consider the following three players:
A: Among all moves whose minimax value is maximal, chooses one uniformly at random (i.e. if there is at least one winning move, they choose uniformly among the winning moves; else, if there is at least one drawing move, they choose uniformly among the drawing moves; else, they choose uniformly among the losing moves).
B: Like A, chooses uniformly at random among all moves whose minimax value is maximal, but in won or lost positions restricts to the moves that win as fast as possible or lose as slowly as possible (i.e. if there is at least one winning move, they choose uniformly among those with the shortest distance to mate; else, if there is at least one drawing move, they choose uniformly among the drawing moves; else, they choose uniformly among the losing moves with the longest distance to mate). (Both A and B are sketched in code after these definitions.)
C: Among all moves whose minimax value is maximal, chooses the one that the latest Stockfish version as of today would choose if its search were restricted to only those moves, given <insert some reasonable amount> of compute time on <insert some reasonable hardware>. (A concrete instantiation is sketched further below.)
For C you can also define other variations using Leela Chess Zero, or even LeelaKnightOdds, etc., or entirely different methods of discriminating between game-theoretically-equal-value moves, e.g. based on the density of losing/winning lines in the subtree.
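To make A and B concrete, here is a minimal sketch using the python-chess library, with Syzygy endgame tablebases standing in for the minimax oracle. Everything here is illustrative: the tablebase path is a placeholder, the probes only work for positions with few enough pieces (a full 32-piece oracle does not exist), and Syzygy reports DTZ (distance to zeroing) rather than true distance to mate, so B’s tie-break below is only an approximation of the definition above.

```python
import random

import chess
import chess.syzygy

# Placeholder path: assumes you have Syzygy tablebase files downloaded.
TABLEBASE_PATH = "path/to/syzygy"


def value_and_dtz(board: chess.Board, tb: chess.syzygy.Tablebase, move: chess.Move):
    """Minimax value (WDL) and DTZ of `move` from the mover's perspective."""
    board.push(move)
    try:
        # The probes report from the side to move (now the opponent), so negate.
        wdl = -tb.probe_wdl(board)
        dtz = -tb.probe_dtz(board)
    finally:
        board.pop()
    return wdl, dtz


def player_A(board: chess.Board, tb: chess.syzygy.Tablebase) -> chess.Move:
    """Uniformly random choice among all moves of maximal minimax value."""
    scored = [(m, value_and_dtz(board, tb, m)[0]) for m in board.legal_moves]
    best = max(wdl for _, wdl in scored)
    return random.choice([m for m, wdl in scored if wdl == best])


def player_B(board: chess.Board, tb: chess.syzygy.Tablebase) -> chess.Move:
    """Like A, but prefers the fastest win when winning and the slowest loss
    when losing (approximated by DTZ, since Syzygy stores no true DTM)."""
    scored = [(m, *value_and_dtz(board, tb, m)) for m in board.legal_moves]
    best = max(wdl for _, wdl, _ in scored)
    candidates = [(m, dtz) for m, wdl, dtz in scored if wdl == best]
    if best == 0:  # drawing moves: no tie-break needed
        return random.choice([m for m, _ in candidates])
    # With this sign convention, minimal DTZ is both the fastest win (smallest
    # positive) when winning and the longest resistance (most negative) when losing.
    target = min(dtz for _, dtz in candidates)
    return random.choice([m for m, dtz in candidates if dtz == target])


if __name__ == "__main__":
    with chess.syzygy.open_tablebase(TABLEBASE_PATH) as tb:
        board = chess.Board("8/8/8/8/8/2k5/2p5/2K5 b - - 0 1")  # KPvK position
        print(player_A(board, tb), player_B(board, tb))
```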
When people refer to “optimal” play in chess without further qualifiers, they often mean something like A or B. But I would note that C is also an “optimal” player in the same sense of never playing a move that leads to a worse game-theoretic value. However, C may well have a higher Elo than A or B when measured against a population of practical or “natural” players or other bots.
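To instantiate one version of C concretely (again only within tablebase range), one can compute the value-maximal move set as above and then restrict Stockfish’s search to exactly that set via python-chess’s root_moves argument. The engine binary, tablebase path, and time limit below are placeholders standing in for the bracketed choices in the definition of C.

```python
import chess
import chess.engine
import chess.syzygy

STOCKFISH_PATH = "path/to/stockfish"  # placeholder engine binary
TABLEBASE_PATH = "path/to/syzygy"     # placeholder tablebase directory


def optimal_moves(board: chess.Board, tb: chess.syzygy.Tablebase):
    """All legal moves whose minimax value (per the tablebase) is maximal."""
    scored = []
    for move in board.legal_moves:
        board.push(move)
        scored.append((move, -tb.probe_wdl(board)))  # negate: opponent's view
        board.pop()
    best = max(wdl for _, wdl in scored)
    return [move for move, wdl in scored if wdl == best]


def player_C(board, tb, engine, seconds=1.0):
    """Let the engine discriminate among the value-maximal moves only, by
    restricting its root search to that move set."""
    result = engine.play(
        board,
        chess.engine.Limit(time=seconds),
        root_moves=optimal_moves(board, tb),
    )
    return result.move


if __name__ == "__main__":
    tb = chess.syzygy.open_tablebase(TABLEBASE_PATH)
    engine = chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH)
    try:
        board = chess.Board("8/8/8/8/3k4/8/3P4/3K4 w - - 0 1")  # KPvK
        print(player_C(board, tb, engine))
    finally:
        engine.quit()
        tb.close()
```

Each different choice of engine, time limit, and hardware yields a different C, which is exactly the non-canonicity discussed below.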
In particular, supposing chess is in fact a game-theoretic draw from the starting position, I think there’s a decent chance we would find that A and B typically give up small advantages for “no good reason” in the opening, quickly incurring a slight positional or material disadvantage, until the fact that they never actually play a losing move becomes constraining and prevents them from ever getting bad enough to actually lose. This is because in many branches of the game tree there are probably many moves that draw, including moves that any human and/or bot today would analyze as “bad”, just not bad enough to actually lose. And indeed, the closer one’s position is to winning without actually being winning yet, the worse a move can be without making the position losing, which increases the number of possible “bad” moves that can be chosen. Against sufficiently strong but not-quite-perfect players (e.g. today’s strong bots), this might lead A and B to fairly consistently play into disadvantageous positions, harming their ability to win by making it much easier for their imperfect opponents to hold the draw.
By contrast, variations of C might better maintain incremental advantages and keep pressure on imperfect opponents, leading them to more often make a mistake and give away a win. The issue, of course, is that unlike A and B, there isn’t such a clean or canonical (“Schelling point”) choice of C: you have to pick which engine version, how much compute, and so on. And different choices of C may have somewhat different characteristics against different distributions of possible opponents. This suggests that the concept of “Elo of optimal play”, without further qualification about what flavor of optimal and against what opponents, is a little fuzzy as a map of the territory when you zoom in close enough, although plausibly it doesn’t affect the big picture much (my suspicion, though, is that the choice of these details is not entirely trivial even then).