A problem with this entire line of reasoning, which I have given some thought to, is: how do you even define optimal play?
My first thought was a 32-piece tablebase[1] but I don’t think this works. If we hand an objectively won position to the tablebase, it will play in a way that delivers mate in the fewest moves (assuming perfect play from the opponent). If we hand it a lost position, it will play in a way that staves off mate for as long as possible. But we have a problem when we hand it a drawn position. Assume for a second that the starting position is drawn[2] and our tablebase is White. The problem is that I don’t see a way to give our tablebase a sensible algorithm for choosing between moves, all of which lead to a draw if the tablebase is playing against itself.[3] If our tablebase chooses at random between them, then, in the starting position, playing a3/h3 is just as likely as playing e4/d4. This fundamental problem generalizes to every resulting position: the tablebase can’t distinguish between reaching a position that a grandmaster would judge as ‘notably better with good winning chances’ and one which would be judged as ‘horrible and very hard to hold in practice’ (so long as both positions would end in a draw with two 32-piece tablebases playing against each other).
From this it seems rather obvious that if our tablebase picks at random among drawing moves, it would be unable to win[4] against, say, Stockfish 17 at depth 20 from the starting position (with either color).
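The decision rule described above can be sketched in a few lines. Everything here is hypothetical — a 32-piece tablebase does not exist — and `outcome`, `dtm`, and `apply_move` are assumed oracle helpers:

```python
import random

# Sketch of the tablebase policy described above. The helpers are hypothetical:
#   outcome(p): game-theoretic value of position p for the side to move
#               (+1 win, 0 draw, -1 loss)
#   dtm(p):     distance to mate, for decisive positions
#   apply_move(p, m): position reached by playing move m in position p

def tablebase_move(pos, legal_moves, outcome, dtm, apply_move):
    # Values after our move are from the opponent's perspective, so negate.
    results = {m: -outcome(apply_move(pos, m)) for m in legal_moves}
    best = max(results.values())
    candidates = [m for m, r in results.items() if r == best]
    if best == 1:    # won: deliver mate as fast as possible
        return min(candidates, key=lambda m: dtm(apply_move(pos, m)))
    if best == -1:   # lost: stave off mate for as long as possible
        return max(candidates, key=lambda m: dtm(apply_move(pos, m)))
    # drawn: no principled preference, so pick at random -- a3 as likely as e4
    return random.choice(candidates)
```

The final branch is the whole problem: among the drawing moves the oracle has literally no basis for preference.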
The second idea is to give Stockfish 17 infinite computing power and memory, but this runs into the same problem as the tablebase: with unlimited resources Stockfish would calculate to the end of the game, at which point it is just a minimax algorithm, the same as the tablebase’s.
All of which is to say that either ‘optimal play’ wouldn’t achieve impressive practical results or we redefine ‘optimal play’ as ‘optimal play against [something]’.
This is impossible, of course, but I’m looking for a definition, not an implementation.
This is almost definitely true.
To be more precise, I don’t see such a way that could be called ‘optimal’. If we are satisfied with the algorithm being defined as optimal against [humans in general]/grandmasters/[chess engines in general]/[Stockfish 17], then there are plenty of ways to implement this.
There are bound to be some Everett branches where our tablebase wins, but they would be an astronomically small fraction of all results.
Suppose the tablebase selected randomly from drawing moves when presented with a drawn position, and the initial position is a drawn position. Then the tablebase either wins or draws. You can see this by thinking about the definitions.
It’s relatively easy to define optimal chess by induction, by the min-max algorithm.
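For a finite game, that inductive definition can be written down directly. A toy sketch over an abstract game tree, where `children` and `terminal_value` are assumed helpers (`children(p)` lists successor positions, empty if `p` is terminal; `terminal_value(p)` is +1/0/−1 for the side to move):

```python
# Inductive (minimax) definition of a position's game-theoretic value.

def game_value(pos, children, terminal_value, cache=None):
    if cache is None:
        cache = {}
    if pos in cache:
        return cache[pos]
    succ = children(pos)
    if not succ:
        v = terminal_value(pos)          # base case: the game is over
    else:
        # Inductive step: best successor value, negated because the
        # perspective flips with the side to move.
        v = max(-game_value(c, children, terminal_value, cache)
                for c in succ)
    cache[pos] = v
    return v
```

Optimal play is then any policy that only ever moves to a successor of maximal (negated) value — which is exactly why it is silent about how to break ties among drawing moves.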
You’re correct that for a suboptimal policy P, the policy Q which scores the best against P might not be an optimal play.
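A toy example makes this concrete. All positions and values below are invented for illustration: the objectively losing ‘trap’ line is the best response to a policy P that blunders in it, yet the same move loses to correct play.

```python
# From the root we choose 'safe' (objectively a draw) or 'trap'
# (objectively lost for us if the opponent replies correctly).

def opponent_P(position):
    """Suboptimal policy P: blunders in the 'trap' line."""
    return 'blunder' if position == 'trap' else 'correct'

def result_against(opponent, our_move):
    """Our result (+1/0/-1) given the opponent policy and our root move."""
    if our_move == 'safe':
        return 0                      # a dead draw either way
    # in the trap line, the outcome hinges on the opponent's reply
    return 1 if opponent(our_move) == 'blunder' else -1

# Q, the best response to P, walks into the trap line:
best_vs_P = max(['safe', 'trap'],
                key=lambda m: result_against(opponent_P, m))
assert best_vs_P == 'trap'
# ...but against a correct opponent that same move simply loses:
assert result_against(lambda pos: 'correct', best_vs_P) == -1
```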
Of course. At no point did I suggest that it could lose. The ‘horrible and very hard to hold in practice’ was referring to the judgement of a hypothetical grandmaster, though I’m not sure if you were referring to that part.
“It’s relatively easy to define optimal chess by induction, by the min-max algorithm.”
Once again, I agree. I failed to mention what I see as an obvious implication of my line of reasoning: namely, that optimal play (with random picking among drawing moves) would have a pretty unimpressive Elo[1] (way lower than your estimates/upper bounds), one bounded by the Elo of its opponents.
So:
If we pit it against different engines in a tournament, I would expect the draw rate to be ~100% and the resulting Elo to be (in expectation) ever so slightly higher than the average rating of the engines it’s playing against.
If we pit it against grandmasters I think similar reasoning applies (I’d expect the draw rate to be ~97-99%).
You can extend this further to club players, casual players, and patzers, and I would expect the draw rate to drop off, yes, but still remain high. Which suggests that optimal play (with random picking among drawing moves) would underperform Stockfish 17 by miles, since Stockfish could probably achieve a win rate of >99% against basically any group of human opponents.
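The Elo arithmetic behind this claim is straightforward, using the standard logistic expected-score formula (the ratings and scores below are purely illustrative):

```python
import math

def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a vs one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def performance_rating(opp_rating, score):
    """Rating whose expected score against opp_rating equals `score`."""
    return opp_rating - 400.0 * math.log10(1.0 / score - 1.0)

# A 100% draw rate is a score of 0.5, which pins the performance rating
# to the field's average, however strong the 'optimal' player really is:
print(performance_rating(3600, 0.5))          # → 3600.0
# Even with the occasional win (score 0.55), it barely rises:
print(round(performance_rating(3600, 0.55)))  # → 3635
```

In other words, near-universal draws cap the measurable Elo near the opponents’ level, exactly as argued above.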
There are plenty of algorithms which are provably optimal (minimax-wise), some of which would play very unimpressively in practice (like our random-drawing-move 32-piece tablebase) and some of which could get a very high Elo estimate in ~all contexts. For example:
If the position is won, use the 32-piece tablebase.
Same if the position is lost.
If the position is drawn, use Stockfish 17 at depth 25 to pick from the set of drawing moves.
This is optimal too, and would perform way better, but that definition is quite inelegant. And the thing that I was trying to get at by asking about the specific definition is that there is an astronomically large number of optimal-play algorithms, some of which could get a very low Elo in some contexts and some of which could get a very high Elo irrespective of context. So when you write ‘What’s the Elo rating of optimal chess?’, it seems reasonable to ask ‘Which optimal algorithm exactly?’.
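The hybrid policy above can be sketched the same way as before. `oracle_outcome`, `oracle_move`, and `stockfish_pick` are stand-ins for the (nonexistent) 32-piece tablebase and a Stockfish 17 depth-25 query — assumed helpers, not real APIs:

```python
# Sketch of the hybrid 'optimal' policy: tablebase in decisive positions,
# Stockfish to break ties among drawing moves in drawn positions.

def hybrid_move(pos, legal_moves, oracle_outcome, oracle_move,
                stockfish_pick, apply_move):
    if oracle_outcome(pos) != 0:
        # Won or lost: defer to the tablebase (fastest mate / slowest loss).
        return oracle_move(pos)
    # Drawn: restrict to moves that preserve the game-theoretic draw, then
    # let Stockfish choose the practically most testing one among them.
    drawing = [m for m in legal_moves
               if oracle_outcome(apply_move(pos, m)) == 0]
    return stockfish_pick(pos, drawing)
```

Both this policy and the random-picking one are minimax-optimal; they differ only in tie-breaking, which is the entire gap between their practical results.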
And very unimpressive level of play in drawn positions.