If you’re interested in the opinion of someone who authored (and continues to work on) the #12 chess engine: there are at least two possibilities for what constitutes “optimal chess”. The first is “minimax-optimal chess”, wherein the player never chooses a move that worsens the theoretical outcome of the position (i.e. never concedes a win for a draw or a draw for a loss), choosing arbitrarily among the remaining available moves. The second is “expected-value-optimal chess”, wherein the player always chooses the move that maximises their expected score (that is, p(win) + 0.5 * p(draw)), taking the opponent’s behaviour into account. These two decision procedures are likely thousands of Elo apart when compared against e.g. Stockfish.
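As a toy illustration of the gap between the two rules (all moves, values, and probabilities below are invented for the example, not engine output):

```python
import random

# (move, theoretical value, p_win vs. this opponent, p_draw vs. this opponent)
# All numbers are invented for illustration.
candidates = [
    ("e2e4", "draw", 0.05, 0.90),
    ("d2d4", "draw", 0.04, 0.92),
    ("b1a3", "draw", 0.30, 0.55),  # objectively equal, but harder to face
    ("g2g4", "loss", 0.15, 0.30),  # a theoretical blunder
]

def minimax_optimal(moves):
    """Never worsen the theoretical outcome; pick arbitrarily otherwise."""
    order = {"win": 2, "draw": 1, "loss": 0}
    best = max(order[value] for _, value, _, _ in moves)
    return random.choice([m for m, value, _, _ in moves if order[value] == best])

def ev_optimal(moves):
    """Maximise expected score p(win) + 0.5 * p(draw) against this opponent."""
    return max(moves, key=lambda t: t[2] + 0.5 * t[3])[0]

print(minimax_optimal(candidates))  # any of e2e4 / d2d4 / b1a3
print(ev_optimal(candidates))       # b1a3
```

Note that the EV-optimal pick depends entirely on the opponent model baked into those probabilities; against a different opponent the same rule picks differently.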
The first agent (Minimax-Optimal) will choose arbitrarily between the opening moves that aren’t f2f3 or g2g4, as they are all drawn. This style of decision-making will make it very easy for Stockfish to hold Minimax-Optimal to a draw.
The second agent (E[V]-Given-Opponent-Optimal) would, contrastingly, be willing to make a theoretical blunder against Stockfish if it knew that Stockfish would fail to punish such a move, and would choose the line of play most difficult for Stockfish to cope with. As such, I’d expect this EVGOO agent to beat Stockfish from the starting position, by choosing a very “lively” line of play.
I think we’re probably brushing against the modelling assumptions required for the Elo formula. In particular, the following two claims are jointly inconsistent with the Elo model’s assumptions:
EVGO-optimal has a better chance of beating Stockfish than minimax-optimal does
EVGO-optimal has a negative expected score against minimax-optimal
Yep. The Elo system is not designed to handle non-transitive rock-paper-scissors-style cycles.
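To make the incompatibility concrete: under the Elo model, expected score is a function of the rating difference alone, so a strict beats-cycle cannot be fit by any rating assignment. A minimal sketch:

```python
import random

def elo_expected(r_a, r_b):
    """Expected score of A vs. B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def cycle_fits(r_a, r_b, r_c):
    """True iff A is favoured over B, B over C, and C over A (all > 0.5)."""
    return (elo_expected(r_a, r_b) > 0.5
            and elo_expected(r_b, r_c) > 0.5
            and elo_expected(r_c, r_a) > 0.5)

# An expected score above 0.5 requires a strictly higher rating, and
# r_a > r_b > r_c > r_a is a contradiction, so no triple fits the cycle.
random.seed(0)
assert not any(cycle_fits(*(random.uniform(0, 3000) for _ in range(3)))
               for _ in range(10_000))
```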
This already exists to an extent with the advent of odds-chess bots like LeelaQueenOdds. This bot plays without her queen against humans, but still wins most of the time, even against strong humans who can easily beat Stockfish given the same queen odds. Stockfish will reliably outperform Leela under standard conditions.
In rough terms:
Stockfish > LQO >> LQO (-queen) > strong humans > Stockfish (-queen)
Stockfish plays roughly like a minimax optimizer, whereas LQO is specifically trained to exploit humans.
Edit: For those interested, there’s some good discussion of LQO in the comments of this post:
https://www.lesswrong.com/posts/odtMt7zbMuuyavaZB/when-do-brains-beat-brawn-in-chess-an-experiment
Interesting.
Consider a game like chess except that, with probability epsilon, the player’s move is replaced by one drawn uniformly at random from all legal moves. Let epsilon-optimal be the optimal strategy (defined via minimax) in epsilon-chess. We can also regard this as a strategy for ordinary chess.
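A sketch of the epsilon-chess value recursion on a toy two-ply game tree (not actual chess; payoffs are invented): with probability epsilon the mover’s choice is replaced by a uniformly random legal move, so each node’s value blends the best child with the mean over children.

```python
def eps_value(node, maximizing, eps):
    """node is either a terminal payoff (a number) or a list of child nodes."""
    if isinstance(node, (int, float)):
        return float(node)
    child_vals = [eps_value(child, not maximizing, eps) for child in node]
    best = max(child_vals) if maximizing else min(child_vals)
    mean = sum(child_vals) / len(child_vals)
    return (1 - eps) * best + eps * mean

# Two moves for us, each answered by two opponent replies.
tree = [[1.0, 0.0], [0.5, 0.5]]
print(eps_value(tree, True, 0.0))  # 0.5 (pure minimax)
print(eps_value(tree, True, 0.1))  # slightly below 0.5: our own epsilon
                                   # randomness can land us in the bad branch
```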
My guess is that epsilon-optimal would score better than minimax-optimal against Stockfish. Of course, EVGO-optimal would score even better against Stockfish, but that feels like cheating.
I am inclined to agree. The juice to squeeze generally comes from steering the game into positions where your opponent has more opportunity to blunder. I’d expect that opponent-epsilon-optimal play (i.e. your opponent can be forced to move randomly, but you cannot) would outperform both epsilon-optimal and minimax-optimal play against Stockfish.
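A sketch of that asymmetric rule on a toy game tree (invented payoffs, not actual chess): only the opponent’s move is randomised with probability epsilon, while our own side always picks the best child, so sharp branches with big upside get rewarded once the blunder rate is high enough.

```python
def opp_eps_value(node, our_turn, eps):
    """node is either a terminal payoff (a number) or a list of child nodes."""
    if isinstance(node, (int, float)):
        return float(node)
    vals = [opp_eps_value(child, not our_turn, eps) for child in node]
    if our_turn:
        return max(vals)  # we are never randomised
    # Opponent: with probability eps, a uniformly random reply.
    return (1 - eps) * min(vals) + eps * sum(vals) / len(vals)

# First branch is sharp (great unless the opponent finds the one refutation);
# second branch is a safe draw.
tree = [[1.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(opp_eps_value(tree, True, 0.0))  # 0.5: the safe branch wins under minimax
print(opp_eps_value(tree, True, 0.9))  # ~0.6: frequent blunders favour the sharp branch
```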
Your description of EVGOO is incorrect; you describe a Causal Decision Theory algorithm, but (assuming the opponent also knows your strategy, because otherwise you’re cheating) what you want is LDT.
(Assuming they only see each other’s policy for that game, so that an agent acting as e.g. CDT is indistinguishable from real CDT, LDT is optimal even against such fantastically pathological opponents as “Minimax if my opponent looks like it’s following the algorithm that you, the reader, are hoping is optimal; otherwise resign” (or, if they can see each other’s policy for the whole universe of agents you’re testing, then LDT at least gets the maximum aggregate score).)
I’ll note that CDT and FDT prescribe identical actions against Stockfish, which is the frame of mind I had when writing.
More to your point—I’m not sure that I am describing CDT:
“always choose the move that maximises your expected value (that is, p(win) + 0.5 * p(draw)), taking into account your opponent’s behaviour” sounds like a decision rule that necessitates a logical decision theory, rather than excluding it?
Your point about pathological robustness is valid but I’m not sure how much this matters in the setting of chess.
Lastly, if we’re using the formalisms of CDT or FDT or whatever, I think this question ceases to be particularly interesting, as these are logically omniscient formalisms—so I presume you have some point that I’m missing about logically relaxed variants thereof.
I agree none of this is relevant to anything, I was just looking for intrinsically interesting thoughts about optimal chess.
I thought at least CDT could be approximated pretty well with a bounded variant; causal reasoning is a normal thing to do. FDT is harder, but some humans seem to find it a useful perspective, so presumably you can have algorithms meaningfully closer or further, and that is a useful proxy for something.
Actually never mind, I have no experience with the formalisms.
I guess “choose the move that maximises your expected value” is technically compatible with FDT, you’re right.
It seems like the obvious way to describe what CDT does, and a really unnatural way to describe what FDT does, so I got confused.