I do think there is some fun and interesting detail in defining “optimal” here. Consider the following three players:
A—Among all moves whose minimax value is maximal, chooses one uniformly at random (i.e. if there is at least one winning move, they choose one uniformly, else if there is at least one drawing move, they choose one uniformly, else they choose among losing moves uniformly).
B—Among all moves whose minimax value is maximal, chooses one uniformly at random, except that in winning or losing positions they restrict to moves that win as fast as possible or lose as slowly as possible (i.e. if there is at least one winning move, they choose uniformly among those with the shortest distance to mate; else if there is at least one drawing move, they choose one uniformly; else they choose uniformly among the losing moves with the longest distance to mate).
C—Among all moves whose minimax value is maximal, chooses the one that the current latest Stockfish version as of today would choose if its search were restricted to only such moves given <insert some reasonable amount> of compute time on <insert some reasonable hardware>.
For C you can also define other variations using Leela Chess Zero, or even LeelaKnightOdds, etc, or other methods entirely of discriminating game-theoretically-equal-value moves based on density of losing/winning lines in the subtree, etc.
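To make the tie-breaking rules for A and B concrete, here is a minimal sketch in Python. It assumes a hypothetical representation where each legal move is already annotated with its minimax value (+1 win, 0 draw, -1 loss, from the mover's perspective) and, for decisive values, a distance to mate; the dictionary format and function names are illustrative, not from any real engine.

```python
import random

def player_a(moves):
    """Player A: uniform among moves whose minimax value is maximal.

    `moves` maps move name -> (value, distance_to_mate); distance is
    None for drawing moves. This annotation scheme is hypothetical.
    """
    best = max(v for v, _ in moves.values())
    return random.choice([m for m, (v, _) in moves.items() if v == best])

def player_b(moves):
    """Player B: like A, but among winning moves prefers the shortest
    distance to mate, and among losing moves the longest."""
    best = max(v for v, _ in moves.values())
    candidates = [(m, d) for m, (v, d) in moves.items() if v == best]
    if best > 0:        # winning: mate as fast as possible
        target = min(d for _, d in candidates)
    elif best < 0:      # losing: hold out as long as possible
        target = max(d for _, d in candidates)
    else:               # drawing: distance to mate is irrelevant
        return random.choice([m for m, _ in candidates])
    return random.choice([m for m, d in candidates if d == target])

# Toy position: two winning moves (mate in 3 vs. mate in 7) and a draw.
pos = {"Qh5": (1, 3), "Rd8": (1, 7), "Nf3": (0, None)}
print(player_a(pos))  # either "Qh5" or "Rd8", uniformly
print(player_b(pos))  # always "Qh5": the fastest mate among winning moves
```

C cannot be sketched this compactly, since it delegates the tie-break among the candidate moves to a full engine search, which is exactly what makes it less canonical.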
When people refer to “optimal” without further qualifiers in chess, often they mean something like A or B. But I would note that C is also an “optimal” player in the same sense of never playing a move leading to a worse game-theoretic value. However, C may well have a higher Elo than A or B when measured against a population of practical or “natural” players or other bots.
In particular, supposing chess is in fact a game-theoretic draw from the starting position, I think there’s a decent chance we would find that A and B typically give up small advantages for “no good reason” in the opening, quickly incurring a slight positional or material disadvantage, until the fact that they never actually play any losing move becomes constraining and prevents their position from ever becoming bad enough to actually lose. This is because in many branches of the game tree there are probably many moves that draw, including moves that any human and/or bot today would analyze as “bad”, just not bad enough to actually lose. And indeed, the closer one’s position is to winning without actually being winning yet, the worse a move can be without making the position losing, which increases the number of possible “bad” moves that can be chosen. When facing sufficiently strong but not quite perfect players (e.g. today’s strong bots), this might lead A and B to fairly consistently play into disadvantageous positions, harming their ability to win by making it much easier for their imperfect opponents to hold a draw.
By contrast, variations of C might better maintain incremental advantages and pressure on imperfect opponents, leading those opponents to more often make a mistake and give away a win. The issue, of course, is that unlike A and B, there isn’t such a clean or canonical (“Schelling point”) choice of C: you have to pick which version, how much compute, etc. And different choices of C may have somewhat different characteristics against different distributions of possible opponents. This suggests that the concept of “Elo of optimal play”, without further qualification about what flavor of optimal and against what opponents, might be a little fuzzy and imperfect as a map of the territory when you zoom in close enough, although it plausibly might not affect the big picture as much (my suspicion, though, is that the choice of these details is not entirely trivial even then).
Consider the 2-player game where A is allowed to broadcast a public message, then B is allowed to press one of 9 buttons or pass, and then A and B receive a result. Add a dummy player C as you suggested if you wish to make the game “zero sum” among 3 players rather than non-zero-sum among 2 players.
Rules:
* 10% of the time, all buttons are red, while 90% of the time, a uniform random single one of the buttons is blue while all other buttons are red.
* If B presses a blue button the resulting utility for (A,B) is (1, 1). If B presses a red button the result is (0, 0). If B passes, the result is (-1, 0.5).
* A has private information—only A can see the color of the buttons. B (and C, if C exists) is colorblind/blindfolded/whatever, but aside from that the rules of the game are common knowledge.
The following is a Nash equilibrium:
* If one of the buttons is blue, A always says truthfully which button is blue.
* If no button is blue, A picks one of the buttons uniformly at random and lies and says that button is blue.
* B always trusts A and presses the button that A claims was blue.
This is a Nash equilibrium because no player can do better in expectation by unilaterally deviating from this protocol. (A is receiving the maximum possible utility in every scenario, so A cannot improve by unilaterally deviating. B doesn’t see the button colors, so it boils down to trusting A for expected utility 0.9, or passing for utility 0.5; so B should continue to trust A even though A sometimes lies.)
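The expected utilities claimed above can be checked directly. This is a small sketch under the stated rules and equilibrium strategies (A reports the blue button truthfully when one exists and lies uniformly otherwise; B presses the reported button); the variable names are just illustrative.

```python
# Probability that some button is blue, and the number of buttons.
P_BLUE = 0.9
N = 9

# B trusts A's message:
#  - blue exists (prob 0.9): A reports it truthfully, B presses blue -> 1
#  - all red    (prob 0.1): A lies, B presses a red button          -> 0
u_b_trust = P_BLUE * 1 + (1 - P_BLUE) * 0

# B deviates by always passing: 0.5 regardless of the colors.
u_b_pass = 0.5

# B deviates by ignoring the message and pressing a uniform random
# button: hits blue only when blue exists AND the guess is right.
u_b_guess = P_BLUE * (1 / N) * 1

# A's expected utility when B trusts: 1 if blue exists, 0 otherwise.
# That is the maximum A can get in each scenario (pressing red gives A 0,
# B passing gives A -1), so A has no profitable deviation either.
u_a = P_BLUE * 1 + (1 - P_BLUE) * 0

print(u_b_trust, u_b_pass, u_a)  # 0.9 0.5 0.9
```

Since trusting (0.9) beats both passing (0.5) and guessing (0.1), B’s strategy is a best response, matching the argument above.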
Does this provide the kind of example you were thinking of?