I think we’re probably brushing against the modelling assumptions required for the Elo formula. In particular, the following two are inconsistent with the Elo assumptions:
EVGO-optimal has a better chance of beating Stockfish than minimax-optimal does
EVGO-optimal has a negative expected score against minimax-optimal
Yep. The Elo system is not designed to handle non-transitive rock-paper-scissors-style cycles.
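To see the contradiction concretely, here's a minimal sketch (my own illustration; the rating values are hypothetical, not from the discussion). Under the Elo formula, expected score against any fixed opponent is strictly increasing in one's own rating, so the two statements above would require R_EVGO > R_minimax and R_EVGO < R_minimax at the same time:

```python
# Minimal sketch: the labels "evgo", "minimax", "sf" and all numbers
# below are hypothetical, chosen only to illustrate the contradiction.

def elo_expected(r_a: float, r_b: float) -> float:
    """Elo's expected score for a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# elo_expected is strictly increasing in r_a, so against any common
# opponent C:  E(A, C) > E(B, C)  <=>  R_A > R_B.
# Statement 1 (EVGO beats Stockfish more often than minimax-optimal does)
#   forces R_evgo > R_minimax.
# Statement 2 (EVGO scores below 0.5 against minimax-optimal)
#   forces R_evgo < R_minimax.
# No rating assignment satisfies both:
for r_evgo, r_minimax, r_sf in [(2900, 2800, 3000), (3100, 3600, 3500)]:
    s1 = elo_expected(r_evgo, r_sf) > elo_expected(r_minimax, r_sf)
    s2 = elo_expected(r_evgo, r_minimax) < 0.5
    print(f"s1={s1}, s2={s2}")  # at most one of the two is ever True
```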
This kind of non-transitivity already exists to an extent with the advent of odds-chess bots like LeelaQueenOdds. This bot plays without her queen against humans, but still wins most of the time, even against strong humans who can easily beat Stockfish given the same queen odds. Meanwhile, under standard no-odds conditions, Stockfish reliably outperforms Leela.
In rough terms:
Stockfish > LQO >> LQO (-queen) > strong humans > Stockfish (-queen)
Stockfish plays roughly like a minimax optimizer, whereas LQO is specifically trained to exploit humans.
Edit: For those interested, there’s some good discussion of LQO in the comments of this post:
https://www.lesswrong.com/posts/odtMt7zbMuuyavaZB/when-do-brains-beat-brawn-in-chess-an-experiment