The post uses handicapped chess as a domain for studying how player capability and starting position affect win probabilities. From the conclusion:
In the view of Miles and others, the initially gargantuan resource imbalance between the AI and humanity doesn’t matter, because the AGI is so super-duper smart, it will be able to come up with the “perfect” plan to overcome any resource imbalance, like a GM playing against a little kid that doesn’t understand the rules very well.
The problem with this argument is that you can use the exact same reasoning to imply that it’s “obvious” that Stockfish could reliably beat me with queen odds. But we know now that that’s not true.
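For concreteness: “queen odds” means the stronger side simply starts the game without its queen. Here is a minimal sketch of that starting position, assuming the third-party python-chess package:

```python
# Queen odds: White (the stronger side) starts without its queen.
import chess

board = chess.Board()
board.remove_piece_at(chess.D1)  # lift White's queen off its home square
print(board)        # ASCII diagram of the handicapped starting position
print(board.fen())  # rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR w KQkq - 0 1
```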
Since this post came out, a chess bot (LeelaQueenOdds) designed to play with fewer pieces has been released. simplegeometry’s comment introduces it well. With queen odds, LQO is far better than Stockfish, which was not designed for it. Consequently, the main empirical result of the post is severely undermined. (I wonder how far even LQO is from truly optimal play against humans.)
(This is in addition to the fact that, as many commenters point out, the whole analogy is stretched at best, given the many critical ways in which chess differs from reality. The post offers little argument for the validity of the analogy.)
I don’t think the post has stood the test of time, and vote against including it in the 2023 Review.
While I agree that this post was incorrect, I am fond of it, because the resulting conversation made a correct prediction that LeelaPieceOdds was possible. Most clearly in a thread started by lc:
I have wondered for a while if you couldn’t use the enormous online chess datasets to create an “exploitative/elo-aware” Stockfish, which had a superhuman ability to trick/trap players during handicapped games, or maybe end regular games extraordinarily quickly, and not just handle the best players.
(not quite a prediction as phrased, but I still infer a prediction overall).
Interestingly, two reasons were given for predicting that Stockfish is far from optimal when giving queen odds to a less skilled player (the toy sketch after the list contrasts them):
Stockfish is not trained on positions where it begins down a queen (out-of-distribution)
Stockfish is trained to play the Nash equilibrium move, not to exploit weaker play (non-exploiting)
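To make the distinction concrete, here’s a toy sketch with entirely invented numbers. This is not how Stockfish or Leela actually select moves; it just contrasts maximin (“Nash”) move choice with expected-value move choice against a model of a fallible opponent:

```python
# Hypothetical values, from the handicapped side's point of view, for two of
# our candidate moves against each possible opponent reply (higher = better
# for us). All numbers are invented for illustration.
values = {
    "solid":  {"best_reply": 0.30, "blunder": 0.40},  # safe, but losing on merit
    "trappy": {"best_reply": 0.10, "blunder": 0.95},  # refutable, but wins otherwise
}

p_best = 0.2  # probability a weak opponent finds the refutation

def nash_move(values):
    """Maximin: assume the opponent always finds their best reply."""
    return max(values, key=lambda m: min(values[m].values()))

def exploitative_move(values, p_best):
    """Maximize expected value under an explicit opponent model."""
    def ev(m):
        return p_best * values[m]["best_reply"] + (1 - p_best) * values[m]["blunder"]
    return max(values, key=ev)

print(nash_move(values))                  # solid  -- never walks into a refutation
print(exploitative_move(values, p_best))  # trappy -- bets on the opponent missing it
```

The non-exploiting player rejects the trap because it assumes the refutation will be found; the opponent model never enters its calculation.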
The discussion didn’t make clear predictions about which factor would be more important, whether both would be required, or whether it’s more complicated than that. Folks who don’t yet know might want to make a prediction before reading on.
For what it’s worth, my prediction was that non-exploiting play is more important. That’s mostly based on a weak intuition that starting without a queen isn’t that far out of distribution, and that neural networks generalize well. Another way of putting it: I predicted that Stockfish was optimizing the wrong thing more than it was too dumb to optimize.
And the result? Alas, not very clear to me. My research is from the lc0 blog, with posts such as The LeelaPieceOdds Challenge: What does it take you to win against Leela?. The journey began with the “contempt” setting, which I understand as expecting worse moves from the opponent. This allows reasonable opening play and avoids forced piece exchanges. However, GM-beating play was unlocked with a fine-tuned odds-play network, which addresses both the out-of-distribution and the non-exploiting concerns.
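Here’s a toy model of my reading of contempt; the numbers are invented and this is not lc0’s actual WDL-rescaling math. The idea: assume a weaker opponent sometimes lets the outcome slip a notch in our favor (loss to draw, draw to win), and errs more often in sharp positions, so drawish simplifications lose their appeal:

```python
def expected_score(w, d, l):
    """Expected score for us, from win/draw/loss probabilities."""
    return w + 0.5 * d

def rescale(w, d, l, err):
    """Upgrade each outcome with probability `err` (one opponent mistake)."""
    return (w + err * d, (1 - err) * d + err * l, (1 - err) * l)

# (win, draw, loss) at equal strength, plus a made-up "sharpness" factor:
candidates = {
    "trade queens": ((0.02, 0.28, 0.70), 0.1),  # simplifies into a lost endgame
    "keep tension": ((0.05, 0.15, 0.80), 0.8),  # worse on merit, but messy
}

base_err = 0.4  # how error-prone we believe the opponent is overall
for move, (wdl, sharpness) in candidates.items():
    plain = expected_score(*wdl)
    adjusted = expected_score(*rescale(*wdl, err=base_err * sharpness))
    print(f"{move}: equal-strength {plain:.2f}, with contempt {adjusted:.2f}")
# trade queens: equal-strength 0.16, with contempt 0.18
# keep tension: equal-strength 0.12, with contempt 0.28  <- now preferred
```

One surprise gives me more respect for the out-of-distribution theory. The developer’s blog first mentioned piece odds in The Lc0 v0.30.0 WDL rescale/contempt implementation: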
In our tests we still got reasonable play with up to rook+knight odds, but got poor performance with removed (otherwise blocked) bishops.
So missing a single bishop is in some sense further out-of-distribution than missing a rook and a knight! The later blog post I linked explains a bit more:
Removing one of the two bishops leads to an unrealistic color imbalance regarding the pawn structure far beyond the opening phase.
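To put a number on “scale” here, a minimal sketch (again assuming python-chess) comparing the material removed in the two handicaps:

```python
# Rook + knight odds gives up more material than single-bishop odds, yet the
# bishop-odds position is apparently further "out of distribution" for Leela.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def odds_board(squares):
    """Starting position with the given White pieces removed, plus the
    total material (in pawns) taken off the board."""
    board = chess.Board()
    removed = [board.remove_piece_at(sq) for sq in squares]
    return board, sum(PIECE_VALUES[p.piece_type] for p in removed)

_, delta = odds_board([chess.A1, chess.B1])  # rook + knight odds
print(delta)  # 8 -- yet play stayed reasonable
_, delta = odds_board([chess.C1])            # single (dark-squared) bishop odds
print(delta)  # 3 -- yet performance was poor
```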
An interesting example where the details of going out-of-distribution matter more than the scale of going out-of-distribution. There’s an article in New in Chess that may have more details, but it’s paywalled and I don’t know whether it covers the machine-learning aspects or the human aspects.