This is something lc and gwern discussed in the comments here, but now we have clear evidence that it holds only for Nash-style solvers (i.e., all typical engines: Stockfish, Lc0, etc.). LeelaQueenOdds, which was trained exploitatively against a model of top human players (FM and above), sits around 2000–2900 lichess elo depending on the time control, so it completely trounces 1600-rated players (let alone the ~1200-rated player another commenter suggests the author actually is). See: https://marcogio9.github.io/LeelaQueenOdds-Leaderboard/
Nash solvers are far too conservative: they expect perfect play from their opponents, and hence give up most meaningful attacking chances in odds games. Exploitative models like LQO instead assume their opponents play like strong humans (good but imperfect), and do extremely well despite a completely crushing material disadvantage. As some have noted, this works even though chess is a sterile, simple environment relative to real life.
I speculate that the experiment in this post only yielded the results it did because Nash is a poor solution concept when one side is hopelessly disadvantaged under optimal play from both sides, and queen odds falls deep into that category.
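The Nash-vs-exploitative distinction above can be sketched in a toy zero-sum game. The numbers below are made up purely for illustration (they are not from the post or from LQO's training): the odds-giving engine chooses between a solid line and a trappy line, and the opponent either finds the refutation or blunders. Maximin play assumes the worst-case response; an exploitative policy best-responds to a model of an imperfect human.

```python
# Toy illustration with hypothetical payoffs: the engine's expected score
# for each (engine line, opponent response) pair.
payoffs = {
    "solid":  {"refute": 0.05, "blunder": 0.30},
    "trappy": {"refute": 0.00, "blunder": 0.90},
}

def maximin(payoffs):
    # Nash-style play: assume the opponent always picks our worst case.
    return max(payoffs, key=lambda row: min(payoffs[row].values()))

def exploit(payoffs, opponent_model):
    # Exploitative play: best response to a modeled (imperfect) opponent.
    return max(payoffs, key=lambda row: sum(
        prob * payoffs[row][col] for col, prob in opponent_model.items()))

# Hypothetical model of a strong-but-imperfect human: refutes the trap
# only 30% of the time.
human = {"refute": 0.3, "blunder": 0.7}

print(maximin(payoffs))         # "solid": avoids the trap's worst case
print(exploit(payoffs, human))  # "trappy": higher EV vs. the human model
```

Against the modeled human, the trappy line is worth 0.3·0.00 + 0.7·0.90 = 0.63 versus 0.225 for the solid line, so the exploitative policy takes the attacking chance that a maximin policy gives up.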
See the video from the 7-minute mark. Try it yourself: https://lichess.org/@/LeelaQueenOdds :)
I found it interesting to play against LeelaQueenOdds. My experiences:
I got absolutely crushed on 1+1 time controls (took me 50+ games to win one), but I’m competitive at 3+2 if I play seriously.
The model is really good at exploiting human blind spots and playing aggressively. I could feel it striking at my weak spots without my being able to do much about it. (I now acknowledge the existence of adversarial attacks on humans on a gut level.)
I found it really addictive to play against: You know the trick that casinos use, where they make you feel like you “almost” won? This was that: I constantly felt like I could have won, if it weren’t just for that one silly mistake—despite having lost the previous ten games to such “random mistakes”, too… I now better understand what it’s like to be a gambling addict.
Overall fascinating to play from a position that should be an easy win, but getting crushed by an opponent that Just Plays Better than I do.
[For context, I’m around 2100 on Lichess in short time controls (bullet/blitz). I also won against Stockfish 16 at rook odds on my first try—it’s really not optimized for this sort of thing.]
A grandmaster just lost a classical game (60′+30″) against Leela Knight Odds: https://lichess.org/broadcast/leela-knight-odds-vs-gm-joel-benjamin/game-5/MbKHEbdb/7Tnz8uBj
3 days ago, an international master gave Leela “very slim chances” of winning a game, based on the results of a match played by a previous version of the engine.
Thanks for this update! I find that an odd prediction by the IM, because Awonder is around 2670 FIDE and Joel is around 2470 FIDE; a 200 elo gap is huge.
I think it’s because 10+5 is very different from 60+30.
Oh, my bad, yeah. When I was writing the comment, I flipped the direction of the time-control advantage (longer time controls are of course better for humans in odds matches), so I agree it’s unclear a priori whether a 200 elo drop would be enough to account for the longer time control.
I was playing this bot lately myself, and one thing it made me wonder is: how much better would it be at beating me if it had been trained against a model of me in particular, rather than how it was actually trained? I feel I have no idea.
Maybe we’ll see the Go version of Leela give nine stones to pros soon? Or 20 stones to normal players?