This is really cool! Request: Could you construct some sort of graph depicting the trade-off between Elo and initial material disadvantage?
Simple version would be: Calculate Leela’s Elo at each handicap level, then plot handicap level (in traditional points-based accounting, i.e. each pawn is 1 point, queens are 9 points, etc.) against Elo.
More complicated version would be a heatmap of Leela’s win probability, with the opponent’s Elo on the x-axis and handicap level in points on the y-axis.
This is the best I’ve got so far. I estimated each rating as the midpoint of a logistic regression fit to the games, i.e. the opponent rating at which the fitted win probability is 50%. The first few ratings especially seem to be inflated because there weren’t enough highly rated players in the data, so the fit had to extrapolate. And they all seem inflated by (I’d guess) a couple of hundred points due to the effects I mentioned in the post. (Edit: Please don’t share the graph alone without this context.)
The NN rating in the Blitz data highlights the flaw in this method of estimating the rating.
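For anyone who wants to try the midpoint estimate themselves, here is a minimal sketch of the idea, not the exact code behind the graph. It assumes the data is just a list of opponent ratings and game results from the engine’s side (1 for a win, 0 for a loss, with 0.5 for draws as one possible convention). The function names, the Newton’s-method fit, and the 400-point scaling are my own choices for illustration.

```python
import math

def fit_logistic(ratings, results, iters=50):
    """Fit P(engine scores) = sigmoid(a + b * x) by Newton's method.

    x is the opponent rating, centred on the mean and divided by 400
    (an arbitrary scale, chosen only to keep the numbers well behaved).
    results holds the engine's score per game: 1.0 win, 0.0 loss.
    """
    mean = sum(ratings) / len(ratings)
    scale = 400.0
    xs = [(r - mean) / scale for r in ratings]
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient (g) and negated Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, results):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # degenerate data, e.g. every opponent has the same rating
        # Newton step: solve the 2x2 system h * step = g.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b, mean, scale

def midpoint_rating(ratings, results):
    """Opponent rating at which the fitted win probability is 0.5."""
    a, b, mean, scale = fit_logistic(ratings, results)
    # sigmoid(a + b * x) = 0.5 exactly when a + b * x = 0, so x = -a / b.
    return mean - (a / b) * scale

if __name__ == "__main__":
    # Synthetic data for illustration only, not from the real games.
    ratings = list(range(1200, 2900, 100))
    results = [1.0 / (1.0 + 10 ** ((r - 2150) / 400)) for r in ratings]
    print(round(midpoint_rating(ratings, results)))
```

Note that nothing stops the midpoint from landing outside the range of ratings actually present in the data. In that case the estimate is an extrapolation of the fitted curve, which is the weakness described above.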
I haven’t found a way to get similar data on human vs human games.
It would then be interesting to see if there is comparable data from human vs. human games. Perhaps there is some data on win rates when a player of Elo X goes up against a player of Elo Y who has a handicap of Z?