I really like this question and this analysis!
I think an extension I’d do here is to restrict the “3 reasonable moves” picture by looking at proposed moves of different agents in various games. My guess is that in fact the “effective information content” in a move at high-level play is less than 1 bit per move on average. If you had a big gpu to throw at this problem you could try to explicitly train an engine via an RL policy with a strong entropy objective and see what maximal entropy is compatible with play at different ratings
Yep, I thought of a similar method: (1) Find a trend between Elo and the entropy of moves during the middle-game. (2) Estimate the middle-game entropy of optimal chess. But the obstacle is (2), there’s probably high-entropy optimal strategies!
Here’s an attack I’m thinking about:
Consider epsilon-chess, which is like chess except with probability epsilon the pieces move randomly, say epsilon=10^-5. In this environment, the optimal strategies probably have very low entropy because the quality function has a continuous range so argmax won’t be faced with any ties. This makes the question better defined: there’s likely to be a single optimal policy, which is also deterministic.
This is inspired by @Dalcy’s PIBBSS project (unpublished, but I’ll send you link in DM).
I really like this question and this analysis! I think an extension I’d do here is to restrict the “3 reasonable moves” picture by looking at proposed moves of different agents in various games. My guess is that in fact the “effective information content” in a move at high-level play is less than 1 bit per move on average. If you had a big gpu to throw at this problem you could try to explicitly train an engine via an RL policy with a strong entropy objective and see what maximal entropy is compatible with play at different ratings
Yep, I thought of a similar method: (1) Find a trend between Elo and the entropy of moves during the middle-game. (2) Estimate the middle-game entropy of optimal chess. But the obstacle is (2), there’s probably high-entropy optimal strategies!
Here’s an attack I’m thinking about:
Consider epsilon-chess, which is like chess except with probability epsilon the pieces move randomly, say epsilon=10^-5. In this environment, the optimal strategies probably have very low entropy because the quality function has a continuous range so argmax won’t be faced with any ties. This makes the question better defined: there’s likely to be a single optimal policy, which is also deterministic.
This is inspired by @Dalcy’s PIBBSS project (unpublished, but I’ll send you link in DM).
Very cool, thanks! I agree that Dalcy’s epsilon-game picture makes arguments about ELO vs. optimality more principled