What he says about computer go seems wrong to me.

Two things came together to make really strong computer go players. The first was Monte Carlo tree search. Yes, that name has "search" in it, but it amounted to making programs less search-y, for two reasons. First, the original versions of MCTS evaluated the leaves of the tree by random full-game playouts, which took substantial time, so the trees were smaller than those of earlier tree-searching programs. Second, the way MCTS propagates results up the tree moves away from the minimax paradigm and allows for the possibility that some tree nodes that seem to arise only from inferior moves (and hence should be irrelevant, minimaximally) actually contain useful information about the position.

The second was the use of big neural networks for evaluation instead of those Monte Carlo playouts. This amounts to having a really fancy static evaluation function, and (with the use of a policy net/head, as in AlphaGo and its successors) a highly selective search that chooses which moves to include in the tree according to how promising they look.
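As a concrete illustration of both ingredients, here is a minimal sketch of MCTS on a toy game (single-pile Nim, chosen only so the code runs); the exploration constant and all names are illustrative choices, not taken from any real go program. The `rollout` function is the random full-game playout described above, and the comments note where AlphaGo-style programs substitute a value network and a policy prior instead.

```python
import math
import random

# Toy game so the sketch actually runs: single-pile Nim where each player
# removes 1-3 stones and whoever takes the last stone wins. Any two-player,
# zero-sum, perfect-information game with these three methods would do.
class Nim:
    def __init__(self, stones=10, player=1):
        self.stones, self.player = stones, player

    def legal_moves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def play(self, n):
        return Nim(self.stones - n, -self.player)

    def winner(self):
        # If no stones remain, the player who just moved took the last one.
        return -self.player if self.stones == 0 else None


class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0     # total reward for the player who moved INTO this node


def rollout(state):
    """Leaf evaluation by a random full-game playout, as in the original MCTS.
    AlphaGo-style programs replace this call with a trained value network."""
    while state.winner() is None:
        state = state.play(random.choice(state.legal_moves()))
    return state.winner()    # +1 or -1


def select_child(node, c=1.4):
    """UCT selection: average result plus an exploration bonus. With a policy
    network, each child would also carry a prior probability and a PUCT-style
    bonus would be used, so unpromising-looking moves are rarely explored."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while node.state.winner() is None and len(node.children) == len(node.state.legal_moves()):
            node = select_child(node)
        # 2. Expansion: add one untried child if the game is not over.
        if node.state.winner() is None:
            move = random.choice([m for m in node.state.legal_moves() if m not in node.children])
            node.children[move] = Node(node.state.play(move), parent=node)
            node = node.children[move]
        # 3. Evaluation: random playout (or, in later programs, a value-net call).
        outcome = node.state.winner() if node.state.winner() is not None else rollout(node.state)
        # 4. Backpropagation: results are averaged up the tree, not backed up minimax-style.
        while node is not None:
            node.visits += 1
            node.value += 1.0 if outcome == -node.state.player else -1.0
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)


if __name__ == "__main__":
    # From 10 stones the sound move is to take 2 (leaving a multiple of 4);
    # with enough iterations the search usually finds it.
    print(mcts(Nim(stones=10)))
```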
So when Sutton says this:
Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale.
that seems misleading; while it’s true that the currently best go programs minimize their use of human knowledge, it’s not by “applying search effectively at scale” that they do it.
Now, this does all fit into the broader pattern of “leveraging computation”. Fair enough, I guess, but what else would you expect? That one can play superhumanly strong go by doing only a few thousand elementary steps of computation per move? That applying more computation wouldn’t help? (There’s a reason why human players, when they want to play well, take more time to think.)
It also fits into the pattern of (as you yourself pointed out) minimizing human knowledge during the construction of these programs, allowing them to tease out the features of the problem space on their own. The claim here is that as computing power increases, domain-agnostic approaches (i.e. approaches that do not require programmers to explicitly encode human-created heuristics) will increasingly outperform domain-specific approaches (which do rely on externally encoded human knowledge).
This is a non-trivial claim! For example, it wasn't at all obvious before December 2017 that traditional chess engines (whose static evaluation functions are filled with human-programmed heuristics) could be overtaken by a pure learning-based approach, and yet the AlphaZero paper came out and showed it was possible. If the larger claim is true, that might suggest directions for further research; in particular, approaches that abstract away large parts of a problem may have more success than approaches that focus on the details of the problem structure.
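To make the contrast concrete, here is a toy, hypothetical sketch of what "human-programmed heuristics" in a static evaluation look like, next to the shape of a learned evaluation; the material values and centre bonus below are illustrative stand-ins, not the terms of any real engine.

```python
# Illustrative only: a toy "traditional" static evaluation built from
# human-chosen heuristics, next to the interface of a learned evaluation.

# Material values in centipawns, a conventional human-chosen scale.
MATERIAL = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900, "K": 0}

def centre_bonus(square):
    """A hand-designed positional term: minor pieces near the centre score a
    bit more. Real engines contain many such hand-tuned tables and rules."""
    file, rank = square % 8, square // 8
    return 10 - 2 * (abs(file - 3.5) + abs(rank - 3.5))

def handcrafted_eval(pieces):
    """pieces: iterable of (letter, colour, square) with colour +1 for White,
    -1 for Black, square in 0..63. Returns centipawns from White's view."""
    score = 0.0
    for letter, colour, square in pieces:
        score += colour * MATERIAL[letter]
        if letter in ("N", "B"):
            score += colour * centre_bonus(square)
    return score

def learned_eval(encoded_position, network):
    """The learning-based alternative has no hand-written terms: the position
    is encoded as an array of numbers and a trained network maps it to a value
    (and, with a policy head, to move probabilities). Interface sketch only."""
    value, policy = network(encoded_position)
    return value, policy

# e.g. a lone white knight on d4 (square 27) plus the two kings:
print(handcrafted_eval([("N", 1, 27), ("K", 1, 4), ("K", -1, 60)]))
```

The point being drawn from AlphaZero is that a function of the second kind, trained purely by self-play, ended up stronger than decades of accumulated hand-written terms of the first kind.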
Yes, that’s fair. (Though I’m not sure about the terms “domain-agnostic” and “domain-specific”; e.g., the AlphaZero approach seems to work well for a wide variety of games played on “regular” boards, but it would need substantial modification to handle even other kinds of board game, and it isn’t obviously applicable at all to anything that isn’t more or less a board game.)
And MuZero, which applies to ALE very well?
MuZero seems to deserve to be called domain-agnostic more than AlphaZero does, yes.
(For anyone else who doesn’t immediately recognize the abbreviation: ALE is the “Arcade Learning Environment”.)