I agree with DAL that “move 37” among the lesswrong-ish social circle has maybe become a handle for a concept where the move itself isn’t the best exemplar of that concept in reality, although I think it’s not a terrible exemplar either.
It was surprising to pro commentators at the time. People tended to conceptualize bots as brute-force engines with human-engineered heuristics that just calculate positions out to a massive degree (because historically that’s what they were in Chess), rather than as self-learning entities that excel in sensing vibes and intuition and holistic judgment. As a Go player, I’d say it looks like the kind of move that you never find just by brute force, because there aren’t any critical tactics related to it to solve. The plausible variations and the resulting shapes each variation produces are obvious, so it’s a move you choose only if you have the intuition that those resulting shapes will be good to have on the board in the long term. KataGo’s raw policy prior for a recent net puts ~25% mass on the move, so it’s an “intuitive” option for the neural net too, not one discovered by deep search.
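(If you want to check numbers like that yourself, here’s a minimal sketch of querying the raw policy via KataGo’s JSON analysis engine. The protocol and the `prior` field on each move info are documented in the KataGo repo; the binary/config/model paths and the game record below are placeholders you’d fill in yourself.)

```python
import json
import subprocess

# Placeholder paths -- substitute your own KataGo binary, analysis config,
# and network file. The JSON line-based protocol itself is documented in
# KataGo's repo (docs/Analysis_Engine.md).
KATAGO_CMD = ["katago", "analysis", "-config", "analysis.cfg", "-model", "model.bin.gz"]

# Placeholder: fill in the moves leading up to the position of interest,
# e.g. the first 36 moves of AlphaGo vs. Lee Sedol game 2, as
# [["B", "R16"], ["W", "D17"], ...] in GTP-style coordinates.
moves = []

query = {
    "id": "move37",
    "moves": moves,
    "rules": "chinese",
    "komi": 7.5,
    "boardXSize": 19,
    "boardYSize": 19,
    "analyzeTurns": [len(moves)],  # analyze the position after the last move
    "maxVisits": 500,
}

proc = subprocess.Popen(
    KATAGO_CMD, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
)
proc.stdin.write(json.dumps(query) + "\n")
proc.stdin.flush()

response = json.loads(proc.stdout.readline())

# Each moveInfo carries "prior": the raw policy mass the net assigns to that
# move before any search, which is the kind of number referenced above.
for info in sorted(response["moveInfos"], key=lambda m: -m["prior"])[:5]:
    print(f'{info["move"]}: prior={info["prior"]:.3f}, '
          f'winrate after search={info["winrate"]:.3f}')

proc.stdin.close()
proc.wait()
```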
On the side of the move not being *too* remarkable, in the eyes of modern stronger bots the evaluation of the position doesn’t change much through that move or the 2-3 moves that follow, so it’s not like the game swung on that move. Lee Sedol’s response was also fine, and both players did have other ways to play that are ~equally good, so the move is also not a uniquely good/best move. And there have been other surprises and things with a much bigger impact on pro human play since strong Go bots became available.
Elaborating on the “self-learning entities that excel in vibes and intuition and holistic judgment”—modern Go bots relative to humans are fantastically good at judging and feeling out the best moves when the position is “smooth”, i.e. there are lots of plausible moves with tradeoffs that range through a continuum of goodness with no overly sharp tactics. But they are weak at calculating deep sharp tactics and solving variations that need to be proven precisely (still good and on-average-better than human, but it’s their weakest point). It’s still the case to this day that human pros can occasionally outcalculate the top bots in sharp/swingy tactical lines, while it’s unheard of for a human to outplay the bots through having better judgment in accumulating long-term advantages and making incremental good trades over the course of the game.
Bots excel at adapting their play extremely flexibly given subtle changes to the overall position, so you commonly get the bots suggesting interesting moves that are ever so slightly better on average and that human pro players might not consider. A lot of such moves also rate decently in the raw policy prior, so the neural nets are proposing many of these moves “on instinct”, generalizing from their enormous volume of self-play learning, with the search serving after-the-fact to filter away the (also frequent) instances where the initial instinct is wrong and leads to a tactical blunder.
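(For concreteness, this interplay of instinct and filtering is roughly how AlphaZero-family bots, KataGo included, select moves during search: a PUCT-style rule where the policy prior seeds exploration and accumulated value estimates override it. A schematic sketch of the selection rule, not KataGo’s exact formula, which has further refinements:)

```python
import math

def select_move(children, c_puct=1.5):
    """Pick the child to explore next.

    children: list of dicts with 'prior' P (policy mass from the net),
    'visits' N, and 'value_sum' W (sum of value-net evaluations).
    """
    total_visits = sum(ch["visits"] for ch in children)
    best, best_score = None, -float("inf")
    for ch in children:
        # Q: average evaluation of this move across the search so far.
        q = ch["value_sum"] / ch["visits"] if ch["visits"] > 0 else 0.0
        # Exploration term: proportional to the prior, so high-prior
        # ("instinctive") moves get explored first...
        u = c_puct * ch["prior"] * math.sqrt(total_visits) / (1 + ch["visits"])
        # ...but as visits accumulate, Q dominates, so an instinct that
        # leads to a tactical blunder gets filtered out by bad evaluations.
        score = q + u
        if score > best_score:
            best, best_score = ch, score
    return best
```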
So, specific answers:
> Does understanding Move 37 require the use of extensive brute force search?
> When you take into account the fact that AlphaGo had extensive search at its disposal, does that make the creativity of Move 37 significantly less impressive?
No, and brute force isn’t the practically relevant factor here, so I’d question the premise. The variations and possible results that the move leads to aren’t too complicated, so the challenge is in the intuitive judgment call of whether those results will be good over the next 50-100 moves of the game given the global situation, which I expect is beyond anyone’s ability to do solely via brute force (including bots). Pros at the time didn’t have the intuition that this kind of exchange in this kind of position could be good, so it was surprising. To the degree that modern pros could have a different intuition now, it would tend to be due to things like having shaped their subconscious intuition based on feedback and practice with modern bots and modern human post-AI playing styles, rather than mostly via conscious or verbalizable reasons.
> Is Move 37 categorically different from other surprising moves played by human Go experts?
Not particularly. Bots are superhuman in the kind of intuition that backs such moves, but I’d say it’s on a continuum. A top pro player might similarly find interesting situation-specific moves backed by intuition that most strong amateur players would not consider or have in those positions.
> I noticed that Lee Sedol’s Wikipedia page mentions a notable game in which he uses a “broken ladder,” which is “associated with beginner play”—maybe it’s not so uncommon for a professional Go player to do something unconventional every so often.
> Given an expert explanation of Move 37, what level of Go expertise would be required to fully understand it, and how long would it take?
> What if you had to figure it out without an explanation, just by studying the game?
Because it boils down to fuzzy overall intuition of what positions you prefer over others, it’s probably not the kind of move that can be verbally explained in any practical way in the first place. (It would be hard to give an explanation that’s “real” as opposed to merely curiosity-stopping or otherwise unhelpful.)
> To what extent have human players been able to learn novel strategies from AI in Go or chess?
The popularity of various opening patterns (“joseki”) has changed a lot. The 60-0 AlphaGo Master series featured many games that, as far as bots were concerned, were already very bad for the human by the time the opening was done, and I think that would not be as much the case if repeated today. But also I think that change is not so important. Small opening advantages are impactful at the level of top bots, but for humans the variance in the rest of the game is large and makes that small difference matter much less. I’d guess the more important thing is the ability to use the bots as rapid and consistent feedback, i.e. just general practice and correcting one’s mistakes, rather than any big strategic change. This is the boring answer perhaps, because it’s also how bots have long been used in chess (but minus the part about preparing exact opponent-specific opening lines, because Go’s opening is usually too open-ended to prepare specific variations and traps).
(Background: I’m the main developer of KataGo, have accumulated a lot of time looking at bot analysis of games, and am a mid-amateur dan player, i.e. expert but not master; I’d maybe be around the top 15-30%ile of players if I were to attend the annual open US Go Congress.)
It’s interesting to note the variation in “personalities” and apparent expression of different emotions despite identical or very similar circumstances.
Pretraining gives models that predict every different kind of text on the internet, and so are very much simulators that learn to instantiate every kind of persona or text-generating process in that distribution, rather than being a single consistent agent. Subsequent RLHF and other training presumably vastly concentrates the distribution of personas and processes instantiated by the model onto a particular narrow cloud of personas that self-identifies as an AI with a particular name, has certain capabilities and quirks depending on that training, has certain claimed self-knowledge of capabilities (but where there isn’t actually a very strong force tying the claimed self-knowledge to the actual capabilities), etc. But even narrowed, it’s interesting to still see significant variation within the remaining distribution of personas that gets sampled each new conversation, depending on the context.