I frequently use “Move 37” as a shorthand for “AI that comes up with creative, highly effective ideas that no human would ever consider.” Often the implication is that reinforcement learning (as used in AlphaGo) has some “secret sauce” that could never be replicated by imitation learning.
But I realize that I don’t know the details of Move 37 very well, other than secondhand accounts from Go experts of how “groundbreaking” it was. I’ve never played Go, and I have basically no knowledge of the rules or strategies beyond the most basic descriptions. Considering how influential Move 37 is on my views about AI, it seems like I’d better try to understand what was so special about it.
I’d be interested in an explanation that builds up the necessary understanding from the ground up. This could look like: “Read this tutorial on the rules of Go, study these wiki pages about specific concepts and strategies, look at these example games, and finally read my explanation of Move 37 which uses everything you’ve learned.”
Extremely ambitiously, after reading this explanation, I’d be able to look at a series of superficially similar Go boards, distinguish whether it might be a good idea to do a Move-37-like play, identify where exactly to move if so, and explain my answer. That may be unrealistic to achieve in a short time, but I’d be interested in getting as close as possible. An easier version of that challenge would use heavily-annotated Go boards that abstract away some parts of the necessary cognition, with notes like “this section of the board is very important to control” or “this piece has property A” or “these pieces are in formation B.”[1]
If part of the explanation is “when you do an extensive Monte Carlo Tree Search from this board state guided by XYZ heuristics, Move 37 turns out to be the best move,” that seems like a pretty good explanation to me—as long as the search tree is small enough that it plausibly could have been explored by AlphaGo during its match with Lee Sedol. I’m mainly interested in trying to understand the intuition behind Move 37 in the way AlphaGo might have “understood” it. If the move couldn’t be found by a human without using brute force search, that would be valuable to know.
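To make "tree search guided by heuristics" concrete, here is a toy sketch of the MCTS loop (selection, expansion, rollout, backpropagation) on a trivial subtraction game rather than Go. The "heuristic" here is just uniform random rollouts; AlphaGo's whole trick was replacing those with learned policy and value networks so the tree stays small in a game where random rollouts would be hopeless.

```python
import math
import random

# Toy stand-in for Go: a pile of stones, each player removes 1-3,
# and whoever takes the last stone wins. The MCTS loop below has the
# same shape as AlphaGo's search, but with random rollouts instead of
# learned policy/value networks.

MOVES = (1, 2, 3)

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile      # stones left in this position
        self.parent = parent
        self.move = move      # move that led into this node
        self.children = []
        self.visits = 0
        self.wins = 0.0       # from the perspective of the player who moved INTO this node
        self.untried = [m for m in MOVES if m <= pile]

def uct_select(node, c=1.4):
    # Selection: pick the child balancing observed win rate and exploration.
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(pile, rng):
    # Random playout; True if the player to move from `pile` wins.
    player = 0
    while True:
        pile -= rng.choice([m for m in MOVES if m <= pile])
        if pile == 0:
            return player == 0
        player ^= 1

def mcts(pile, iters=3000, seed=0):
    rng = random.Random(seed)
    root = Node(pile)
    for _ in range(iters):
        node = root
        # 1. Selection: walk down the fully-expanded part of the tree.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: try one untested move from this position.
        if node.untried:
            m = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Rollout: estimate the value of the new position.
        if node.pile == 0:
            reward = 1.0  # the move into this node took the last stone
        else:
            reward = 0.0 if rollout(node.pile, rng) else 1.0
        # 4. Backpropagation: flip perspective at each level going up.
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    # The most-visited root child is the recommended move.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(5))  # from a pile of 5, the winning move is to take 1
```

On this tiny game the rollouts alone find the winning move; the point of the question above is whether AlphaGo's learned heuristics prune the tree enough that Move 37 counts as "explored" rather than brute-forced.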
I’m particularly interested in an explanation of Move 37 because I want to know whether such an explanation is even possible. When we have superintelligent AI solving real-world problems using strategies that no human would ever think of, those strategies should ideally be explainable, if not in practice, at least in principle—perhaps even to the point that a human could understand and replicate the strategies given enough time to study the explanation.[2]
Lee Sedol spent tens of thousands of hours studying Go, yet even he was flummoxed by Move 37 when he first saw it, spending nearly 15 minutes to come up with a response. Maybe it’s hubris to hope that a complete novice like me could understand anything about it, but I’d be surprised if it weren’t possible to get some intuition for why this move was important. I’m sure it’s very difficult to become an expert in quantum computing, and even harder to discover it from scratch, but it’s possible to get a (vague, no doubt flawed) understanding of Grover’s algorithm from a 30-minute YouTube video. I generally expect the curve of understanding vs. effort spent to be relatively smooth, even in very difficult domains.
I think it’s plausible that requiring AI strategies to meet some minimum bar for explainability won’t necessarily incur a huge safety tax. So far, it seems like most AI-discovered strategies are not incomprehensible to humans, given a proper explanation.[3] Move 37 is the closest thing we have to a counterexample—a strategy that initially seemed alien even to top human experts—so learning more about it would help me evaluate this hypothesis.
I’d be willing to pay for a thorough written explanation of Move 37—likely $50, maybe up to $100 for an extremely high-quality explanation. I’d be willing to spend up to 8 hours studying, but ideally, the explanation would be accessible enough for a random LessWrong reader to glean something useful from it in 30 minutes.
Regardless of whether I can successfully understand Move 37 at a low level, I’d be interested in answering high-level questions like the following:
Does understanding Move 37 require the use of extensive brute force search?
When you take into account the fact that AlphaGo had extensive search at its disposal, does that make the creativity of Move 37 significantly less impressive?
Is Move 37 categorically different from other surprising moves played by human Go experts?
I noticed that Lee Sedol’s Wikipedia page mentions a notable game in which he uses a “broken ladder,” which is “associated with beginner play”—maybe it’s not so uncommon for a professional Go player to do something unconventional every so often.
Given an expert explanation of Move 37, what level of Go expertise would be required to fully understand it, and how long would it take?
What if you had to figure it out without an explanation, just by studying the game?
To what extent have human players been able to learn novel strategies from AI in Go or chess?[4]
Broadly speaking, how is an advanced AI’s “playstyle” different from advanced human players?
- ^
I don’t necessarily want to actually take a test like this since it seems like it would be hard to make, but I hope this description gives you a better idea of what I’m going for.
- ^
At this point, I started writing a footnote about two different types of explanations we might try to elicit from the AI. I ended up turning the footnote into a full post: Procedural vs. Causal Understanding.
- ^
After some research, I found some more examples of “creative AI behavior” that are pretty similar to Move 37, involving novel solutions that no human had previously thought of. However, these examples have important differences, or are so similar to Move 37 that I don’t think learning about them would teach me much more (e.g. novel chess strategies found by AI).
AlphaFold’s ability to predict protein folding is probably the best example of AI intuitions totally outstripping humans. However, it seems pretty different from AlphaGo in that there are no “expert human protein-folding predictors.” It’s plausible to me that humans who studied protein folding as diligently as Lee Sedol studied Go, learning from centuries of accumulated human knowledge, would be able to compete with AlphaFold. Even if AlphaFold beat these hypothetical humans, there likely exists some explanation that would let them understand the AI’s solutions.
Other AI-discovered strategies are likely pretty easy to understand.
AlphaEvolve is a very recent example of AI coming up with new solutions to mathematical problems. However, AlphaEvolve’s edge over human mathematicians seems to come from iterating on candidate solutions for a very long time, rather than some special insight that only an AI could have. AlphaEvolve simply uses Gemini 2.0 to generate many variations of high-scoring solutions, without doing any specialized RL training. Since Gemini 2.0’s training most likely doesn’t involve any multi-step RL, the explanations for its solutions are probably entirely comprehensible to humans.
OpenAI Five, a 2018 AI that played Dota 2, “deviated from current playstyle in a few areas, such as giving support heroes (which usually do not take priority for resources) lots of early experience and gold.” I’m not sure what other strategies it used, but the single mentioned strategy seems very straightforward to understand.
From The Verge in 2019, reporting on AlphaStar, a pretty similar AI that plays StarCraft:
“AlphaStar is an intriguing and unorthodox player — one with the reflexes and speed of the best pros but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that’s unimaginably unusual; it really makes you question how much of StarCraft’s diverse possibilities pro players have really explored,” Diego “Kelazhur” Schwimer, a pro player for team Panda Global, said in a statement. “Though some of AlphaStar’s strategies may at first seem strange, I can’t help but wonder if combining all the different play styles it demonstrated could actually be the best way to play the game.”
This explanation of AlphaStar’s strategies is even more vague than the OpenAI Five explanation, though it sounds intriguing. If OpenAI Five or AlphaStar ever did come up with any truly incomprehensible superhuman strategies, it’s probably very difficult to find out now.
- ^
Apparently, chess grandmasters were able to learn some strategies from AlphaZero.
It’s possible for something to be a useful shorthand even if the underlying facts are dubious (e.g., the “let them eat cake” line doesn’t come from Marie Antoinette but nonetheless illuminates the situation at the time; frogs will jump out of water if you heat it gradually but this stands in for a useful concept).
I’m not an expert-level Go player, but my general sense is that Move 37 is in this same category. It was a surprising move, but it had a limited impact on the match and was not an optimal move as scored by stronger contemporary Go engines (though it was a very good one). It didn’t shift the probability of victory, and Sedol’s move 38 was the optimal response to it as scored by KataGo. It seems to have had a psychological effect because it was so surprising, but that’s possible even if a move is literally random (as famously happened with Kasparov and Deep Blue).
You can download KataGo and work through this yourself.
I wanted to comment on this, not because I think it’ll answer your questions, but because it lies at the center of my interests, and you nerd-sniped me. Anyway, sorry for just now getting around to it.
I’m a longtime (pretty average) Go player, interested in things like ‘intelligence’ and ‘consciousness’ (in humans really, but lately computers are also interesting).
I watched the Lee Sedol live streams, and had a lot of fun, learning about human nature.
I actually gave a very short seminar talk about them and related things once, because to me they are a very distilled example of something that I also noticed in your post.
Anyway, here goes.
What does ‘creative’ mean? Which humans? All of them, even both the pros, and the village idiots?
Why is this move so important to you? Why do people buy overpriced sneakers? Because they heard other people talking about them?
Regarding the understanding part:
I think it is important to distinguish between different time frames. e.g. explanations that come before the event, or, minutes, days, or years after etc.
For example, before AlphaGo, it was common knowledge that computers would not beat humans at Go for a long time, because they have no ‘intuition’. (Similar to how they would never create art, because they had no creativity, etc.)
The first thing I learned here is that people (apart from mathematicians, and a small number of physicists) never define their terms properly. (Neither did you, above.)
Now, afterwards, since Lee Sedol lost, AlphaGo necessarily has to have intuition or creativity, or whatever was missing before. But this does not mean anything. I guess it’s called Wittgenstein’s ruler: you are not measuring the capabilities of AlphaGo relative to your (well-defined) concepts; rather, your (ill-defined) concepts get retrofitted and filled with meaning by what AlphaGo did.
For an example at what this looks like in action, please see the game itself:
From that timestamp, for about a minute or two.
There are two commentators, which for live TV Go games is the standard format. In this case, on the right, Redmond is the ‘smartest person in the room’, and on the left, the ‘jester’, who acts the fool and asks questions, which Redmond answers for the benefit of the audience. Without them, a normal person (and many players) would not be able to understand what the fuck is happening, and who is winning (even for any other televised game). Anyway, the smart one normally is of the same level as the players, and really the only person who can follow all the details. They always understand what is going on. Not here though.
Watch Redmond. He places the stone and then just moves it somewhere else, because obviously it’s wrong there and has to have been a ‘misclick’ by the human inputting the moves. It resembles a commonly known shape, a ‘shoulder hit’, but is farther from the edge than normally.
Anyway, they and other commentators go from ‘misclick’ to ‘I don’t understand’ to ‘this is known to be bad’ to ‘huh, whadda ya know’.
What I’m trying to say is that the smartest people in the room just make mouth noises since their heuristics no longer work. Normally, whatever the commentator says is law, since there is no one stronger present to correct them.
Again, here you don’t learn anything about good Go moves (Since no one present understands at that level). You learn about human experts learning in real time.
Your whole post is full of sentences like this. What does ‘intuition’ mean? I think you are going at it wrong. AlphaGo just ‘knows’ more about Go (See, now I did it). There are no shortcuts. Out with the old (knowledge). In with the new.
Egyptians did not know how to draw using the correct perspective, and now we do. They didn’t know how to do complex quantum chemical calculations either, and now we can. Still, one is more difficult (in invested energy) than the other, and I don’t see how to distinguish one from the other a priori.
I don’t think the move was important, to be honest. The event, the demonstration, the technological leap certainly was. As in someone dying today versus tomorrow. The date is not the important part. The dying versus not/never dying is.
Again, your focus (to me) seems to be on the less important things.
For example, if you look at the comments on LLM technology, have in mind the different timescales. Comments on fresh new things are different than comments that come years later. And you can learn a lot from how people comment on these things. Just like when Redmond automatically moved the stone, because he ‘knew’ where it should really be.
Brute force? No. Extensive search? Yes. Humans (playing many, many games) found many recurring useful patterns, but it seems not all.
It makes it more probable, which is why it found it. The word ‘creativity’ has to be defined before this question can be answered.
That game is a good example. It’s basically a tradeoff. A professional would not typically end up in that situation, because the search tree involves a few ‘obviously bad’ moves before one profits. Maybe as in ‘two wrongs make a right’, and the pro just truncates the search after the first wrong, if you will.
Still, it’s easily understandable after the fact, or with very good foresight. it’s just that it’s maybe outside your normal heuristics.
There are only ‘just-so’, handwaving, after-the-fact explanations (of the type seen in the video). There is no absolute truth (yet). The next iteration of Go-playing robots will invalidate them, just as AlphaGo did (until the game is eventually completely solved).
To understand, you just have to play (a lot, in this case). Otherwise you’ll just be like someone who wants to learn Parkour or Kung Fu by watching YouTube videos, without ever once moving a muscle. Metis vs. Episteme, I guess.
There are many new moves that entered common knowledge, particularly in the opening. For example, this one comes up a lot in games at my level and has changed due to AlphaGo:
https://www.josekipedia.com/#path:pdqcqdpcocobncnbmc
As a result, other variations basically disappeared completely, not because they would lose you the game, but because we have been told they are now bad, and we (average amateurs) simply lack the skill to understand every nuance.
I guess people called it alien or strange, or other words, but really all these words mean ‘I don’t (yet) understand why this move is good’. Once they do, AI will just play ‘normal’ again.
Once human Metis catches up (at least in things like openings, where few moves are involved), things cool down again.
In the end it’s just people gossiping. You learn not by memorizing what they say, but by understanding why they say it, or why others stay silent.
I agree with DAL that “move 37” among the lesswrong-ish social circle has maybe become a handle for a concept where the move itself isn’t the best exemplar of that concept in reality, although I think it’s not a terrible exemplar either.
It was surprising to pro commentators at the time. People tended to conceptualize bots as brute-force engines with human-engineered heuristics that just calculate variations out to a massive degree (because historically that’s what they were in chess), rather than as self-learning entities that excel in sensing vibes and intuition and holistic judgment. As a Go player, it looks to me like the kind of move that you never find just by brute force, because there aren’t any critical tactics related to it to solve. The plausible variations and the resulting shapes each variation produces are obvious, so it’s a move you choose only if you have the intuition that those resulting shapes will be good to have on the board in the long term. KataGo’s raw policy prior for a recent net puts ~25% mass on the move, so it’s an “intuitive” option for the neural net too, not one discovered by deep search.
On the side of the move not being *too* remarkable: in the eyes of modern, stronger bots, the evaluation of the position doesn’t change much through that move or the next 2-3 moves, so it’s not like the game swung on that move. Lee Sedol’s response was also indeed fine, and both players had other ways to play that are ~equally good, so the move is also not a uniquely good/best move. And there have been surprises with a much bigger impact on pro human play since strong Go bots started becoming available.
Elaborating on the “self-learning entities that excel in vibes and intuition and holistic judgment”—modern Go bots relative to humans are fantastically good at judging and feeling out the best moves when the position is “smooth”, i.e. there are lots of plausible moves with tradeoffs that range through a continuum of goodness with no overly sharp tactics. But they are weak at calculating deep sharp tactics and solving variations that need to be proven precisely (still good and on-average-better than human, but it’s their weakest point). It’s still the case to this day that human pros can occasionally outcalculate the top bots in sharp/swingy tactical lines, while it’s unheard of for a human to outplay the bots through having better judgment in accumulating long-term advantages and making incremental good trades over the course of the game.
Bots excel at adapting their play extremely flexibly given subtle changes to the overall position, so commonly you get the bots suggesting interesting moves that are ever so slightly better on average, that human pro players might not consider. A lot of such moves also rate decently in the raw policy prior, so the neural nets are proposing many of these moves “on instinct” generalizing from their enormous volume of self-play learning, with the search serving after-the-fact to filter away the (also frequent) instances where the initial instinct is wrong and leads to a tactical blunder.
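That "instinct proposes, search filters" division of labor can be caricatured in a few lines. All move names, prior masses, and win rates below are invented for illustration; a real engine like KataGo uses a neural-net policy over all board points plus full tree search, not lookup tables.

```python
# Cartoon of policy-prior-guided move selection: the net's "instinct"
# (a prior over moves) nominates candidates, and a deeper evaluation
# (here a made-up post-search win-rate table) vetoes the ones whose
# initial appeal turns out to be a tactical blunder.

# Hypothetical policy prior over candidate moves (probability mass).
policy_prior = {
    "P10": 0.25,  # the surprising, Move-37-like candidate
    "Q14": 0.20,
    "R5":  0.15,
    "K4":  0.10,
    "C3":  0.05,
}

# Hypothetical win-rate estimates after searching each candidate.
searched_winrate = {
    "P10": 0.54,
    "Q14": 0.53,
    "R5":  0.38,  # instinct liked it, but search finds a refutation
    "K4":  0.52,
    "C3":  0.51,
}

def choose_move(prior, winrate, top_k=4, blunder_margin=0.10):
    """Nominate the top-k moves by prior, discard any whose searched
    win rate falls more than `blunder_margin` below the best candidate,
    then play the best survivor."""
    candidates = sorted(prior, key=prior.get, reverse=True)[:top_k]
    best = max(winrate[m] for m in candidates)
    survivors = [m for m in candidates if winrate[m] >= best - blunder_margin]
    return max(survivors, key=winrate.get)

print(choose_move(policy_prior, searched_winrate))  # prints "P10"
```

The notable detail from the comment above is that a Move-37-like play can already sit high in the raw prior, so the search's job is mostly to confirm the instinct rather than to discover the move.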
So, specific answers:
> Does understanding Move 37 require the use of extensive brute force search?
> When you take into account the fact that AlphaGo had extensive search at its disposal, does that make the creativity of Move 37 significantly less impressive?
No, and brute force isn’t the practically relevant factor here, so I’d question the premise. The variations and possible results that the move leads to aren’t too complicated, so the challenge is in the intuitive judgment call of whether those results will be good over the next 50-100 moves of the game given the global situation, which I expect is beyond anyone’s ability to do solely via brute force (including bots). Pros at the time didn’t have the intuition that this kind of exchange in this kind of position could be good, so it was surprising. To the degree that modern pros could have a different intuition now, it would tend to be due to things like having shaped their subconscious intuition based on feedback and practice with modern bots and modern human post-AI playing styles, rather than mostly via conscious or verbalizable reasons.
> Is Move 37 categorically different from other surprising moves played by human Go experts?
Not particularly. Bots are superhuman in the kind of intuition that backs such moves, but I’d say it’s on a continuum. A top pro player might similarly find interesting situation-specific moves backed by intuition that most strong amateur players would not consider or have in those positions.
> I noticed that Lee Sedol’s Wikipedia page mentions a notable game in which he uses a “broken ladder,” which is “associated with beginner play”—maybe it’s not so uncommon for a professional Go player to do something unconventional every so often.
> Given an expert explanation of Move 37, what level of Go expertise would be required to fully understand it, and how long would it take?
> What if you had to figure it out without an explanation, just by studying the game?
Because it boils down to fuzzy overall intuition of what positions you prefer over others, it’s probably not the kind of move that can be verbally explained in any practical way in the first place. (It would be hard to give an explanation that’s “real” as opposed to merely curiosity-stopping or otherwise unuseful).
> To what extent have human players been able to learn novel strategies from AI in Go or chess?
The popularity of various opening patterns (“joseki”) has changed a lot. The 60-0 AlphaGo Master series featured many games that, as far as bots were concerned, were already very bad for the human by the time the opening was done, and I think that would not be as much the case if repeated today. But also I think that change is not so important. Small opening advantages are impactful at the level of top bots, but for humans the variance in the rest of the game is large and makes that small difference matter much less. I’d guess the more important thing is the ability to use the bots as rapid and consistent feedback, i.e. just general practice and correcting one’s mistakes, rather than any big strategic change. This is the boring answer perhaps, because it’s also how bots have long been used in chess (but minus the part about preparing exact opponent-specific opening lines, because Go’s opening is usually too open-ended to prepare specific variations and traps).
(Background: I’m the main developer of KataGo and have accumulated a lot of time looking at bot analysis of games and am a mid-amateur dan player, i.e. expert but not master, maybe would be around the top 15-30%ile of players if I were to attend the annual open US Go congress).