DeepMind released their AlphaStar paper a few days ago, having reached Grandmaster level at the partial-information real-time strategy game StarCraft II over the summer.
This is very impressive, and yet less impressive than it sounds. I used to watch a lot of StarCraft II (I stopped interacting with Blizzard recently because of how they rolled over for China), and over the summer there were many breakdowns of AlphaStar games once players figured out how to identify the accounts.
The impressive part is getting reinforcement learning to work at all in such a vast state space; that took breakthroughs beyond what was necessary to solve Go and beat Atari games. AlphaStar had to have a rich enough set of potential concepts (in the sense that e.g. a convolutional net ends up having concepts of different textures) that it could learn a concept like “construct building P” or “attack unit Q” or “stay out of the range of unit R” rather than just “select spot S and enter key T”. This is new and worth celebrating.
The overhyped part is that AlphaStar doesn’t really do the “strategy” part of real-time strategy. Each race has a few solid builds that it executes at GM level, and the unit control is fantastic, but the replays don’t look creative or even especially reactive to opponent strategies.
That’s because there’s no representation of causal thinking—“if I did X then they could do Y, so I’d better do X’ instead”. Instead there are many agents evolving together, and if there’s an agent evolving to try Y then the agents doing X will be replaced with agents that do X’.
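The distinction can be made concrete with a toy zero-sum game (the payoffs and strategy names below are hypothetical, purely for illustration). A causal reasoner would look one step ahead and conclude directly that X' is the safe choice; a league of agents playing best responses against each other drifts to the same answer with no lookahead at all, just by replacing agents that lose. A minimal sketch using fictitious play:

```python
import numpy as np

# Toy zero-sum game (hypothetical payoffs, for illustration only).
# Our strategies: X, X'. Opponent strategies: Y, Z.
# X wins big against Z but loses to Y; X' is safe against both.
#              opp: Y   opp: Z
A = np.array([[-1.0,    2.0],   # we play X
              [ 1.0,    1.0]])  # we play X'

# Empirical "populations": counts of how often each strategy was fielded.
row_counts = np.array([1.0, 0.0])  # our side starts committed to X
col_counts = np.array([0.0, 1.0])  # opponent population starts on Z

for _ in range(1000):
    # No lookahead anywhere: each side just adds one best response to the
    # opponent's observed mixture (fictitious play). Once Y-players appear
    # in the opponent pool, X-players stop winning and get displaced by
    # X'-players -- the same X -> X' shift, reached by evolution.
    col_mix = col_counts / col_counts.sum()
    row_counts[int(np.argmax(A @ col_mix))] += 1.0

    row_mix = row_counts / row_counts.sum()
    col_counts[int(np.argmin(row_mix @ A))] += 1.0

print(row_counts / row_counts.sum())  # nearly all mass ends up on X'
```

Note that this is the optimistic case: the population only moves off X because Y-players actually arise in the league. If no agent ever evolves to try Y, nothing pushes the population toward X', which is one way to see the brittleness described above.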
(This lack of causal reasoning especially shows up in building placement, where the consequences of locating any one building here or there are minor, but the consequences of your overall SimCity are major for how your units and your opponents’ units would fare if they attacked you. In one comical case, AlphaStar had surrounded the units it was building with its own factories so that they couldn’t get out to reach the rest of the map. Rather than lifting the buildings to let the units out, which is possible for Terran, it destroyed one building and then immediately began rebuilding it before it could move the units out!)
This means that, first, AlphaStar just doesn’t have a decent response to strategies that it didn’t evolve, and secondly, it doesn’t do much in the way of a reactive decision tree of strategies (if I scout this, I do that). That kind of play is unfortunately very necessary for playing Zerg at a high level, so the internal meta has just collapsed into one where its Zerg agents predictably rush out early attacks that are easy to defend if expected. This has the flow-through effect that its Terran and Protoss are weaker against human Zerg than against other races, because they’ve never practiced against a solid Zerg that plays for the late game.
The end result cleaned up against weak players, performed well against good players, but practically never took a game against the top few players. I think that DeepMind realized they’d need another breakthrough to do what they did to Go, and decided to throw in the towel while making it look like they were claiming victory.
Finally, RL practitioners have known that genuine causal reasoning could never be achieved via known RL architectures: you’d only ever get something that could execute the same policy as an agent that had reasoned that way, via a very expensive process of evolving away from dominated strategies at each step down the tree of move and countermove. It’s the biggest known unknown on the way to AGI.
I expect that some otherwise convinceable readers are not going to realize that in this fictional world, people haven’t discovered Newton’s physics or calculus, and those readers are therefore going to miss the analogy of “this is how MIRI would talk about the situation if they didn’t already know the fundamental concepts but had reasons for searching in the right direction”. (I’m not thinking of readers incapable of handling that counterfactual, but of readers who aren’t great at inferring implicit background facts from a written dialogue. Such readers might get very confused at the unexpected turns of the dialogue and quit rather than figure out what they’re baffled by.)
I’d suggest adding to the preamble something like “In a weird world where people had figured out workable aeronautics and basic rocket propulsion by trial and error, but hadn’t figured out Newton’s laws or calculus”.
Why not call the e-book “The Methods of Rationality”?
Or maybe something that is clearly not HPMoR, but clearly connected to it.
Quick approximate summary:
John von Neumann begins by saying, essentially, that the Industrial Revolution has made the world smaller: in earlier centuries the problems caused by new technology were contained within individual nations, but now they extend to the entire globe.
For a first example, he talks about cheap energy and industrial synthesis, though he predicts that nuclear fusion and transmutation would be much more practically feasible than they have turned out to be.
He briefly mentions expected major improvement in automation, communication, and transportation.
He then talks about anthropogenic climate change and the broad possibilities of geoengineering the climate.
All of these technologies can vastly improve human life, or destroy it.
He immediately rules out the “solution” of [preventing advances in technology] as both undesirable (because it blocks the positive uses) and impossible (because it would require total coordination and a total change in human values).
He next considers the possibility of permanently avoiding war through diplomacy and similar means, and does not think that the 1950s drive for world peace will last long; furthermore, such an initiative would need to adapt to ever-more-powerful technologies as fast as they are introduced.
He frames the upcoming decades as a dangerous but useful evolution, where we will either succeed or fail catastrophically, and doesn’t sound especially optimistic. Our best hope is to innovate new political forms that are capable of handling major threats with patience, flexibility, and intelligence.