I don’t think this was unexpected at all. As soon as DeepMind announced their StarCraft project, most of the discussion was about proper mechanical limitations, since the real-time aspect of RTS games favors mechanical execution so heavily. Being dumb and fast is simply more effective than smart and slow.
The skills that make a good human StarCraft player can broadly be divided into two categories: athleticism and intelligence. Much of the strategy in the game is built around the fact that players are playing with limited resources of athleticism (i.e. speed and accuracy), so it follows that you can’t necessarily separate the two skill categories and only measure one of them.
The issue with the presentation was that not only did DeepMind not highlight the problematic nature of assessing the intelligence of their algorithm, they actively downplayed it. In my opinion, the PR spin was blatantly obvious and the community backlash warranted.
Being dumb and fast is simply more effective than smart and slow.
But it is unclear what the trade-off actually is here, and what it means to be “fast” or “smart”. AI that is really dumb and really fast has been around for a while, but it hasn’t been able to beat human experts in a full 1v1 match.
Much of the strategy in the game is built around the fact that players are playing with limited resources of athleticism (i.e. speed and accuracy) so it follows that you can’t necessarily separate the two skill categories and only measure one of them.
The fact that strategy is developed under an athleticism constraint does not imply that we can’t measure athleticism. What was unexpected (at least to me) is that, even with a full list of commands given by the players, it is hard to arrive at a reasonable value for just the speed component(s) of this constraint. It seems like this was expected, at least by some people. But most of the discussion that I saw about mechanical limitations seemed to suggest that we just need to turn the APM dial to the right number, add in some misclicking and reaction time, and call it a day. Most of the people involved in this discussion had greater expertise than I do in SCII or ML or both, so I took this pretty seriously. But it turns out you can’t even get close to human-like interaction with the game without at least two or three parameters for speed alone.
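To make concrete why a single APM dial isn’t enough: even a toy limiter needs at least separate burst and sustained parameters before its output looks remotely human, since humans spike far above their average rate for short stretches. A rough sketch (the caps and window lengths below are made-up illustrative numbers, not values measured from players):

```python
import collections

class ApmLimiter:
    """Toy rate limiter with separate burst and sustained caps.

    A single average-APM cap can't reproduce human play: humans burst
    far above their sustained rate during fights. The parameters here
    are illustrative only, not calibrated against real players.
    """

    def __init__(self, burst_cap=15, burst_window=1.0,
                 sustained_cap=300, sustained_window=60.0):
        self.burst_cap = burst_cap            # max actions per burst_window
        self.burst_window = burst_window      # seconds
        self.sustained_cap = sustained_cap    # max actions per sustained_window
        self.sustained_window = sustained_window
        self.history = collections.deque()    # timestamps of allowed actions

    def allow(self, now):
        """Return True if an action at time `now` fits both caps."""
        # Forget actions older than the longest window.
        while self.history and now - self.history[0] > self.sustained_window:
            self.history.popleft()
        recent_burst = sum(1 for t in self.history
                           if now - t <= self.burst_window)
        if recent_burst >= self.burst_cap:
            return False                      # short-term burst cap hit
        if len(self.history) >= self.sustained_cap:
            return False                      # long-term average cap hit
        self.history.append(now)
        return True
```

Even this two-parameter version ignores misclicks, reaction time, and the fact that human speed varies with what is on screen, which is exactly the calibration problem described above.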
Sorry, I worded that really poorly. Dumb and fast was a comment about relatively high-level human play. It is context-dependent and, as you said, the trade-off is very hard to measure. It probably flips back and forth quite a bit if we’d slowly increase both and actually attempt to graph it. Point is, if we look at the early game, where both players have similar armies, unlimited athleticism quickly becomes unbeatable even with only moderate intelligence behind it.
The thing about measuring athleticism or intelligence separately is that we can measure the athleticism of a machine but not of a human. When a human plays SC2 it’s never about purely executing a mindless task. Never. You’d have to somehow separate the visual recognition component, which is impossible. Human reaction times and accuracy are heavily affected by the dynamically changing scene of play.
Think about it this way: measuring human spam-clicking speed and accuracy is not the benchmark, because those actions are inconsequential and don’t translate to combat movement (or any other actions a player makes in a dynamic scene). Say you are in a blink stalker battle. In order to effectively retreat wounded units you have to quickly assess which units are in danger before ordering them to pull back. That cognitive process of visual recognition and anticipation is simply inseparable from the athleticism aspect.
I guess you could measure human clicking speed and reaction times in a program specifically designed to do so, but those measures would be useless for the same reason. The mechanical ability of a human varies wildly based on what is happening in a game of SC2. There are cognitive bottlenecks.
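There is even a classic quantitative form of this bottleneck, the Hick–Hyman law: choice reaction time grows roughly logarithmically with the number of equally likely alternatives, which is one reason a reaction measured in an isolated clicking test says little about reacting in a busy fight. A quick sketch (the constants are illustrative, not fitted to SC2 players):

```python
import math

def hick_hyman_rt(n_choices, a=0.2, b=0.15):
    """Estimated choice reaction time in seconds per the Hick-Hyman law:
    RT = a + b * log2(n + 1), where a is the base (simple) reaction time
    and b is the per-bit processing cost. The constants here are
    illustrative placeholders, not values fitted to StarCraft players.
    """
    return a + b * math.log2(n_choices + 1)

# Reacting slows as the scene gets busier: one obvious target is faster
# to handle than picking the right stalker out of sixteen.
```

The exact constants don’t matter for the argument; the point is that “reaction time” is not one number but a function of scene complexity.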
Here’s an even clearer way to think about it. In a game of soccer you can make a decision to run somewhere (intelligence) and then try to run as fast as you can (athleticism). In a game of StarCraft every action is a click and therefore a decision. You can’t click harder or gentler. You could argue that a single decision can include dozens of clicks, but that’s true only for macrostrategic decisions (e.g. what build order a player chooses). Those don’t exist in combat situations.
Basically, we can handicap the AI mechanically exactly where we want it, but we can’t know for sure where that is. Luckily we don’t have to. We can simply eyeball it and intentionally aim slightly lower. That way, if the human is on equal footing or even has a slight edge, an AI victory should almost inarguably be a result of superior cognitive ability.
You don’t have to get these handicaps exactly right. The APM controversy happened because AlphaStar’s advantages were obvious. It is not hard to make them less so.
I think there are two perspectives from which to view the mechanical constraints put on AlphaStar:
One is the “fairness” perspective, which is that the constraints should perfectly mirror those of a human player, be it effective APM, reaction time, camera control, clicking accuracy, etc. This is the perspective held mostly by the gaming community, but it is difficult to implement in practice, as shown by this post, requiring enormous analysis and calibration effort.
The other is what I call the “aesthetics” perspective, which is that the constraints should be used to force the AI into a rich strategy space where its moves are gratifying to watch and interesting to analyze. The constraints can be very asymmetrical with respect to human constraints.
In retrospect, I think the second one is what they should have gone with, because a single constraint could have achieved it: signal delay.
Think about it: what good would arbitrarily high APM and clicking accuracy amount to if the ping is 400–500 ms?
It would naturally introduce uncertainty through imperfect predictions and a bias towards longer-term thinking, anywhere on the timescale from seconds to minutes.
It would naturally move the agent into the complex strategy space that was purposefully designed into the game but got circumvented by exploiting certain edge cases like ungodly blink stalker micro.
It avoids painstaking analysis of the multi-dimensional constraint space by reducing it down to a single variable.
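A delay constraint is also trivial to implement compared to calibrating a whole APM/accuracy model. A minimal sketch of the idea, assuming a hypothetical agent interface with an act(observation) method (the interface and the 450 ms figure are illustrative, not AlphaStar’s actual setup):

```python
import collections

class DelayedAgent:
    """Wrap an agent so its actions only take effect after a fixed
    signal delay, like playing on a 400-500 ms ping.

    `agent` is any object with an act(observation) method; this
    interface is hypothetical and exists only for illustration.
    A fuller version would also buffer observations, so the agent
    sees a stale view of the game as well.
    """

    def __init__(self, agent, delay=0.45):
        self.agent = agent
        self.delay = delay
        self.pending = collections.deque()  # (ready_time, action) pairs

    def step(self, observation, now):
        # The agent decides immediately, but its action is queued and
        # only handed to the game engine once the delay has elapsed.
        action = self.agent.act(observation)
        self.pending.append((now + self.delay, action))
        ready = []
        while self.pending and self.pending[0][0] <= now:
            ready.append(self.pending.popleft()[1])
        return ready  # actions that execute on this tick
```

With half a second between decision and effect, precise per-unit micro stops paying off and the agent is pushed towards plans that survive its own prediction error.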
The interesting parts of the strategy space were not designed in even for human players. A lot of promoting bugs to features and player creative effort has shaped the balance. There is a certain game that the designers and the players play: players try to abuse every edge, and designers try to keep the game interesting and balanced. Forbidding the AI from finding its own edge cases would impose a different incentive structure than the one humans deal with.
This is not true. In StarCraft: Brood War there are lots of bugs that players take advantage of, but such bugs don’t exist in StarCraft 2.
I think it’s much more important to restrict the AI mechanically so that it has to succeed strategically than to have a fair fight. The whole conversation about fairness is misguided anyway. The point of an APM limiter is to remove confounding factors and increase the validity of our measurement, not to increase fairness.
Here to say the same thing.
Say I want to discover better strategies in SC2 using AlphaStar; then it’s extremely important that AlphaStar employ some arbitrarily low, human-achievable level of athleticism.
I was disappointed when the vs. TLO videos came out that TLO thought he was playing against a single AlphaStar agent. In fact he played against five different agents which employed five different strategies, not a single agent that was adjusting and choosing among a broad range of strategies.
In the making of StarCraft 2 there was the issue of which mechanics to carry over from SC1. If a mechanic that is kept is an ascended bug, you of course provide a clean implementation so the new game’s mechanic is not a bug. But it still means that the mechanic was not put in the palette by a human, even if a human decides to keep it for the next game. The complex strategy spaces are discovered and proven rather than built or designed. If the game doesn’t play like it was designed but is not broken, then it tends not to get fixed. Conversely, if a designer’s balance doesn’t result in a good meta in the wild, the onus is on the designers to introduce a patch that actually produces a healthy meta, not on the players to play in a specific way to keep the game working.
It’s all good; thanks for clarifying. I probably could have read more charitably. :)
That cognitive process of visual recognition and anticipation is simply inseparable from the athleticism aspect.
Yeah, I get what you’re saying. To me, the quick recognition and anticipation feels more like athleticism anyway. We’re impressed with athletes that can react quickly and anticipate their opponent’s moves, but I’m not sure we think of them as “smart” while they’re doing this.
This is part of what I was trying to look at by measuring APM while in combat. But I think you’re right that there is no sharp divide between “strategy” or being “smart” or “clever” and “speed” or being “fast” or “accurate”.