[Again, I’m] not sure why GPT-2 wouldn’t have a shot [at] Starcraft or Dota. The most basic fully connected network you could write, as long as it has enough parameters and the correct training environment, has a shot at StarCraft 2, Dota… etc.
Movement has a lot of degrees of freedom, as do those domains. There’s also the issue of quick response time (which is not something it was built for), and of it not being an economical solution (which can also be said of OpenAI’s work in those areas).
When things built specifically for StarCraft don’t make it to the superhuman level, something that isn’t built for it probably won’t either.
It’s just that it will learn slower than something [built] for those specific cases.
The question is how long − 10 years? Solving chess via analyzing the whole tree would take too much time, so no one does it. Would it learn in a remotely feasible amount of time?
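To give the “too much time” claim some numbers: a back-of-envelope estimate using the standard textbook figures for chess (roughly 35 legal moves per position, games of roughly 80 plies) shows why nobody attempts the full tree. Both figures are approximations, and the 10^9 positions-per-second rate is an assumed, generous hardware budget:

```python
# Rough estimate of brute-forcing the chess game tree.
# Assumed figures: ~35 legal moves per position (branching factor),
# ~80 plies per game. Both are textbook approximations.
branching_factor = 35
game_length_plies = 80

# Number of leaf nodes in the full game tree.
positions = branching_factor ** game_length_plies

# Suppose a machine evaluates a generous 10^9 positions per second.
positions_per_second = 10 ** 9
seconds_per_year = 60 * 60 * 24 * 365

years = positions // (positions_per_second * seconds_per_year)
print(f"~10^{len(str(positions)) - 1} leaf positions")
print(f"~10^{len(str(years)) - 1} years at 1e9 positions/sec")
```

Even with wildly optimistic hardware assumptions, the answer is not “10 years” but a number of years with over a hundred digits, which is the sense in which exhaustive search is a non-answer.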
Well yeah, that’s my whole point here. We need to talk about accuracy and training time!
If the GPT-2 model was trained in a few hours, and loses 99% of games against a decision-tree-based model (à la Deep Blue) that was trained in a few minutes on the same machine, then it’s worthless. It’s exactly like saying “In theory, given almost infinite RAM and 10 years, we could beat Deep Blue (or AlphaZero, or whatever the cool kids are doing nowadays) by just analyzing a very large subset of all possible moves and combinations and arranging them hierarchically”.
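The “analyze every move and arrange them hierarchically” approach being parodied is just exhaustive minimax. A minimal sketch, on a hypothetical toy game tree (leaves are payoffs for the maximizing player, internal nodes are lists of subtrees) — at chess scale the same recursion would visit on the order of 35^80 nodes, which is exactly why the approach is worthless in practice:

```python
# Plain exhaustive minimax over a toy game tree.
# Leaves are payoffs for the maximizing player; internal nodes are
# lists of child subtrees. No pruning, no evaluation heuristic --
# the node count grows exponentially with depth.
def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):
        return node  # leaf: return its payoff
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Depth-2 example: the maximizer picks a branch, the minimizer replies.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree))  # minimizer forces 3, 2, 0 per branch; maximizer picks 3
```

In-principle correctness is not the interesting property here; the cost of getting the answer is.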
So you think people should only be afraid of/excited about developments in AGI that:
1) are more recent than 50 (to arguably 6) years ago?
2) could do anything/a lot of things well with a reasonable amount of training time?
3) might actually generalize in the sense of artificial general intelligence, i.e. be remotely close to on par with humans (w.r.t. the ability to handle such a variety of domains)?
4) seem actually agent-like?
In regards to 1), I don’t necessarily think that older developments that are re-emerging can’t be interesting (see the whole RL scene nowadays, which to my understanding is very much bringing back the kind of approaches that were popular in the 70s). But I do think the particular ML development people should focus on is the one with the most potential, which will likely end up being newer. My gripe with GPT-2 is that there’s no comparative proof that it has potential to generalize compared to a lot of other things (e.g. quick architecture search methods, or custom encoders/heads added to a ResNet); actually, I’d say the sheer size of it and the issues one encounters when training it indicate the opposite.
I don’t think 2) is a must, but going back to 1), I think training time is one of the important criteria for comparing the approaches we are focusing on, since training time on a simple task is arguably the best proxy you have for training time on a more complex task.
As for 3) and 4)… I’d agree with 3), I think 4) is too vague, but I wasn’t trying to bring either point across in this specific post.