So you think people should only be afraid/excited about developments in AGI that
1) are more recent than 50, or arguably even 6, years ago?
2) could do anything/a lot of things well with a reasonable amount of training time?
3) might actually generalize in the sense of artificial general intelligence, i.e. come remotely close to being on par with humans (w.r.t. the ability to handle such a variety of domains)?
4) Seem actually agent-like?
Regarding 1), I don’t necessarily think that older developments that are re-emerging can’t be interesting (see the whole RL scene nowadays, which to my understanding is very much bringing back the kind of approaches that were popular in the 70s). But I do think the particular ML development people should focus on is the one with the most potential, which will likely end up being newer. My gripe with GPT-2 is that there’s no comparative proof that it has more potential to generalize than a lot of other things (e.g. quick architecture search methods, custom encoders/heads added to a ResNet); if anything, I’d say its sheer size and the issues one encounters when training it indicate the opposite.
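For concreteness, here’s a rough sketch of the kind of thing I mean by “custom encoders/heads added to a ResNet” (assuming PyTorch/torchvision; the head sizes and the 10-class task are purely illustrative): reuse a pretrained backbone and only train a small task-specific head on top.

```python
# Minimal sketch: pretrained ResNet backbone + custom trainable head.
# Assumes PyTorch and torchvision; layer sizes and class count are placeholders.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(pretrained=True)
for p in backbone.parameters():
    p.requires_grad = False  # freeze the pretrained encoder

# Swap the final fully-connected layer for a small custom head;
# only the head's parameters are trainable.
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 10),  # hypothetical 10-class downstream task
)

x = torch.randn(1, 3, 224, 224)
logits = backbone(x)
```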
I don’t think 2) is a must, but going back to 1), I think training time is one of the important criteria for comparing the approaches we focus on, since training time on a simple task is arguably the best proxy you have for training time on a more complex task.
As for 3) and 4)… I’d agree with 3), and I think 4) is too vague, but I wasn’t trying to make either point in this specific post.