avturchin comments on OpenAI announces GPT-3

avturchin 29 May 2020 19:53 UTC
39 points
A postmortem of my predictions about GPT-3 from 21 March 2019:
1. When will it appear? (My guess is 2020.) True.
2. Will it be created by OpenAI, and will it be advertised? (My guess is that it will not be publicly known until 2021, but other companies may create open versions before it.) False.
3. How much data will be used for its training, and what type of data? (My guess is 400 GB of text plus illustrating pictures, but not audio and video.) True for text, false for pictures: “The CommonCrawl data was downloaded from 41 shards of monthly CommonCrawl covering 2016 to 2019, constituting 45TB of compressed plaintext before filtering and 570GB after filtering, roughly equivalent to 400 billion byte-pair-encoded tokens”
4. What will it be able to do? (My guess: translation, picture generation based on text, and text generation based on pictures, at 70 per cent of human performance.) False for pictures.
5. How many parameters will the model have? (My guess is 100 billion to a trillion.) True: “175 billion parameters”
6. How much compute will be used for training? (No idea.) “training the GPT-3 175B consumed several thousand petaflop/s-days of compute during pre-training, compared to tens of petaflop/s-days for a 1.5B parameter GPT-2 model”
- gwern 30 May 2020 2:26 UTC
  24 points
  With #3, I think you fell into the trap of being overly specific and overly committed to a particular organizational strategy. It would be very reasonable to assume that OA (OpenAI) would be working on multimodal models, because you need them for efficiency, generalization, and the ability to do things like use text instructions to control a robot arm; indeed, I quote TR about how they are working hard on large multimodal self-supervised Transformers. But you assumed that this would have to be "GPT-3", instead of a parallel project, while GPT-3 wound up being a scaled-up GPT-2. It would have made more sense to split the predictions and stay agnostic about whether OA would choose to build 2 big models or attempt 1 multimodal model, since the multimodal work might not mature in time (as seems to be the case). Instead, you could have predicted end outcomes like "human-level text article generation" or "models with >100b parameters", since there are many possible routes to relatively few outcomes of interest.
- lifelonglearner 29 May 2020 22:35 UTC
  6 points
  Awesome, thanks for following up on this.