Note that competitive programming tasks tend to be much harder than the prompts you gave.
I know, but I gave them to a text predictor not specifically tailored to write code and it wrote correct code anyway. For the first four prompts we might argue that it probably just copied code from the training data, but this seems quite unlikely for the last two. My rough non-expert intuition is that the shallow understanding of “write code” didn’t really change that much from GPT-3 to AlphaCode, and the performance boost of the latter is essentially due to fine-tuning and filtering tricks.
Some things that I feel undermine your case: your sample size is fairly small here, and it would have been valuable if you had tried sampling maybe 10-20 times for each prompt. Also, these code snippets are either the kind of thing I’d expect would be in the dataset, or are trivial. Plus, GPT-3 wasn’t used as a base model for AlphaCode, so the difference can’t have been due to “fine-tuning and filtering tricks”. Finally, GPT-3 is way bigger than any AlphaCode model.
I had missed this step. In retrospect it should have been obvious… of course you don’t start from a huge text predictor model to build a code predictor model that only needs to predict compilable code. Thanks for the clarification.
I think the fact that GPT-3 is controlled by OpenAI and AlphaCode is a DeepMind project has more to do with it. Of course you don’t need to hotstart by transfer learning, but it’s a good idea anyway if you can, which is why DM not using its own GPT-3-equivalent (Gopher, trained at considerable expense) has drawn comment.