I had missed this step. In retrospect it should have been obvious… of course you don’t start from a huge text-predictor model to build a code-predictor model that only needs to predict compilable code. Thanks for the clarification.
I think the fact that GPT-3 is controlled by OpenAI while AlphaCode is a DeepMind project has more to do with it. Of course you don’t need to hot-start via transfer learning, but it’s a good idea anyway if you can, which is why DM not using its own GPT-3 equivalent (Gopher, trained at considerable expense) has drawn comment.