I’m not aware of exactly how these different GPT-Instruct models were fine-tuned, but it sure looks like the series gets progressively better at zero-shot Morse encoding with each new version (presumably as each version is fine-tuned further and further).
Any chance you could try this with the un-finetuned base models? Specifically davinci, in the ‘more models’ dropdown in the Playground, which I believe is also the closest to the July 2020 model. If it’s the RL finetuning, which seems plausible given the differences between the versions of InstructGPT you report, the baseline ought to be the most diverse/random.
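For anyone wanting to run this comparison, something like the sketch below would do it via the legacy OpenAI Completions API rather than clicking through the Playground. It samples each model several times at high temperature so you can eyeball how diverse/random the base davinci completions are compared to the instruct series. The prompt, sampling settings, and model list are just illustrative assumptions, not anything from the original experiment:

```python
# Minimal sketch using the legacy OpenAI Completions API (pre-1.0 openai library).
# The prompt, temperature, and sample count are illustrative assumptions.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

PROMPT = "Encode the following text in Morse code.\n\nText: HELLO WORLD\nMorse:"

# Base model vs. two InstructGPT versions; base davinci should be
# closest to the July 2020 model.
MODELS = ["davinci", "text-davinci-001", "text-davinci-002"]

for model in MODELS:
    resp = openai.Completion.create(
        model=model,
        prompt=PROMPT,
        max_tokens=64,
        temperature=1.0,  # high temperature to expose the baseline's diversity
        n=5,              # several samples per model to see the spread
    )
    print(f"=== {model} ===")
    for choice in resp["choices"]:
        print(repr(choice["text"].strip()))
```

If the RL finetuning story is right, davinci’s five samples should scatter widely (garbage, wrong alphabets, refusals to even attempt Morse) while the instruct models should converge on something closer to correct dots and dashes.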