Fair,
Note they used GPT-3, which wasn’t trained with RLHF (right?)
That’s a good point; the study says the data was collected “in late 2021”. Instruction-following GPT-3 became OpenAI’s default model in January 2022, though the announcement of that change also mentions that the models “have been in beta on the API for more than a year”. I don’t know whether Replika had used those beta models or not.
That said, even though the InstructGPT models were technically trained with RLHF, the nature of that RLHF was quite different (they weren’t even chat models, so they weren’t trained for anything like continuing an ongoing conversation).