Charlie Steiner comments on "Did ChatGPT just gaslight me?"

Charlie Steiner 1 Dec 2022 7:53 UTC
5 points
What’s the training of ChatGPT like? Is it realistic that it’s learned to double down on mistakes as a way to get RL reward, or is it still anchored by unsupervised learning, so that in some sense it "thought" your conversation was a likely continuation?
- TW123 1 Dec 2022 13:18 UTC
  5 points
  OpenAI has in the past not been that transparent about these questions, but in this case, the blog post (linked in my post) makes it very clear it’s trained with reinforcement learning from human feedback.
  
  However, it was of course initially pretrained in an unsupervised fashion (it’s based on GPT-3), so it's hard to know whether this specific behavior was “due to the RL” or “a likely continuation”.