The issue with early finetuning is that there’s not much that humans can actually select on, because the models aren’t capable enough—it’s really hard for me to say that one string of gibberish is better/worse.
That’s why I say as early as possible, and not right from the very start.
The issue with early finetuning is that there’s not much that humans can actually select on, because the models aren’t capable enough—it’s really hard for me to say that one string of gibberish is better/worse.
That’s why I say as early as possible, and not right from the very start.