James Chua comments on OpenAI finetuning metrics: What is going on with the loss curves?

James Chua 24 Nov 2025 19:25 UTC
1 point
Note: if you were training GPT-4.1 to output a binary classification result, you would be confused by the OpenAI accuracy plot!
The random baseline for binary classification is 0.83.
Suppose you trained the model to just output True / False.
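Then you run a random baseline. You would expect to see an accuracy of 0.5, because it's random, right?
But instead you would see an accuracy of 0.83. This is because the accuracy is calculated over two extra tokens in addition to the True / False token, and those extra tokens are presumably predicted correctly almost every time.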
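A quick sanity check of that number, as a minimal sketch. It assumes the accuracy is a plain mean over three tokens per example: the one True / False token, guessed correctly half the time, plus the two extra tokens, which I'm assuming are always predicted correctly. The function name and parameters are made up for illustration and are not part of any OpenAI API.

```python
def expected_token_accuracy(p_answer_correct: float, n_extra_tokens: int = 2) -> float:
    """Mean per-token accuracy over one answer token plus the extra tokens.

    Assumption: every extra token is predicted correctly, so each one
    contributes 1.0 to the sum of per-token accuracies.
    """
    total_tokens = 1 + n_extra_tokens
    return (p_answer_correct + n_extra_tokens) / total_tokens


# Random guessing on the True / False token.
random_baseline = expected_token_accuracy(0.5)
print(round(random_baseline, 2))  # 0.83
```

Under the same assumption, a model that always gets the answer token wrong would still show about 0.67, so the plot's useful range for a binary task is roughly 0.67 to 1.0 rather than 0 to 1.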
- Charlie Steiner 25 Nov 2025 0:46 UTC
  2 points
  Looking forward to your results involving a binary classification :D