Note: if you were training GPT-4.1 to output a binary classification result you would be confused by the openai accuracy plot!
The random baseline for binary classification is 0.83.
Suppose you trained the model to just output True / False.
Then you do a random baseline. You expect to see 50⁄50, because random right?
But instead you would see an accuracy of 0.83. This is because of the two extra tokens that the accuracy is calculated over.
Looking forward to your results involving a binary classification :D
Note: if you were training GPT-4.1 to output a binary classification result you would be confused by the openai accuracy plot!
The random baseline for binary classification is 0.83.
Suppose you trained the model to just output True / False.
Then you do a random baseline. You expect to see 50⁄50, because random right?
But instead you would see an accuracy of 0.83. This is because of the two extra tokens that the accuracy is calculated over.
Looking forward to your results involving a binary classification :D