Interestingly, fine-tuning does better than the other methods on Multi-answer, but not as well on Multiply-divide. Given the training task, I would have expected the opposite. For example, the model could have guessed the probability simply by looking at the number of digits involved in the operation.
Hmm, I do not understand why.