That’s right. We initially thought it might be important so that the LLM “understood” the task better, but it didn’t matter much in the end. The main hyperparameters for our experiments are in train_ray.py, where you can see that we use a “token_loss_weight” of 0.
That’s right. We initially thought it might be important so that the LLM “understood” the task better, but it didn’t matter much in the end. The main hyperparameters for our experiments are in train_ray.py, where you can see that we use a “token_loss_weight” of 0.
(Feel free to ask more questions!)