If I’d tried to assume that models are “trying” to get a low value for the training loss, I might have ended up relying on our ability to incentivize the model to make very long-term predictions. But I think that approach is basically a dead end.
Why do you believe that this is a dead end?