Isn’t this always the case? You can always do linear regression, then use the extra parameter to get zero loss on the last entry, for example.
This is why I used BIC: roughly speaking, if the penalty from the added parameters outweighs the reduction in loss, BIC increases. The more parameters (more precisely, degrees of freedom) a model has, the more BIC penalizes it.
Regarding RLVR, I don’t know when exactly it became widely used. 2024 vs 2025 makes a big difference here.
Seems a bit arbitrary. You’re looking for a way to bake in your priors on linear vs piecewise linear with multiple segments. BIC doesn’t seem all that principled here.
Then what would you recommend for determining whether linear or piecewise linear (only 2 segments btw) provides a better fit?
Hmm, I can’t really think of a way you can do much better than staring at it to be honest.
But if you wanted to be rigorous about it, what would you do? I think you’d come up with initial subjective odds of linear vs 2-segment piecewise linear, e.g. 4:1. Then do a Bayesian update using the ratio of marginal likelihoods (the Bayes factor).