Thanks for putting in the time to make sense of my cryptic and didactic ranting.
You don’t specify exactly how this second function can vary: whether it also has one parameter, a few, or many?
Segmented linear regression usually does the trick. There’s only one input, and I’ve never seen discontinuities be necessary when applying this method, so only a few segments (<10) are needed.
I didn’t specify this because almost any regression algorithm would work and be interpretable, so readers can do whatever is most convenient to them.
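To make that concrete, here’s a minimal sketch of a continuous segmented-linear fit in Python; the hinge basis, the fixed knot placement, and the toy data are all illustrative assumptions rather than a prescription:

```python
import numpy as np

def fit_segmented_linear(x, y, knots):
    """Least-squares fit of a continuous piecewise-linear function of one input.

    Linear within each segment and continuous at the knots (no discontinuities),
    so there are only len(knots) + 2 parameters in total.
    """
    # Design matrix: intercept, x, and one hinge term max(0, x - k) per knot.
    X = np.column_stack([np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(x_new):
        X_new = np.column_stack(
            [np.ones_like(x_new), x_new] + [np.maximum(0.0, x_new - k) for k in knots]
        )
        return X_new @ coef

    return predict

# Illustrative usage: correct a model's raw scores against observed outcomes.
rng = np.random.default_rng(0)
raw = rng.uniform(0, 10, size=500)                      # f's output (the single input to g)
truth = raw + 0.05 * raw**2 + rng.normal(0, 0.3, 500)   # mildly nonlinear target
g = fit_segmented_linear(raw, truth, knots=[2.5, 5.0, 7.5])  # a few segments, well under 10
corrected = g(raw)
```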
Your approach (first optimize f, then optimize g, then take g ∘ f as the final model) has the obvious alternative of directly optimizing g ∘ f, with the parameters of both functions optimized together.
What I actually do is optimize f until returns diminish, then optimize f and g together. I suggested “f then g” instead of “f then f&g” because it achieves most of the same benefit and I thought most readers would find it easier to apply.
(I don’t optimize f&g together from the outset because doing things that way ends up giving g a disproportionately large impact on predictions.)
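Here’s a rough sketch of that staging; the linear f, cubic g, toy data, and off-the-shelf optimizer are assumptions made purely for illustration. The point is the order of the two fits, and that g starts out at (roughly) the identity:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative setup: f is linear in the inputs, g is a smooth correction
# applied to f's output. Both forms are assumptions for this sketch.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = np.exp(0.5 * X[:, 0] + 0.3 * X[:, 1]) + rng.normal(0, 0.1, 400)  # not-quite-additive target

def f(theta, X):
    return X @ theta[1:] + theta[0]

def g(phi, z):
    return phi[0] + phi[1] * z + phi[2] * z**2 + phi[3] * z**3

def loss_f_only(theta):
    return np.mean((f(theta, X) - y) ** 2)

def loss_joint(params):
    theta, phi = params[:4], params[4:]
    return np.mean((g(phi, f(theta, X)) - y) ** 2)

# Stage 1: optimize f alone (here simply to convergence; in practice, until returns diminish).
theta0 = np.zeros(4)
res_f = minimize(loss_f_only, theta0)

# Stage 2: start g near the identity, then fine-tune f and g together.
phi0 = np.array([0.0, 1.0, 0.0, 0.0])
res_joint = minimize(loss_joint, np.concatenate([res_f.x, phi0]))
```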
Is it really simpler than what you would otherwise have used for f?
Sometimes. Sometimes it isn’t. It depends how wrong the linkage is.
If there are multiple input variables, I’m not sure I would conceptualize this as correcting the linkage, since it’s correcting the overall output and not specifically the relationship with any one input variable?
I would. When the linkage is wrong—like when you use an additive model on a multiplicative problem—models either systematically mis-estimate their extreme predictions or add unnecessary complexity in the form of interactions between features.
I often work in a regression modelling context where model interpretability is at a premium, and where the optimal linkage is almost but not quite multiplicative: that is, if you fit a simple multiplicative model, you’ll be mostly right but your higher predictions will be systematically too low.
The conventional way to correct for this is to add lots of complex interactions between features: “when X12 and X34 and X55 all take their Y-maximizing values, increase Y a bit more than you would otherwise have done”, repeated for various combinations of Xes. This ‘works’ but makes the model much less interpretable, and requires more data to do correctly.
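As a toy contrast, under assumed data and model forms (a log-link f and a segmented-linear g reusing the hinge-basis idea sketched above): fit the simple multiplicative model, then correct its output with a small g rather than adding interaction terms.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(1000, 3))
# Almost-but-not-quite multiplicative target: the exponent is slightly superlinear,
# so a plain multiplicative fit will systematically under-predict at the high end.
eta = 1.0 * X[:, 0] + 0.8 * X[:, 1] + 0.5 * X[:, 2]
y = np.exp(eta + 0.3 * eta**2) * rng.lognormal(0, 0.05, 1000)

# Simple multiplicative model: linear regression on log(y), i.e. f(x) = exp(w·x + b).
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
f_pred = np.exp(A @ w)

# Correction g: continuous piecewise-linear in f's prediction, a few fixed knots.
knots = np.quantile(f_pred, [0.5, 0.8, 0.95])
B = np.column_stack([np.ones_like(f_pred), f_pred] + [np.maximum(0.0, f_pred - k) for k in knots])
c, *_ = np.linalg.lstsq(B, y, rcond=None)
corrected = B @ c   # g(f(x)): lifts the systematically-low high predictions without interactions
```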