I do feel like this conversation (and the other comment threads) is getting blocked a bit. It may help to distinguish the communicative risks of sharing a plot with an exponential fit from the actual predictive usefulness of locally fitting an exponential. I agree that the former can be a problem, but I think the latter is not. Do we disagree?
I disagree that you don’t have to count the error from predictions you don’t make. For example, maybe I fail to predict that there will be a market crash next month because I didn’t bother to try to make a prediction at all, so I leave all my investments maximally exposed to the market. When my investments are suddenly worth a lot less because I didn’t anticipate the market crash, that’s an error as far as my net worth is concerned, whether I explicitly made the prediction or not. Maybe I shouldn’t lose Bayes points or something, but Bayes points don’t pay the rent!
I continue to think that the idea of locally fitting an exponential doesn’t make sense in most cases where we care about predictive usefulness. If all we care about is fitting a curve to past data, sure, we can always do that. In fact, we can do better than an exponential; we can always find a polynomial that will fit every data point!
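To make the polynomial point concrete, here is a toy sketch (the data points are made up for illustration): a degree-5 polynomial through 6 points reproduces them exactly, which says nothing about its value out of sample.

```python
import numpy as np

# Hypothetical "roughly exponential" data, invented for this example.
x = np.arange(6)
y = np.array([1.1, 1.9, 4.2, 7.8, 16.5, 31.0])

# A degree-5 polynomial through 6 points interpolates them exactly...
coeffs = np.polyfit(x, y, deg=5)
fit = np.polyval(coeffs, x)
print(np.max(np.abs(fit - y)))  # essentially zero: perfect in-sample fit

# ...but its value at x = 10 is pure extrapolation, not a prediction
# anyone should trust.
print(np.polyval(coeffs, 10))
```

Perfect in-sample fit is cheap; it is the out-of-sample behavior that carries all the predictive content.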
The reason to model something using an exponential curve is to make predictions about future data points, in which case the idea of “local” fitting stops making sense, because now we care a great deal about what happens next, and maybe we’re almost at the top of a sigmoid, about to level out. I have no problem saying that some particular thing we are modeling may be far from topping out, but if that’s the claim we want to make, we should offer a theory for why we believe it’s far from topping out, not just throw up our hands and say “idk man, i can’t figure it out, guess i’ll just use an exponential”, which is what many exponential curve models I see effectively do.
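The sigmoid worry can be made concrete with a small sketch (all parameters invented): the early portion of a logistic curve is fit extremely well by an exponential, and the two only diverge, catastrophically, past the data.

```python
import numpy as np

# Hypothetical logistic curve: L / (1 + exp(-k (t - t0))).
t = np.arange(0, 20)
L, k, t0 = 100.0, 0.8, 12.0
sigmoid = L / (1 + np.exp(-k * (t - t0)))

# Fit an exponential to the first 8 points only,
# via log-linear least squares.
t_early = t[:8]
slope, intercept = np.polyfit(t_early, np.log(sigmoid[:8]), 1)
exp_fit = np.exp(intercept + slope * t)

# Locally the exponential fit is excellent...
print(np.max(np.abs(exp_fit[:8] - sigmoid[:8])))
# ...but by t = 19 the sigmoid has leveled off near 100,
# while the exponential has blown far past it.
print(sigmoid[-1], exp_fit[-1])
```

Nothing in the early data distinguishes the two curves; the choice between them has to come from a theory about the ceiling, not from the fit.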
One can model a local part of the future, not just the past.
Here’s a related example. If I know the spring constant of some material, I can model the force as linear in displacement. I know that it is not truly linear in displacement, but that is still a useful simplified model that allows an analytical solution and works well in some reasonable local neighborhood—and I might not know exactly where it breaks down! I can still use the approximation if I don’t know exactly where it breaks down. And I don’t actually know the math that describes precisely where it breaks down, so trying to add it to the model would probably just result in saying something wrong—it might even mess up my LOCAL predictions.
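A numerical sketch of the spring point (the spring constant and the cubic softening term are both made up for illustration): the linear model tracks a hypothetical nonlinear "true" spring closely for small displacements and drifts badly for large ones.

```python
k = 50.0       # N/m, assumed spring constant
beta = 2000.0  # N/m^3, hypothetical softening coefficient

def force_linear(x):
    """Hooke's-law approximation."""
    return -k * x

def force_true(x):
    """Invented 'true' force: softens at large displacement."""
    return -k * x + beta * x**3

for x in (0.01, 0.05, 0.15):
    print(f"x={x:.2f} m  linear={force_linear(x):+.3f} N"
          f"  'true'={force_true(x):+.3f} N")
```

At 1 cm the two agree to a fraction of a percent; at 15 cm the linear model is off by a factor of ten. The approximation is genuinely useful without anyone knowing in advance exactly where that crossover sits.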
I don’t think that you would burst into an undergraduate physics class and say “why are you using this linear differential equation?? In reality, pulling a spring far past the breaking point will cause the spring ‘constant’ to decrease! It’s not really a constant! It is NEVER really a constant! This model is making you more wrong!”
Rather, the professor should simply say: “the linear model is an approximation that works well for small displacements.”
When it comes to springs, someone (not me) does know math that reasonably well models where the linear approximation breaks down for a given material. For scaling laws, we don’t even know that math.
Ironically, I have a $250 bet with Kokotajlo that task lengths will scale (quite) sub-exponentially, because I believe the sigmoid runs out soon, for a long list of concrete reasons that apply to this specific case. But NOT because I’m actually fitting a sigmoid.
Yes, this all seems quite reasonable, and I think it’s a failure if we don’t at least acknowledge that the model is going to break down at some point and give some guesses about when. That failure is what I see a lot when I read about exponential growth models: the modeler presents a curve, but no model or even a theory of how growth might end. To me that feels like only half a model, and it makes the model not very useful, because it has such limited predictive power and isn’t even attempting to quantify its own limitations.
Like I’m okay with saying we don’t know how to quantify something at all, but once we start quantifying, I expect to see the quantifying carried through. Making a bet is a great way to quantify!
You don’t have to count error from predictions you don’t make. Just put an upper limit on the x axis.