Thanks for the pointer, I think I get the idea. To check: the difference is between whether many votes of 1 lead to more messages, or whether they only lead to more messages when there are also many votes of 5 at the same time. Since the dataset had many women who got many 1s and many 5s at the same time, and also many messages, the linear regression produced absurd coefficients. They happen to fit the data set but do not model the (non-linear) reality; for that, one would have to consider another dimension, like “disagreement”, or whatever. And of course, all this would be much clearer to me if I sat down, read a damn ultra-basic statistics book and learned this stuff. Gah.
That is a good example of an error one could make by believing the data is linear (and thus trusting the regression coefficients) when it is not. If their non-linear model were correct, we would get regression coefficients like the ones we see. If we trusted those coefficients too much (implicitly assuming the data is linear), the positive coefficient on the number of 1s would suggest that getting all 1s is good. But it is not: their model says it is not, and the data says it is not (e.g., the scatter plot).
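A toy calculation can make this concrete. The sketch below assumes a hypothetical non-linear model in which 1s cost messages on their own but help when paired with 5s, evaluates it on an invented grid of vote counts (every combination of 0 to 10 ones and 0 to 10 fives), and fits an ordinary least-squares line to the result. The function names, coefficients, and grid are illustrative assumptions, not the dataset or model under discussion.

```python
# Hypothetical toy data: every combination of 0-10 votes of 1 and 0-10 votes of 5.
# None of these numbers come from the dataset discussed above.

def true_messages(n1, n5):
    """Assumed non-linear model: on their own, 1s cost messages; paired with 5s, they help."""
    return 5 + n5 - 0.2 * n1 + 0.1 * n1 * n5


def fit_linear(rows):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2, using centered sums of squares."""
    n = len(rows)
    m1 = sum(x1 for x1, _, _ in rows) / n
    m2 = sum(x2 for _, x2, _ in rows) / n
    my = sum(y for _, _, y in rows) / n
    s11 = sum((x1 - m1) ** 2 for x1, _, _ in rows)
    s22 = sum((x2 - m2) ** 2 for _, x2, _ in rows)
    s12 = sum((x1 - m1) * (x2 - m2) for x1, x2, _ in rows)
    s1y = sum((x1 - m1) * (y - my) for x1, _, y in rows)
    s2y = sum((x2 - m2) * (y - my) for _, x2, y in rows)
    # Solve the 2x2 centered normal equations, then recover the intercept.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


rows = [(n1, n5, true_messages(n1, n5)) for n1 in range(11) for n5 in range(11)]
b0, b1, b2 = fit_linear(rows)
print(f"linear fit: messages = {b0:.2f} + {b1:.2f}*n1 + {b2:.2f}*n5")

# Compare a profile with no votes against one with ten 1s and no 5s.
for n1, n5 in [(0, 0), (10, 0)]:
    linear = b0 + b1 * n1 + b2 * n5
    print(f"n1={n1:2d}, n5={n5}: model {true_messages(n1, n5):.2f}, linear fit {linear:.2f}")
```

Under these assumptions the fit comes out at 2.5 + 0.3·n1 + 1.5·n5. The coefficient on 1s is positive (it equals -0.2 plus 0.1 times the average number of 5s, which is 5 on this grid), so the linear fit predicts that ten 1s with no 5s beat no votes at all, while the assumed model says they lower the message count from 5 to 3.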
I think that is what you are saying. It is certainly not their mistake; they believe their model. I am not claiming anything so specific, only that this is the type of mistake I am talking about. Also, there are lots of non-linear models that lead to the same regression.
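The last point can be illustrated with a second sketch under similarly invented assumptions: two hypothetical models, `model_a` and `model_b`, that differ only in the sign of a centered interaction between 1s and 5s. On a balanced grid of vote counts that interaction term averages to zero and is uncorrelated with each count, so ordinary least squares cannot see it, and both models produce the same fitted line.

```python
# Two hypothetical non-linear models of messages received, evaluated on the same
# invented grid (0-10 votes of 1, 0-10 votes of 5). Both are assumptions for illustration.
GRID = [(n1, n5) for n1 in range(11) for n5 in range(11)]


def model_a(n1, n5):
    """Positive interaction between 1s and 5s."""
    return 2.5 + 0.3 * n1 + 1.5 * n5 + 0.1 * (n1 - 5) * (n5 - 5)


def model_b(n1, n5):
    """Same linear part, negative interaction between 1s and 5s."""
    return 2.5 + 0.3 * n1 + 1.5 * n5 - 0.1 * (n1 - 5) * (n5 - 5)


def fit_linear(f):
    """OLS fit of f(n1, n5) ~ b0 + b1*n1 + b2*n5 over GRID, via centered sums of squares."""
    n = len(GRID)
    ys = [f(n1, n5) for n1, n5 in GRID]
    m1 = sum(n1 for n1, _ in GRID) / n
    m2 = sum(n5 for _, n5 in GRID) / n
    my = sum(ys) / n
    d1 = [n1 - m1 for n1, _ in GRID]
    d2 = [n5 - m2 for _, n5 in GRID]
    dy = [y - my for y in ys]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


for name, f in [("A", model_a), ("B", model_b)]:
    b0, b1, b2 = fit_linear(f)
    print(f"model {name}: fit {b0:.2f} + {b1:.2f}*n1 + {b2:.2f}*n5; "
          f"no votes -> {f(0, 0):.1f}, ten 1s and no 5s -> {f(10, 0):.1f}")
```

Both fits come out identical, yet model A says ten 1s with no 5s lower the message count relative to no votes, while model B says they raise it. The regression by itself cannot tell the two apart.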