Great explanation! Someone linked me here after I wondered why linear regression is asymmetric. A quick Google search or a ChatGPT query could have told me that the two regressions minimize different things, but your post has these advantages:
Pictures
An explanation of why minimizing different things gives slopes that differ in this specific way (that is, far outliers are punished heavily)
A connection to PCA that is nicely and simply explained
Thanks!
Thank you! That’s very kind.
I got curious and asked Claude to explain the difference between regressing X onto Y and Y onto X, and it did a really good job, which I found somewhat distressing. Is my blog post even providing any value when an LLM can reproduce 80-90% of the insight in a thousandth of the time?
But maybe there’s still value in writing up the blog post, because knowing which questions are worth asking is non-trivial. I wrote this post because I knew that (a) understanding the difference between the two regression lines was important and (b) the difference was actually straightforward to explain with the right framing. So perhaps there’s still utility in having good taste in which questions are worth answering. At the very least, I personally benefited from writing the post, since it forced me to shore up my understanding.
Certainly, you have pictures! Pictures are great!
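For anyone who wants to check the asymmetry discussed in this thread numerically, here is a minimal sketch, not taken from the post itself. It assumes NumPy and synthetic data; the helper name `regression_slopes`, the seed, and the noise levels are all made up for illustration. It computes the slope of the Y-onto-X line, the X-onto-Y line (drawn in the same x-y plane), and the first principal axis from PCA.

```python
import numpy as np


def regression_slopes(x, y):
    """Return three slopes (dy/dx), all for lines through the data centroid.

    Hypothetical helper for illustration; assumes the sample covariance
    of x and y is nonzero and the principal axis is not vertical.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y)  # 2x2 sample covariance matrix
    var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]

    # Y onto X: minimizes squared vertical distances to the line.
    y_on_x = cov_xy / var_x

    # X onto Y: minimizes squared horizontal distances. The fit is
    # x = (cov_xy / var_y) * y, so in the x-y plane its slope is the reciprocal.
    x_on_y = var_y / cov_xy

    # PCA: the first principal axis minimizes squared perpendicular distances.
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
    pca = v[1] / v[0]

    return y_on_x, x_on_y, pca


# Synthetic, positively correlated data (assumed values, not from the post).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(scale=0.5, size=500)

y_on_x, x_on_y, pca = regression_slopes(x, y)
print(f"Y onto X slope: {y_on_x:.3f}")
print(f"PCA axis slope: {pca:.3f}")
print(f"X onto Y slope: {x_on_y:.3f}")
```

With positively correlated data, the Y-onto-X line should come out shallowest, the X-onto-Y line steepest, and the PCA axis in between. The two regression lines coincide only when the points are perfectly collinear.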