I think you’re right. “Sum of squared differences” makes sense as a normal thing to do with data points only if you’ve learned that it’s a measure of how spread apart they are, that it’s the variance up to a factor of the number of points, and that making the variance small is a good way to ensure that a cluster is “well clumped.” There is a certain amount of intuition that’s built up from experience.
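To make the link to the variance concrete, here is a small sketch. It assumes “squared differences” means squared differences from the mean (the comment does not spell this out), and the sample values are made up for illustration.

```python
import math
import statistics

# Hypothetical sample; any list of numbers would do.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

m = statistics.mean(xs)

# Sum of squared differences from the mean: large when points are spread out.
sum_sq_diff = sum((x - m) ** 2 for x in xs)

# The population variance is that same sum divided by the number of points.
variance = statistics.pvariance(xs)

print(sum_sq_diff, len(xs) * variance)
```

The two printed numbers agree, which is why minimizing one is the same as minimizing the other for a fixed set of points.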
I also want to stress the point that I’m a bit biased(?) when it comes to understanding concepts. Of course I could accept any mathematical method or algorithm at face value; after all, I’m able to use WolframAlpha. But I feel that doesn’t count, or at least I do not value that kind of understanding. If you taught a prehistoric man to press the right buttons, he could operate a nuclear facility without understanding any of it.
Many people are bothered by the counter-intuitive nature of probability. I have never been more confused by probability than by any other branch of mathematics. I believe that people regard probability as more difficult to understand because they learn about it much later than about other mathematical concepts. For me that is very different, because it is all new to me. For me P(Y) ≥ P(X∧(X->Y)) is as intuitive as (actually more intuitive than) a^2 + b^2 = c^2. The first makes sense in and of itself; the second needs context and proof (at least as far as my gut feeling goes). I just don’t see how 2 + 2 = 4 is more obvious than Bayes’ theorem. You just learnt to accept that 2 + 2 = 4 because 1) you encounter the problem very often, 2) you can easily verify its solution, and 3) you learn about it early on. But it is not self-evident.
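For what it’s worth, the inequality above can be checked in two lines. This assumes that X->Y is read as the material conditional ¬X ∨ Y, which the comment does not state explicitly.

```latex
\begin{align*}
X \wedge (X \to Y) &\equiv X \wedge (\neg X \vee Y)
  \equiv (X \wedge \neg X) \vee (X \wedge Y)
  \equiv X \wedge Y, \\
P\bigl(X \wedge (X \to Y)\bigr) &= P(X \wedge Y) \le P(Y).
\end{align*}
```

The last step uses only the fact that the event X ∧ Y is contained in the event Y.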
I also want to stress the point that I’m a bit biased(?) when it comes to understanding concepts.
This is something people have noticed and it influences their responses. Aggressive “not understanding” is often considered a sign of bad faith, for good reason.
What I noticed is that everyone seems to assume that my difficulty with the phrase “...within-cluster sum of squared differences...” was about “sum of squared differences” and not “within-cluster”. I don’t know the definition of a cluster as a mathematical concept. What might add to the confusion is that I’m not even sure about the meaning of the English word “cluster”. At that point I decided to postpone reading the post. I could of course make the effort to look everything up, but I thought it would be more effective to read it in the future.
Your post simply served as an example of how difficult it can be to read Less Wrong without a lot of background knowledge.
What I noticed is that everyone seems to assume that my difficulty with the phrase “...within-cluster sum of squared differences...” was about “sum of squared differences” and not “within-cluster”.
Not really. I actually wrote a basic explanation of the whole sentence, concept by concept, but trimmed it down to the part that best illustrated dependence on mathematical background. Saying “‘within-cluster’ is basically an English phrase that refers to the same thing that’s in the title of the post” wouldn’t have helped convey the point. :P
It does, however, illustrate a different point. There is a trait related not just to intelligence but also to openness to information and flexible thinking that makes some people more suited than others to picking up and following new topics and ideas based on what they already know and filling in the blanks with their best inference. Confidence is part of it but part of it is social competition strategy embodied at the cognitive level.
There isn’t an explicit mathematical concept of a cluster.
Here’s what K-means does. Say, K is 3.
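Conceptually, you look at all the possible ways to partition your data points into three groups and pick the partition that minimizes the sum of squared differences within each group, that is, the squared distance of each point from the mean of its group, added up over all groups. Trying every partition is infeasible for anything but tiny data sets, so in practice you start from a guess, assign each point to the nearest group mean, recompute the means, and iterate that procedure until the assignments stop changing.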
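A minimal sketch of the “try every partition” idea above, on a toy data set. The helper names (`within_cluster_ss`, `best_partition`) and the data are made up for illustration, and “squared differences” is assumed to mean squared Euclidean distance from each point to its group’s mean. This is exhaustive search, so it is only practical for a handful of points; it is meant to show the quantity being minimized, not how K-means is run on real data.

```python
from itertools import product


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_ss(groups):
    """Sum over groups of squared distances from each point to its group mean."""
    total = 0.0
    for g in groups:
        if g:
            c = mean(g)
            total += sum(sq_dist(p, c) for p in g)
    return total


def best_partition(points, k):
    """Brute force: try every way of labelling the points with k labels.

    There are k ** len(points) labellings, so this only works for tiny inputs.
    """
    best_groups, best_score = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:
            continue  # skip labellings that leave a group empty
        groups = [
            [p for p, lab in zip(points, labels) if lab == j] for j in range(k)
        ]
        score = within_cluster_ss(groups)
        if score < best_score:
            best_groups, best_score = groups, score
    return best_groups, best_score


# Hypothetical toy data: three visually obvious clumps on a line.
data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
groups, score = best_partition(data, 3)
print(groups, score)
```

On this data the winning partition pairs up the neighbouring points, which matches what you would call the “clusters” by eye.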