Your recent post is a good example. A friendly math post! I gave up after reading “...within-cluster sum of squared differences...” :-)
Your points are really surprising. I do not interact with educated people in meatspace at all. I didn’t think that someone who had reached your level of education needed LW to break with religion. And I’ve always been the kind of person to take futuristic topics seriously by default, so that was no surprise at all to me. I guess that is why people here are irritated when I argue that science fiction authors have been talking about many of the topics discussed here for a long time. I don’t see how the fictional exploration of concepts could lower the credibility of the subjects.
Nevertheless, I never tried to argue that Less Wrong is useless. It’s one of my favorite places in the metaverse.
People certainly ought not to need to. By that I mean that people with the general cognitive capacity of humans have more than enough ability to evaluate religion as nonsense given even a basic modern education. But even so it is damn hard to break free. Part of what makes the ‘Belief in Belief’ post particularly useful is that it is written in an attempt to understand what is really going on when people ‘believe’ things that don’t make sense given what they know.
The social factors are also important. Religion is essentially about signalling tribal affiliation. It is dangerous to discard a tribal identity without already having found yourself a new tribe—even a compartmentalised online tribe used primarily for cognitive affiliation.
This is something I have to remind myself of when reading your comments. You are sincere. To someone who didn’t know your online self at all, some of your arguments and questions would seem far more rhetorical than you intended them to be. You actually do update on new information, which is a big deal!
I didn’t realize I was being unclear in that last post! Clearly it’s one of those things that takes practice. (In my defense I really don’t know where the median LW reader is at math; the level of that post was a wild guess.)
Glad you’re not opposed to LessWrong as a place. I’m not certain myself whether it really fulfills its stated goal of helping people come to conclusions more rationally. (When decisions are actually hard, when empirical evidence is sparse and trial-and-error is impossible, I’m not sure it’s possible to decide rationally at all!)
I think one thing it does is promote a norm of measured thinking, where we keep our emotions at a conversational level instead of letting them shout. I’ve definitely noticed that attitude spilling out into my everyday life, and I find myself checking “do I think that’s really plausible or am I just saying it?”
No, that isn’t it. It’s just that the math was above my current level of education. It was all Chinese to me! That doesn’t mean that I am against advanced math posts. I believe more technical posts would improve Less Wrong a lot. I loved the recent posts by cousin_it. Even though the key issues were above my head, they introduced me to so many new ideas. They gave me this feeling of discovering and learning something new and important. And the discussions they spawned were of a higher standard because nobody with less education dared to say much. They also spawned awesome comments like this one. Your post is no different, except that I have deferred reading it until I have learnt the necessary math. Such posts actually give me an incentive to learn more.
How to improve Less Wrong:
Write more technical posts (including math).
Either: Define the demographics. Explicitly mention the level of education necessary for all of Less Wrong.
Or: Introduce labels rating the level of difficulty for each post.
Provide more background knowledge in each post you write through references and links.
Example:
Someone like me has to look up each of the symbols. It would have been much easier this way: If P(Y|X) ≈ 1, then P(X∧Y) ≈ P(X).
Expand the FAQ and link to it from the front page (e.g. “When should I write a top-level article?”, “You must read the sequences before commenting”, etc.).
Be kinder to people who don’t know better. Link them to relevant material, and don’t just say what’s wrong but explain why and how they are wrong.
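To make the P(Y|X) example concrete: the statement follows from the product rule P(X∧Y) = P(Y|X)·P(X), so the gap P(X) − P(X∧Y) equals P(X)·(1 − P(Y|X)) and shrinks as P(Y|X) approaches 1. Here is a minimal Python sketch that checks this on a made-up joint distribution over two yes/no events; the numbers and the helper name `prob` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution over two yes/no events X and Y, keyed by (x, y).
# The numbers are made up for illustration and sum to 1.
joint = {
    (True, True): Fraction(45, 100),
    (True, False): Fraction(5, 100),
    (False, True): Fraction(20, 100),
    (False, False): Fraction(30, 100),
}

def prob(event):
    """Total probability of all outcomes (x, y) for which event(x, y) is true."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

p_x = prob(lambda x, y: x)               # P(X)
p_x_and_y = prob(lambda x, y: x and y)   # P(X ∧ Y)
p_y_given_x = p_x_and_y / p_x            # P(Y|X) = P(X ∧ Y) / P(X)

# Product rule: P(X ∧ Y) = P(Y|X) · P(X).
assert p_x_and_y == p_y_given_x * p_x
# The gap between P(X) and P(X ∧ Y) is P(X) · (1 - P(Y|X)): small when P(Y|X) ≈ 1.
gap = p_x - p_x_and_y
assert gap == p_x * (1 - p_y_given_x)
print(p_x, p_y_given_x, p_x_and_y, gap)
```

Exact fractions are used instead of floats so that the equalities can be checked with `==` without rounding noise.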
Yeah, I’m trying hard not to write without thinking. Sometimes I still fail, especially when I’m tired.
“If” should not link to Conditional_(Programming) but to “Logical Implication”, though I don’t see the need for a link. It really is just the standard meaning of “if”, and if people don’t know the meaning of “if”, advanced rationality is probably a bit beyond what they can immediately use.
“1” as a link to percentage is odd as well. It’s just the number one. Yes, people are often more used to it as 100% in the context of probability, but the link doesn’t clarify that in any useful way.
The links for conditional probability and conjunction are great though. It’s quite possible to not be familiar with those particular bits of notation.
I know, but you see what the issue is here. It has actually been a problem all my life. I’m really happy that there are now places like the Khan Academy and BetterExplained that actually explain such matters in a concise and straightforward way, unlike school teachers, whom you never understand. Most of the time I only have to watch or read their explanation once to grasp it. Further, they go into details you are never told about in school.
I guess I’m the kind of person who is unable to accept that 1+1=2 until someone explains the terms and operators. I only started with mathematics last year, with a previous knowledge of basic arithmetic. Yet the first thing I tried to figure out was what ‘+’ actually means. That showed me that infix operators are functions and led me to the recursive and set-theoretic definitions of addition. Only at that point was I satisfied. Which reminds me of a problem I had in German lessons back in elementary school. I always insisted on pronouncing certain words in whatever way I thought was most logically consistent, e.g. not pronouncing ‘st’ as ‘sch’. Nobody ever told me that natural language evolved, and that pronunciation is just an axiomatic definition, a cultural consensus to pronounce things a certain way, not something you can infer from the general to the specific. So I kept pronouncing words the way I thought was reasonable and ended up with bad grades. Such problems accumulated and I just stopped doing anything for school (also because I thought the other kids were all aliens). I have only been catching up for the last few years. English was the first thing I taught myself.
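For anyone curious what the recursive definition of addition looks like, here is a minimal Python sketch in the Peano style: a natural number is either zero or the successor of another natural number, and addition is an ordinary function defined by two rules, a + 0 = a and a + S(b) = S(a + b). The names `Nat`, `succ`, `add` and `to_int` are hypothetical, chosen for illustration; the set-theoretic construction is not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Nat:
    """A natural number: zero (pred is None) or the successor of pred."""
    pred: Optional["Nat"] = None

ZERO = Nat()

def succ(n: Nat) -> Nat:
    """The successor S(n), i.e. 'n + 1'."""
    return Nat(n)

def add(a: Nat, b: Nat) -> Nat:
    # Rule 1: a + 0 = a
    if b.pred is None:
        return a
    # Rule 2: a + S(b') = S(a + b')
    return succ(add(a, b.pred))

def to_int(n: Nat) -> int:
    """Count how many successor steps lead from zero to n."""
    count = 0
    while n.pred is not None:
        count += 1
        n = n.pred
    return count

ONE = succ(ZERO)
TWO = succ(ONE)
assert add(ONE, ONE) == TWO   # 1 + 1 = 2 follows from the two rules alone
print(to_int(add(TWO, TWO)))
```

Here the infix `+` is simply the two-argument function `add`; writing `a + b` is notation for `add(a, b)`.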
Hah! You must be one of those people who are only surrounded by educated folks. I don’t know anyone in real life who has any clue what a logical conjunction could be (I have been working as a baker and doing roadworks). Something nasty maybe :-)
I find myself checking “I think that’s really plausible. That can’t be good. I wonder what I should be saying instead to be socially successful.” ;)
It is easy for a math-literate person to overestimate how obvious certain jargon is to people. Like ‘sum of squared differences’, for example. Squared differences are just what is involved when you are calculating things like standard deviation. It’s what you use when looking at, say, a group of people and deciding whether they all have about the same height or if some are really tall but others are really short. How different they are.
For those who have never had to manually calculate the standard deviation and similar statistics the term would just be meaningless. (Which makes your example a good demonstration of your point!)
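As a minimal sketch of the height example, assuming made-up heights in centimetres: take each person’s difference from the group’s mean height, square it, and add the squares up. Dividing that sum by the number of people gives the (population) variance, and its square root is the standard deviation. The function names are hypothetical; this uses the divide-by-n population formula rather than the n − 1 sample formula.

```python
import math

def sum_squared_differences(values):
    """Sum of squared differences between each value and the group's mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def population_std_dev(values):
    # Variance = average squared difference; standard deviation = its square root.
    return math.sqrt(sum_squared_differences(values) / len(values))

# Made-up heights in centimetres, purely for illustration.
similar = [170, 172, 168, 171, 169]   # everyone about the same height
mixed = [150, 190, 155, 185, 170]     # some really tall, some really short

print(sum_squared_differences(similar), population_std_dev(similar))
print(sum_squared_differences(mixed), population_std_dev(mixed))
```

Both groups have the same mean (170 cm), but the sum of squared differences is 10 for the first group and 1250 for the second, which is exactly the “how different they are” signal described above.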
Never mind that; just parse the damn phrase! All you need to know is what a “difference” is, and what “to square” means.
Why, I wonder, do people assume that words lose their individual meanings when combined, so that something like “squared differences” registers as “[unknown vocabulary item]” rather than “differences that have been squared”?
Because quite often sophisticated people will punish you socially if you don’t take special care to pay homage to whatever extra meaning the combined phrase has taken on. Caution in such cases is a practical social move.
Good observation; I had been subliminally aware of it but nobody had ever pointed it out to me explicitly.
It’s also very helpful to know things like why someone might go around squaring differences and then summing them, and what kinds of situations that makes sense in. That way you can tell when you make errors of interpretation. For example, “differences pertaining to the squared” is a plausible but less likely interpretation of “squared differences”, but knowing that people commonly square differences and then sum them in order to calculate a (squared) L₂ norm, often because they are going to take the derivative of the result so as to solve for a local minimum, makes that a much less plausible interpretation.
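A minimal sketch of the “take the derivative to find a minimum” step, assuming a small made-up list of numbers: the sum of squared differences from a candidate centre c is Σ(xᵢ − c)², its derivative with respect to c is −2·Σ(xᵢ − c), and that derivative is zero exactly when c is the mean. So the mean is the point that minimizes the sum. The function names are hypothetical.

```python
def ssd_about(values, c):
    """Sum of squared differences between each value and a candidate centre c."""
    return sum((v - c) ** 2 for v in values)

def ssd_derivative(values, c):
    # d/dc of sum((v - c)^2) is sum(-2 * (v - c)).
    return sum(-2 * (v - c) for v in values)

values = [2.0, 3.0, 7.0, 8.0]          # made-up data
mean = sum(values) / len(values)

# The derivative vanishes at the mean...
assert ssd_derivative(values, mean) == 0.0
# ...and moving the centre away from the mean in either direction increases the sum.
for c in (4.0, 4.9, 5.1, 6.0):
    assert ssd_about(values, c) > ssd_about(values, mean)
print(mean, ssd_about(values, mean))
```

This is why squared differences are so convenient: unlike, say, absolute differences, they give a smooth function whose minimum can be found by setting a simple derivative to zero.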
And for a Bayesian to be rational in the colloquial sense, they must always remember to assign some substantial probability weight to “other”. For example, you can’t simply assume that words like “sum” and “differences” are being used with one of the meanings you’re familiar with; you must remember that there’s always the possibility that you’re encountering a new sense of the word.
Really? I think I would have understood that sentence before the first time I tried to calculate a standard deviation manually. In general, there are many ways to arrive at an understanding of a concept. I’m very skeptical of statements of the form “you can’t understand X without doing Y first.”
I was being polite.
What do you mean? Are you saying that everyone with an average IQ is supposed to be able to understand what it means to minimize the within-cluster sum of squared differences, regardless of education? I don’t know what a standard deviation is either. I am able to read Wikipedia, understand what to do and use it. I know what squared means and I know what differences means. I just expected the sentence to mean more than the sum of its parts. Also I do not call the ability to use tools comprehension. What I value is to know when to use a particular tool, how to use it effectively and how it works.
You could teach stone-age people to drive a car. It would still seem like magic to them. Yet if you cloned them and exposed them to the right circumstances, they might actually understand the internal combustion engine once grown up. Same IQ. In the same way, the server WolframAlpha is running on possesses a certain potential, yet what unlocks that potential is the five million lines of Mathematica.
I’d be really surprised if anyone were able to understand the sentence the first time with a self-taught, one-year educational background in mathematics. That doesn’t mean there are no exceptions; I’m just not a prodigy.
I think you’re right. “Sum of squared differences” makes sense as a normal thing to do with data points only if you’ve learned that it’s a measure of how spread apart they are, that it is proportional to the variance (it equals the variance times the number of points), and that making the variance small is a good way to ensure that a cluster is “well clumped.” There is a certain amount of intuition that’s built up from experience.
I also want to stress the point that I’m a bit biased(?) when it comes to understanding concepts. Surely I could accept any mathematical method or algorithm at face value. After all I’m also able to use WolframAlpha. But I feel that doesn’t count. At least I do not value such understanding. If you taught a prehistoric man to press some buttons he would be able to control a nuclear facility.
Many people are bothered by the counter-intuitive nature of probability. I have never been more confused by probability than by any other branch of mathematics. I believe that people regard probability as more difficult to understand because they learn about it much later than about other mathematical concepts. For me that is very different, because it is all new to me. For me, P(Y) ≥ P(X ∧ (X → Y)) is as intuitive as (actually more intuitive than) a² + b² = c². The first makes sense in and of itself; the second needs context and proof (at least as far as my gut feeling is concerned). I just don’t see how 2 + 2 = 4 is more obvious than Bayes’ theorem. You just learnt to accept that 2 + 2 = 4 because (1) you encounter the problem very often, (2) you can easily verify its solution, and (3) you learn about it early on. But it is not self-evident.
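Why P(Y) ≥ P(X ∧ (X → Y)) “makes sense in and of itself”: X ∧ (X → Y) is true in exactly the outcomes where X ∧ Y is true, and every one of those outcomes also has Y, so its probability can never exceed P(Y). The minimal Python sketch below checks the truth table and then spot-checks the inequality on randomly generated distributions over the four (X, Y) outcomes. It is a sanity check under those assumptions, not a proof, and the function names are hypothetical.

```python
import itertools
import random

def implies(a, b):
    # Material implication: "a -> b" is false only when a is true and b is false.
    return (not a) or b

outcomes = list(itertools.product([True, False], repeat=2))  # all (x, y) pairs

# Truth table: X and (X -> Y) holds exactly when X and Y holds.
assert all((x and implies(x, y)) == (x and y) for x, y in outcomes)

def check(trials=1000, seed=0):
    """Spot-check P(Y) >= P(X and (X -> Y)) on random joint distributions."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.random() for _ in outcomes]
        total = sum(weights)
        joint = {o: w / total for o, w in zip(outcomes, weights)}
        p_y = sum(p for (x, y), p in joint.items() if y)
        p_lhs = sum(p for (x, y), p in joint.items() if x and implies(x, y))
        assert p_y >= p_lhs
    return True

print(check())
```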
This is something people have noticed and it influences their responses. Aggressive “not understanding” is often considered a sign of bad faith, for good reason.
What I noticed is that everyone seems to assume that my problem with understanding the sentence “...within-cluster sum of squared differences...” concerned “sum of squared differences” and not “within-cluster”. I don’t know the mathematical definition of a cluster. What might add to the confusion is that I’m not even sure about the meaning of the English word “cluster”. At that point I decided to postpone reading the post. I could make the effort to look everything up, of course, but thought it would be more effective to read it in the future.
Your post simply served as an example of how difficult it can be to read Less Wrong without a lot of background knowledge.
Not really. I actually wrote a basic explanation of the whole sentence concept by concept but trimmed it down to the part that best illustrated dependence on mathematical background. Saying “within cluster is basically a phrase in English that refers to the same thing that’s in the title of the post” wouldn’t have helped convey the point. :P
It does, however, illustrate a different point. There is a trait related not just to intelligence but also to openness to information and flexible thinking that makes some people better suited than others to picking up and following new topics and ideas, based on what they already know, and to filling in the blanks with their best inference. Confidence is part of it, but part of it is a social competition strategy embodied at the cognitive level.
There isn’t an explicit mathematical concept of a cluster.
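Here’s what K-means does. Say K is 3.
What you want is the partition of your data points into three groups that minimizes the sum of squared differences within each group (each point’s squared distance from the mean of its group). Trying all the possible partitions would take far too long for any real data set, so instead you start with three guessed group centres, assign every point to its nearest centre, and move each centre to the mean of the points assigned to it.
Then you iterate those two steps until the groups stop changing. This usually ends at a good partition, though not necessarily the best possible one.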
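Here is a minimal Python sketch of that iterate-until-stable procedure (usually called Lloyd’s algorithm) on made-up 2-D points. One assumption to flag: to keep the example deterministic it picks starting centres with a simple “farthest-first” rule (start at the first point, then repeatedly add the point farthest from the centres chosen so far); common implementations use random starts or k-means++ instead. All function names here are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def farthest_first_centres(points, k):
    """Deterministic starting guesses: each new centre is the point farthest from those so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centres)))
    return centres

def k_means(points, k, iterations=100):
    centres = farthest_first_centres(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the group whose centre is nearest.
        new_assignment = [min(range(k), key=lambda j: squared_distance(p, centres[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed group
        assignment = new_assignment
        # Update step: move each centre to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centre if a group ends up empty
                centres[j] = mean_point(members)
    # Within-cluster sum of squared differences: the quantity K-means tries to make small.
    wcss = sum(squared_distance(p, centres[a]) for p, a in zip(points, assignment))
    return centres, assignment, wcss

# Made-up data: three obvious clumps.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
centres, assignment, wcss = k_means(points, k=3)
print(assignment, round(wcss, 6))
```

Each pass can only lower (or keep) the within-cluster sum of squared differences, which is why the loop settles down; but where it settles depends on the starting centres, which is why real implementations often run several starts and keep the best.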
No, approximately the opposite of that. Are you sure you didn’t intend this to be a reply to Peter? It seems to be quite an odd reply to me in the context.
You said that you had been polite in what you previously wrote. I took that to mean that you agree with Peter de Blanc but chose to communicate it in a way that lets me arrive at the conclusion without your stating it. In other words, that I should have been able to understand the sentence.
I didn’t reply to Peter de Blanc because I don’t know him and he doesn’t know me, so his statement that he would have understood Y without X doesn’t give me much information about my own intelligence. But you have actually read a lot of my comments and addressed me directly in the discussion above.
Interestingly, I’m having a discussion (see my previous comments) with Roko about whether one should tell people directly that they are dumb or try to communicate such a truth differently.
Not polite enough to lie, but polite enough to leave off all the caveats and exceptions. Some here could understand the sentence even with no education in mathematics. Even so, the essentials of what I said were sincere. Piecing together that kind of jargon from the scraps of information available in the context is a far harder task than just understanding the article itself.