I believe my ideas about Gibbs sampling are correct, as demonstrated by my correct choice and implementation of it to solve a difficult problem. My terminology may be non-standard.
Here is what I believe happened in that referenced exchange: You wrote a comment that was difficult to comprehend, and I didn’t see how it related to my question. I explained why I asked the question, hoping for clarification. That’s a failure to communicate, not a failure to update.
My interpretation, having read this comment thread and then the original: Cyan brought up a subtle point about statistics, explained in a non-obvious way. (This comment seemed about as informative to me as the entire post.) You asked “don’t statistical procedures X and Y solve this problem?”, to which Cyan responded that they weren’t relevant, and then you repeated that they did.
Here, the takeaway I would make is that Cyan is likely a theory guy, and you’re likely an applications guy. (I got what I think Cyan’s point was on my first read, but it was a slow read and my “not my area of expertise” alarms were sounding.) It is evidence for overconfidence when people don’t know what they don’t know (heck, that might even be a good definition for overconfidence).
Here is what I would have written differently if I were not overconfident:
After Cyan’s response that Gibbs and EM weren’t relevant, I would have written something like “If Gibbs and EM aren’t relevant to the ideas of this post, then I don’t think I understand the ideas of this post. Can you try to summarize those as clearly as possible?”
That’s a failure to communicate, not a failure to update.
Okay, fair enough. I’ll give it a shot, and then I’m bowing out.
Let me explain the problem with this claim:
That’s why these algorithms exist—they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.
This is not why these algorithms exist. EM isn’t really an algorithm per se; it’s a recipe for building an optimization algorithm for an objective function with the form given in equation 1.1 of the seminal paper on the topic. Likewise, Gibbs sampling is a recipe for constructing a certain type of Markov chain Monte Carlo algorithm for a given target distribution.
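To make that distinction concrete, here is a toy sketch (mine, not from the original exchange) of EM fitting a two-component Gaussian mixture with a shared unit-ish scale. Note that the only thing resembling a “prior” here is the mixing proportion π, which is just a parameter of the likelihood being maximized; EM climbs the likelihood surface and no prior distribution over parameters appears anywhere. The specific data and initial values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: a two-component mixture (hypothetical example).
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])

# Initial guesses; EM simply iterates toward a maximum of the likelihood.
pi, mu1, mu2, sd = 0.5, -1.0, 1.0, 1.0
for _ in range(200):
    # E-step: responsibility of component 1 for each data point.
    p1 = pi * np.exp(-0.5 * ((data - mu1) / sd) ** 2)
    p2 = (1 - pi) * np.exp(-0.5 * ((data - mu2) / sd) ** 2)
    r = p1 / (p1 + p2)
    # M-step: re-estimate parameters from the responsibility-weighted data.
    pi = r.mean()                       # a mixing proportion, not a prior
    mu1 = (r * data).sum() / r.sum()
    mu2 = ((1 - r) * data).sum() / (1 - r).sum()
    sd = np.sqrt((r * (data - mu1) ** 2
                  + (1 - r) * (data - mu2) ** 2).sum() / len(data))

print(mu1, mu2)  # the means should land near -2 and 3
```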
If you read the source material I’ve linked, you’ll notice that the EM paper gives many examples in which nothing like what you call a prior (actually a proportion) is present, e.g., sections 4.1.3, 4.6. Something like what you call priors are present in the example of section 4.3, although those models don’t really match the problem you solved. (To see why I brought up empirical Bayes in the context of your problem, read section 4.5.)
You’ll also notice that the Wikipedia article on MCMC does not mention priors in either your sense or my sense at all. That is because such notions only arise in specific applications; a true grokking of MCMC in general and Gibbs sampling in particular does not require the notion of a prior in either sense.
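As an illustration of that last point, here is a minimal Gibbs sampler of my own construction for a bivariate normal target with correlation ρ. The sampler is specified entirely by the target distribution’s full conditionals; the word “prior” never enters, because whether the target happens to be a Bayesian posterior is irrelevant to the machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8  # target: bivariate normal, unit variances, correlation rho
x, y = 0.0, 0.0
samples = []
for _ in range(20000):
    # Each full conditional of the target is itself normal; Gibbs just
    # alternates exact draws from them. Only the target distribution matters.
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
    samples.append((x, y))

xs, ys = np.array(samples[1000:]).T  # discard burn-in
print(np.corrcoef(xs, ys)[0, 1])    # close to rho
```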
You’ve understood how to use the Gibbs sampling technology to solve a problem; that does not mean you understand the key ideas underlying the technology. Your problem was in the space of problems addressed by the technology, but that space is much larger, and the key ideas much more general, than you have as yet appreciated.
Not to be a jerk, but your ideas about Gibbs and EM seem very wrong to me too, for exactly the reasons that Cyan describes below.
Because of that, I was surprised that you said you had used Gibbs in a statistical application with great success. Perhaps you were using a stats package that employed Gibbs sampling internally, rather than implementing Gibbs sampling yourself?