I was wondering if anyone would notice that Network 2 with logistic units was exactly equivalent to Naive Bayes.
To be precise, Naive Bayes assumes that within the blegg cluster, or within the rube cluster, all remaining variance in the characteristics is independent; or to put it another way, once we know whether an object is a blegg or a rube, this screens off any other information that its shape could tell us about its color. This isn’t the same as assuming that the only causal influence on a blegg’s shape is its blegg-ness—in fact, there may not be anything that corresponds to blegg-ness.
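To make the equivalence concrete, here is a minimal sketch. It assumes binary characteristics and invented probabilities; the feature names, the numbers, and both function names are hypothetical and not taken from the original networks. It computes P(blegg | observations) twice: once by multiplying per-feature likelihoods under the independence assumption, and once as a single logistic unit whose weights are log-likelihood ratios.

```python
import math

# Hypothetical numbers, chosen only for illustration.
PRIOR_BLEGG = 0.5
# P(feature is present | class), one entry per binary characteristic.
P_GIVEN_BLEGG = {"blue": 0.98, "egg_shaped": 0.97, "furred": 0.95}
P_GIVEN_RUBE = {"blue": 0.02, "egg_shaped": 0.03, "furred": 0.05}


def naive_bayes_posterior(obs):
    """P(blegg | obs), treating features as independent given the class."""
    like_blegg = PRIOR_BLEGG
    like_rube = 1.0 - PRIOR_BLEGG
    for name, present in obs.items():
        pb = P_GIVEN_BLEGG[name]
        pr = P_GIVEN_RUBE[name]
        like_blegg *= pb if present else 1.0 - pb
        like_rube *= pr if present else 1.0 - pr
    return like_blegg / (like_blegg + like_rube)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_unit_posterior(obs):
    """The same posterior, written as sigmoid(bias + sum of weight * input)."""
    # The bias starts at the prior log-odds and absorbs each "feature absent" term.
    total = math.log(PRIOR_BLEGG / (1.0 - PRIOR_BLEGG))
    for name, present in obs.items():
        pb = P_GIVEN_BLEGG[name]
        pr = P_GIVEN_RUBE[name]
        absent_term = math.log((1.0 - pb) / (1.0 - pr))
        # Weight = log-odds shift from seeing the feature present rather than absent.
        weight = math.log(pb / pr) - absent_term
        total += absent_term + weight * (1.0 if present else 0.0)
    return sigmoid(total)


observation = {"blue": True, "egg_shaped": True, "furred": False}
print(naive_bayes_posterior(observation))
print(logistic_unit_posterior(observation))
```

The two printed values agree up to floating-point rounding, because taking the log-odds of the Naive Bayes product turns it into a sum of per-feature terms, which is exactly what a logistic unit computes.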
But one reason Naive Bayes does work pretty well in practice is that a lot of objects in the real world do have causal essences: cat DNA (which doesn’t mix with dog DNA) is the causal essence that gives rise to all the surface characteristics that distinguish cats from dogs.
The other reason Naive Bayes works pretty well in practice is that it often successfully chops up a probability distribution into clusters even when the real causal structure looks nothing like a central influence.