Your updating method is more properly called “Naive Bayes”. Naive Bayes works well if the events you are conditioning on (Tj take Yj on I) are independent. It’s not clear how reasonable that assumption is for your application.
To see the problem, suppose I agree with each of Eliezer and Robin 80% of the time. Now consider some new issue on which both agree. What's the probability that I also agree? Naive Bayes predicts something much higher than 80%: it effectively updates twice, once for each influence. But that's probably wrong, since Eliezer's opinions overlap strongly with Robin's, so the conditioning events are not independent.
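A minimal sketch of the double-counting, with made-up numbers (a 50% prior, and each source carrying a likelihood ratio of 4, which corresponds to 80% accuracy):

```python
def naive_bayes_posterior(prior, lr_list):
    """Combine evidence by multiplying likelihood ratios,
    i.e. assuming the sources are conditionally independent."""
    odds = prior / (1 - prior)
    for lr in lr_list:
        odds *= lr
    return odds / (1 + odds)

# One 80%-accurate source saying "yes": posterior = 0.8.
p_one = naive_bayes_posterior(0.5, [0.8 / 0.2])

# Two such sources both saying "yes": Naive Bayes updates twice,
# pushing the posterior to 16/17, about 0.94.
p_two = naive_bayes_posterior(0.5, [0.8 / 0.2, 0.8 / 0.2])

print(p_one, p_two)
```

If the two sources largely share their opinions, the second update is mostly double-counting the same evidence, and 0.94 overstates the probability.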
To do better, you need a method that can capture this kind of conditional independence effect (my opinion is probably independent of Eliezer’s given Robin’s). I would try a boosting method like AdaBoost.
As a meta-level note, you shouldn't imagine that any one technique is "correct" and another incorrect. The only criterion you should use is empirical performance. Specifically, you can calculate the negative log likelihood (the compression rate) each method achieves on the database and select the one that achieves the lowest value.
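The selection criterion can be sketched in a few lines. The data and both models' predictions below are hypothetical placeholders; the point is only the comparison rule:

```python
import math

def neg_log_likelihood(probs, outcomes):
    """Total negative log likelihood, in bits, that a model assigns
    to observed binary outcomes. Lower means better compression."""
    nll = 0.0
    for p, y in zip(probs, outcomes):
        nll -= math.log2(p if y else 1 - p)
    return nll

# Hypothetical held-out records: 1 = agreement, 0 = disagreement.
outcomes = [1, 1, 0, 1, 0, 1, 1, 1]

# Two hypothetical models' predicted probabilities of agreement.
model_a = [0.9, 0.8, 0.3, 0.7, 0.4, 0.8, 0.9, 0.6]
model_b = [0.6, 0.6, 0.5, 0.6, 0.5, 0.6, 0.6, 0.6]

nll_a = neg_log_likelihood(model_a, outcomes)
nll_b = neg_log_likelihood(model_b, outcomes)
best = "A" if nll_a < nll_b else "B"
```

Whichever method compresses the database best, by this measure, is the one to keep.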
One way of looking at this problem, which fits well with an extension I am trying to work out, is this: once I know Robin's position on the question, I should update my probability distribution for Eliezer's position. That, in turn, should change the likelihood ratios attached to Eliezer's observed position when predicting the opinion of the person in question.
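The chained update can be sketched with made-up numbers. Assume that, conditional on Robin's "yes", Eliezer says "yes" 90% of the time when I agree and 85% of the time when I don't (reflecting his strong overlap with Robin); Eliezer's observed position then carries a likelihood ratio near 1 instead of the 4 that Naive Bayes would assign:

```python
def posterior(prior, lr_list):
    """Posterior probability after multiplying in likelihood ratios."""
    odds = prior / (1 - prior)
    for lr in lr_list:
        odds *= lr
    return odds / (1 + odds)

# Robin's "yes" carries a likelihood ratio of 0.8/0.2 = 4.
lr_robin = 0.8 / 0.2

# Naive Bayes gives Eliezer's "yes" the same ratio of 4.
p_naive = posterior(0.5, [lr_robin, 4.0])  # 16/17, about 0.94

# Conditioned on Robin's "yes", Eliezer's "yes" is nearly expected
# either way, so it contributes only 0.9/0.85, about 1.06.
lr_eliezer_given_robin = 0.9 / 0.85
p_chained = posterior(0.5, [lr_robin, lr_eliezer_given_robin])  # about 0.81
```

With the chained update, Eliezer's observed position adds almost nothing once Robin's is known, which is the desired behavior.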