Continuing with the principle “when in doubt, use brute force”, I did the same thing with gradient-boosted trees; these had somewhat more prediction error on each validation set (oh, I forgot to mention that each regressor was trained on 90% of the data and evaluated on the remaining 10%). I also did it with SVMs using radial basis functions; these were comparable in accuracy to the random forests. (Note: There’s much less diversity in my ensemble of SVMs, because the only difference between them is the training/validation split, whereas for RFs and GBTs there is randomness in the fitting process itself.)
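For anyone who wants to try the same kind of thing, here is a minimal sketch of the setup described above, assuming scikit-learn. It is not my actual code: the data is a synthetic stand-in, and the ensemble size, tree counts, and the use of mean absolute error are all assumptions made for illustration. It does show why the SVM ensemble is less diverse: `SVR` takes no seed, so only the split varies.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def make_model(kind, seed):
    """Build one regressor; only the tree-based ones take a fitting seed."""
    if kind == "rf":
        return RandomForestRegressor(n_estimators=50, random_state=seed)
    if kind == "gbt":
        return GradientBoostingRegressor(n_estimators=50, random_state=seed)
    if kind == "svm":
        return SVR(kernel="rbf")  # deterministic fit: there is no seed to vary
    raise ValueError(f"unknown model kind: {kind}")


def fit_ensemble(X, y, kind, n_members=5):
    """Fit n_members regressors, each on its own random 90/10 split."""
    models, errors = [], []
    for seed in range(n_members):
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.1, random_state=seed
        )
        model = make_model(kind, seed).fit(X_tr, y_tr)
        models.append(model)
        # Validation error on the held-out 10% (MAE is an assumed metric).
        errors.append(mean_absolute_error(y_val, model.predict(X_val)))
    return models, errors


# Synthetic stand-in data (NOT the real dataset): 5 attributes, noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=300)

results = {kind: fit_ensemble(X, y, kind) for kind in ("rf", "gbt", "svm")}
for kind, (_, errors) in results.items():
    print(f"{kind}: mean validation MAE = {np.mean(errors):.2f}")
```

Predicting with every member of each ensemble then gives a spread of values per student, which is one way to see how much the models disagree.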
Did this make a difference to my predictions or suggestions?
Not much; usually all three agreed; where they didn’t, usually the SVM agreed with the RF. However, the SVM regressors fairly confidently want to put K in Dragonslayer, and the GBT ones less confidently agree. On the other hand, they predict less loss from putting K in the RFs’ suggestion of Serpentyne than the RFs do from putting K in Dragonslayer, so it’s a tough call. I’ll switch to putting K in Dragonslayer. And for Q, the RFs and GBTs are fairly indifferent between all houses and slightly prefer Humblescrumble (for the RFs) and Thought-Talon (for the GBTs), but the SVMs think that Dragonslayer and Thought-Talon are much better than the other two, and give the nod to Dragonslayer. Looking at all their numbers, I’ll move Q into Dragonslayer.
So my revised allocations are:
Serpentyne gets C, F. Humblescrumble gets E, I, L, M, P, R, T. Dragonslayer gets D, G, H, K, N, Q. Thought-Talon gets A, B, J, O, S.
And my revised predicted ratings with these allocations are:
A 36..39, B 15..18, C 28..30, D 17..20, E 17..19, F 33..40, G 21..24, H 15..20, I 12..15, J 24..30, K 23..26, L 21..26, M 24..29, N 19..21, O 18..23, P 22..25, Q 27..34, R 40..44, S 29..32, T 28..34.
It occurs to me that there is a possible source of bias in the approach I am taking: perhaps not everyone gets into Swineboils, so that e.g. if we looked for correlations between the attributes we would get spurious negative correlations because people who are bad at everything don’t get in. If such effects are strong, then our model will have bias if we apply it to the population at large. That’s OK because we aren’t applying it to the population at large, we’re applying it to the students who got in … but if e.g. Swineboils is less selective than it used to be because there are way fewer applications, then the bias will distort our predictions for this year’s students. (This is the phenomenon sometimes called “Berkson’s paradox”.)
I don’t expect this bias to be very large, but I have made no attempt to check that expectation against reality. Still less have I made any attempt to correct for it, and probably I won’t.
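To make the selection effect above concrete, here is a toy simulation using only the Python standard library. Everything in it is hypothetical: two independent attributes drawn from a standard normal, and an invented admission rule that accepts anyone whose attributes sum to more than a threshold. It has nothing to do with the real dataset; it only shows how selection alone can create a negative correlation, and how that correlation changes when the threshold does.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def admitted_correlation(applicants, threshold):
    """Correlation between the two attributes among admitted applicants only."""
    admitted = [(a, b) for a, b in applicants if a + b > threshold]
    return pearson(*zip(*admitted))


random.seed(0)
# Two attributes that are independent in the full applicant pool.
applicants = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20000)]

r_all = pearson(*zip(*applicants))                 # whole pool: close to zero
r_strict = admitted_correlation(applicants, 1.0)   # selective admissions
r_lenient = admitted_correlation(applicants, -1.0)  # less selective admissions

print(f"all applicants:      r = {r_all:+.2f}")
print(f"strict admissions:   r = {r_strict:+.2f}")
print(f"lenient admissions:  r = {r_lenient:+.2f}")
```

In the full pool the correlation is near zero, while among the admitted it is clearly negative, and less so under the lenient rule. A model fitted under one admission regime would therefore see a different attribute structure than the students it is applied to under another.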