It occurs to me that there is a possible source of bias in the approach I am taking:
perhaps not everyone gets into Swineboils, so that e.g. if we looked for correlations between the attributes we would get spurious negative correlations because people who are bad at everything don’t get in. If such effects are strong, then our model will have bias if we apply it to the population at large. That’s OK because we aren’t applying it to the population at large, we’re applying it to the students who got in … but if e.g. Swineboils is less selective than it used to be because there are way fewer applications, then the bias will distort our predictions for this year’s students. (This is the phenomenon sometimes called “Berkson’s paradox”.)
I don’t expect this bias to be very large, but I have made no attempt to check that expectation against reality. Still less have I made any attempt to correct for it, and probably I won’t.
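A toy simulation shows both effects: selection creating a negative correlation, and that correlation changing when selectivity changes. It assumes two independent, normally distributed attributes and admits an applicant when the sum of the two exceeds a threshold. That admission rule is a made-up stand-in, not how Swineboils actually selects students, and `admitted_correlation` and `pearson` are hypothetical helper names.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def admitted_correlation(threshold, n=100_000, seed=0):
    """Correlation between two attributes among admitted applicants.

    Hypothetical model: both attributes are independent standard normals,
    so across the whole applicant pool their true correlation is 0.
    An applicant is admitted only if the sum of the attributes exceeds
    `threshold`; a higher threshold means a more selective school.
    """
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Keep only the applicants who pass the admission rule.
    admitted = [(x, y) for x, y in zip(a, b) if x + y > threshold]
    xs, ys = zip(*admitted)
    return pearson(xs, ys)


if __name__ == "__main__":
    for t in (float("-inf"), 0.0, 2.0):
        print(f"threshold {t}: correlation {admitted_correlation(t):+.3f}")
```

With no threshold (`-inf`) the correlation comes out near zero. As the threshold rises it becomes increasingly negative, reaching roughly -0.47 at a threshold of 0 and roughly -0.74 at a threshold of 2 under these assumptions. A model fitted on a more selective cohort would therefore carry a different spurious correlation than a less selective cohort actually has.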