Robustness analysis: seeing how the above changes when we tweak various aspects of the algorithm.
Requiring Ofstev Rating at least 20 (fewer samples, less likely mis-sorted, might be some bias introduced if e.g. some houses have higher variance than others):
B shifts from Humblescrumble to Thought-Talon.
I shifts from Humblescrumble to Serpentyne.
K shifts from Dragonslayer to Serpentyne.
P shifts from Humblescrumble to Serpentyne.
Changing threshold year to 1800 (closer samples, more of them mis-sorted):
F ambiguously might shift from Serpentyne to Thought-Talon (5-5).
K shifts from Dragonslayer to Serpentyne.
P ambiguously might shift from Humblescrumble to Serpentyne (4-4-1-1).
Changing threshold year to 1600 (fewer samples, less likely mis-sorted):
F ambiguously might shift from Serpentyne to Thought-Talon (5-5).
K ambiguously might shift from Dragonslayer to Serpentyne (5-5).
P shifts from Humblescrumble to Serpentyne.
Increasing the number of samples used to 20 (less risk of one of them being mis-sorted, but they are less good comparisons):
K shifts from Dragonslayer to Serpentyne (just barely, 10-9-1).
I’m not certain whether this will end up changing my views, but K in particular looks very close between Dragonslayer and Serpentyne, and P looks plausibly better off in Serpentyne.
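For anyone who wants to poke at the same knobs, here is a rough sketch of the kind of nearest-neighbour vote being varied above. It is not the exact code behind these numbers. The `Student` fields, the plain Euclidean distance, the "only use students from before the threshold year" filter, and the default of 10 neighbours (guessed from tallies like 5-5 and 4-4-1-1) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass
class Student:
    stats: tuple   # statblock values, in a fixed order (hypothetical layout)
    house: str     # house the student was actually sorted into
    year: int      # year of sorting
    rating: float  # Ofstev Rating


def neighbour_vote(target, students, k=10, threshold_year=None, min_rating=None):
    """Tally the houses of the k past students closest to `target`.

    Assumption: the year threshold keeps students sorted *before* that year,
    and the rating threshold keeps students rated *at least* that value.
    """
    pool = [
        s for s in students
        if (threshold_year is None or s.year < threshold_year)
        and (min_rating is None or s.rating >= min_rating)
    ]
    # Plain Euclidean distance between statblocks (an assumed metric).
    nearest = sorted(pool, key=lambda s: dist(s.stats, target))[:k]
    tally = Counter(s.house for s in nearest).most_common()
    # Flag a result like 5-5 or 4-4-1-1, where the top two houses are level.
    tied = len(tally) > 1 and tally[0][1] == tally[1][1]
    return tally, tied
```

Each robustness check is then one call with a single argument changed, e.g. `neighbour_vote(p_stats, students, min_rating=20)` or `neighbour_vote(p_stats, students, k=20)`, where `p_stats` stands in for whichever statblock you are testing.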
According to my models
B indeed belongs in Th rather than Hu, but it’s close and not very clear. I belongs in Hu rather than Se according to all my models, but it’s close. My models disagree with one another about K, some preferring Dr narrowly and fewer preferring Se less narrowly. Most of my models put P in Hu not Se, and the ones that put it in Se are ones with larger errors. My models disagree with one another about F, preferring Se or Th and not expecting much difference between those.
(aphyer, I don’t know whether you would prefer me not to say such things in case you are tempted to read them. I will desist if you prefer. The approaches we’re taking are sufficiently different that I don’t think there is much actual harm in reading about one another’s results.)
No objection to you commenting. The main risk on my end is that my fundamental contrariness will lead me to disagree with you wherever possible, so if you do end up being right about everything you can lure me into being wrong just to disagree with you.
P is a very odd statblock, with huge Patience and incredibly low Courage and Integrity. (P-eter Pettigrew?) I might trust your models more than my approach on students like B, who have middle-of-the-road stats but happen to be sitting near a house boundary. I’m less sure how much I trust your models on extreme cases like P, and think there might be more benefit there to an approach that just looks at a dozen or so students with similar statblocks rather than trying to extrapolate a model out to those far values.
Based on poking at the score figures, I think I’m currently going to move student P from Humblescrumble to Serpentyne but not touch the other ambiguous ones:
Thought-Talon: A, J, O, S
Serpentyne: C, F, P
Dragonslayer: D, G, H, K, N, Q
Humblescrumble: B, E, I, L, M, R, T