Had trouble making further progress using that method, realized I was being silly about this and there was a much easier starting solution:
Rather than trying to figure out anything whatsoever about scores, we’re trying for now just to mimic what we did in the past.
Define a metric of ‘distance’ between two people equal to the sum of the absolute values of the differences between their stats.
To evaluate a person:
Find the 10* students sorted pre-1700* with the smallest distances from them.
Assume that those students were similar to them, and were sorted correctly. Sort them however the majority were sorted.
*These numbers may be varied to optimize. For example, moving the year threshold earlier makes you more certain that the students you find were correctly sorted... at the expense of selecting them from a smaller population, so they end up further away from the person you're evaluating. I may twiddle these numbers in future and see if I can do better.
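A minimal sketch of the method described above. The stat names and record fields here are my own placeholders, not taken from the actual dataset:

```python
from collections import Counter

# Placeholder stat names -- the real dataset's five stats may differ.
STATS = ["Courage", "Integrity", "Intellect", "Patience", "Ambition"]

def distance(a, b):
    """'Distance' between two people: sum of absolute stat differences."""
    return sum(abs(a[s] - b[s]) for s in STATS)

def sort_student(person, students, k=10, year_cutoff=1700):
    """Assign `person` to the majority house among the k nearest
    students who were sorted before `year_cutoff`."""
    pool = [s for s in students if s["year"] < year_cutoff]
    nearest = sorted(pool, key=lambda s: distance(person, s))[:k]
    return Counter(s["house"] for s in nearest).most_common(1)[0][0]
```

Here `k` and `year_cutoff` are exactly the two starred knobs from the footnote.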
We can test this algorithm by trying it on the students from 1511 (and using students from 1512-1699 to find close matches). When we do this:
49 students are sorted by this method into the same house we sorted them into in 1511.
3 students are ambiguous (e.g. we see a 5-5 split among the 10 closest students, with one of the tied houses being the one we chose).
8 students are sorted differently.
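The three-way tally above can be sketched as follows; `tally` is a hypothetical helper, where `votes` counts the houses among the k nearest students:

```python
from collections import Counter

def tally(votes, actual_house):
    """Classify one holdout result as 'same', 'ambiguous', or 'different'.

    votes: Counter mapping house -> appearances among the k nearest
    students; actual_house: the house we chose historically.
    A tie for the lead that includes our historical choice counts as
    ambiguous, matching the 5-5 example above."""
    ranked = votes.most_common()
    top_count = ranked[0][1]
    leaders = [house for house, count in ranked if count == top_count]
    if len(leaders) > 1:
        return "ambiguous" if actual_house in leaders else "different"
    return "same" if leaders[0] == actual_house else "different"
```

Running this over every 1511 student would produce the 49/3/8 breakdown reported here.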
Some of these are very dramatically different. For example, student 37 had Intellect 7 and Integrity 61. All students with stats even vaguely near that were sorted into Humblescrumble, which makes sense given that house’s focus on Integrity. However, Student 37 was sorted into Thought-Talon, which seems very odd given their extremely low Intellect.
The most likely explanation for this is that our sorting wasn’t perfect even in 1511. Student 37 did quite badly, which suggests this is plausible.
The less likely but scarier explanation is that our sorting in 1511 was based on something other than stats (a hidden stat that we can no longer see? Cohort effects?)
Sadly this method provides no insight whatsoever into the underlying world. We’re copying what we did in the past, but we’re not actually learning anything. I still think it’s better than any explicit model I’ve built so far.
This gives the following current allocations for our students (still subject to future meddling):
Thought-Talon: A, J, O, S
Serpentyne: C, F*
Dragonslayer: D, H, G*, K*, N*, Q*
Humblescrumble: B*, E*, I, L, M*, P*, R, T
where entries marked with a * are those where the nearby students were a somewhat close split, while those without are those where the nearby students were clearly almost all in the same house.
And some questions for the GM based on something I ran into while doing this (if you think these are questions you’re not comfortable answering, that’s fine; but if the answers were meant to be clear one way or the other from the prompt, please let me know):
The problem statement says we were ‘impressively competent’ at assigning students when first enchanted.
Should we take this to mean we were perfect, or should we take this to mean that we were fairly good but could possibly be even better?
When first enchanted, did we definitely still only use the five stats specified here to classify students, or is it possible that we were able to identify an additional stat (Fated? Protagonist-hood?) that we can no longer perceive, and sorted students based on that?
Robustness analysis: seeing how the above changes when we tweak various aspects of the algorithm.
Requiring Ofstev Rating at least 20 (fewer samples, less likely mis-sorted, might be some bias introduced if e.g. some houses have higher variance than others):
B shifts from Humblescrumble to Thought-Talon.
I shifts from Humblescrumble to Serpentyne.
K shifts from Dragonslayer to Serpentyne.
P shifts from Humblescrumble to Serpentyne.
Changing threshold year to 1800 (closer samples, more of them mis-sorted):
F ambiguously might shift from Serpentyne to Thought-Talon (5-5).
K shifts from Dragonslayer to Serpentyne.
P ambiguously might shift from Humblescrumble to Serpentyne (4-4-1-1).
Changing threshold year to 1600 (fewer samples, less likely mis-sorted):
F ambiguously might shift from Serpentyne to Thought-Talon (5-5).
K ambiguously might shift from Dragonslayer to Serpentyne (5-5).
P shifts from Humblescrumble to Serpentyne.
Increasing # of samples used to 20 (less risk of one of them being mis-sorted, but they are less good comparisons):
K shifts from Dragonslayer to Serpentyne (just barely, 10-9-1).
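The sweeps above could be automated along these lines. Field names such as `rating` (for Ofstev Rating) are placeholders, and the knob values are the ones tried above:

```python
from collections import Counter
from itertools import product

def assign(person, pool, stats, k):
    """Majority house among the k nearest students in `pool`."""
    nearest = sorted(
        pool, key=lambda s: sum(abs(person[t] - s[t]) for t in stats))[:k]
    return Counter(s["house"] for s in nearest).most_common(1)[0][0]

def sweep(person, students, stats,
          cutoffs=(1600, 1700, 1800), ks=(10, 20), min_ratings=(0, 20)):
    """Re-run the assignment under each combination of knobs and return
    {(cutoff, k, min_rating): house}, so disagreements stand out."""
    results = {}
    for cutoff, k, min_rating in product(cutoffs, ks, min_ratings):
        pool = [s for s in students
                if s["year"] < cutoff and s.get("rating", 0) >= min_rating]
        if len(pool) >= k:  # skip settings that starve the pool
            results[(cutoff, k, min_rating)] = assign(person, pool, stats, k)
    return results
```

Students like K, whose assigned house flips between settings, are exactly the ones flagged with a * above.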
I’m not certain whether this will end up changing my views, but K in particular looks very close between Dragonslayer and Serpentyne, and P plausibly better in Serpentyne.
According to my models: B indeed belongs in Th rather than Hu, but it’s close and not very clear. I belongs in Hu rather than Se in all of them, but it’s close. My models disagree with one another about K, some preferring Dr narrowly and fewer preferring Se less narrowly. Most of my models put P in Hu not Se, and the ones that put it in Se are the ones with larger errors. My models disagree with one another about F, preferring Se or Th and not expecting much difference between those.
(aphyer, I don’t know whether you would prefer me not to say such things in case you are tempted to read them. I will desist if you prefer. The approaches we’re taking are sufficiently different that I don’t think there is much actual harm in reading about one another’s results.)
No objection to you commenting. The main risk on my end is that my fundamental contrariness will lead me to disagree with you wherever possible, so if you do end up being right about everything you can lure me into being wrong just to disagree with you.
P is a very odd statblock, with huge Patience and incredibly low Courage and Integrity. (P-eter Pettigrew?) I might trust your models more than my approach on students like B, who have middle-of-the-road stats but happen to be sitting near a house boundary. I’m less sure how much I trust your models on extreme cases like P, and think there might be more benefit there to an approach that just looks at a dozen or so students with similar statblocks rather than trying to extrapolate a model out to those far values.
Based on poking at the score figures, I think I’m currently going to move student P from Humblescrumble to Serpentyne but not touch the other ambiguous ones:
Thought-Talon: A, J, O, S
Serpentyne: C, F, P
Dragonslayer: D, G, H, K, N, Q
Humblescrumble: B, E, I, L, M, R, T
You haven’t sorted student G.
I remark (note: not much spoilage here, but a little) that your allocations are very similar to mine, even though my approach was quite different; maybe this kinda-validates what both of us are doing. Ignoring the missing student G, I think we disagree only about B, and neither of us was very sure about B.
Good catch, fixed.
With that fix, student B is indeed the only one we (both unconfidently) disagree on.