D&D.Sci: Whom Shall You Call?
This is a D&D.Sci scenario: a puzzle where players are given a dataset to analyze and an objective to pursue using information from that dataset.
Intended Difficulty: Average
Good news! After an interminable period of waiting for lawyers to finish arguing over the will, you’ve inherited your great-uncle’s sprawling country estate.
Bad news! As you might expect, the estate is super haunted. And from the results you get from applying your (perfectly reliable, standard-issue) Paranormal Phenomenon Evaluation Kit, several of the twenty-three(!) malevolent entities making the place unlivable seem like they’re on the more troublesome end of the spectral spectrum.
Good news! Your new grounds aren’t in the territory of any specific exorcist group, which means that – unlike the hoi polloi – you get to choose which of the six nearby organisations you want to hire to banish each creature, without running afoul of their intricate web of mutual non-compete agreements. While it’s a given that every exorcist eventually gets their ghost, being able to choose the cheapest still matters: your liquid assets currently total only 70000sp, and you’d really rather have some money left over for throwing lavish parties.
Bad news! Another thing all the exorcists agree on is not quoting prices in advance; apparently, the quantities of resources expended in exorcisms are sufficiently unpredictable that their practitioners uniformly insist on only naming their fees after they’ve done the work.
Good news! You’ve successfully pulled records for all the local exorcist groups detailing what PPEK results they got from each of their targets over the last few years, as well as how much they ended up charging for each job. And because things change so slowly in the exorcist world – and spirits in one territory tend to be no more (or differently) troublesome than those in another – you can be confident that trends deduced from these will apply to the horrors haunting your new home.
Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Phantom Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up.
Good news! Your noble blood and scientific acumen definitely[1] qualify you to select the best[2] banisher for each wayward spirit. Whom shall you call?
I’ll post an interactive you can use to test your choices, along with an explanation of how I generated the dataset, sometime on Wednesday 17th July. I’m giving you twelve days, but the task shouldn’t take more than an evening or two; use Excel, R, Python, Spider Senses, or whatever other tools you think are appropriate. Let me know in the comments if you have any questions about the scenario.
If you want to investigate collaboratively and/or call your choices in advance, feel free to do so in the comments; however, please use spoiler blocks or rot13 when sharing inferences/strategies/decisions, so people intending to fly solo can look for clarifications without being spoiled.
Thanks abstractapplic! Initial analysis:
Initial stuff that hasn’t turned out to be very important:
My immediate thought was that there are likely to be different types of entities we are classifying, so my initial approach was to look at the distributions to try to find clumps.
All of the 5 characteristics (Corporeality, Sliminess, Intellect, Hostility, Grotesqueness) have bimodal distributions with one peak around 15-30 (position varies) and the other peak at around 65-85 (position varies. Overall, the shapes are very similar looking. The trough between the peaks is not very deep, plenty of intermediate values.
All of these characteristics are correlated with each other.
Looking at sizes of bins for pairs of characteristics, again there appears to be two humps—but this time in the 2d plot only. That is, there is a high/high hump and a low/low hump, but noticeably there does not appear to be, for example, a high-sliminess peak when restricting to low-corporality data points.
Again, the shape varies a bit between characteristic pairs but overall looks very similar.
Adding all characteristics together gets a deeper trough between the peaks, though still no clean separation.
Overall, it looks to me like there are two types, one with high values of all characteristics, and another with low values of all characteristics, but I don’t see any clear evidence for any other groupings so far.
Eyeballing the plots, it looks compatible with no relation between characteristics other than the high/low groupings. Have not checked this with actual math.
In order to get a cleaner separation between the high/low types, I used the following procedure to get a probability estimate for each data point being in the high/low type:
For each characteristic, sum up all the other characteristics (rather, subtract that characteristic from the total)
For each characteristic, classify each data point into pretty clearly low (<100 total), pretty clearly high (>300 total) or unclear based on the sum of all the other characteristics
obtain frequency distribution for the characteristic values for the points classified clearly low and high using the above steps for each characteristic
smooth in ad hoc manner
obtain odds ratio from ratio of high and low distributions, ad hoc adjustment for distortions caused by ad hoc smoothing
multiply odds ratios obtained for each characteristic and obtain probability from odds ratio
I think this gives cleaner separation, but still not super great imo, most points 99%+ likely to be in one type or the other, but still 2057 (out of 34374) are between 0.1 and 0.9 in my ad hoc estimator. Todo: look for some function to fit to the frequency distributions and redo with the function instead of ad hoc approach.
Likely classifications of our mansion’s ghosts:
low: A,B,D,E,G,H,I,J,M,N,O,Q,S,U,V,W
high: C,F,K,L,P,R,T
To actually solve the problem: I now proceeded to split the data based on exorcist group. Expecting high/low type to be relevant, I split the DD points by likely type (50% cutoff), and then tried some stuff for DD low including a linear regression. Did a couple graphs on the characteristics that seemed to matter (grotesqueness and hostility in this case) to confirm effects looked linear. So, then tried linear regression for DD high and got the same coefficients, within error bars. So then I thought, if it’s the same linear coefficients in both cases, I probably could have gotten them from the combined data for DD, don’t need to separate into high and low, and indeed linear regression on the combined DD data gave the same coefficients more or less.
Actually finding the answer:
So, then I did regression for the exorcist groups without splitting based on high/low type. (I did split after to check whether it mattered)
Results:
DD cost depends on Grotesqueness and to a lesser extent Hostility.
EE cost depends on all characteristics slightly, Sliminess then Intellect/Grotesqueness being the most important. Note: Grotesqueness less important, perhaps zero effect, for “high” type.
MM cost actually very slightly declines for higher values of all characteristics. (note: less effect for “high” type, possibly zero effect)
PP cost depends mainly on Sliminess. However, slight decline in cost with more Corporeality and increase with more of everything else.
SS cost depends primarily on Intellect. However, slight decline with Hostility and increase with Sliminess.
WW cost depends primarily on Hostility. However, everything else also has at least a slight effect, especially Sliminess and Grotesqueness.
Provisionally, I’m OK with just using the linear regression coefficients without the high/low split, though I will want to verify later if this was causing a problem (also need to verify linearity, only checked for DD low (and only for Grotesqueness and Hostility separately, not both together)).
Results:
Ghost | group with lowest estimate | estimated cost for that group
A | Spectre Slayers | 1926.301885259
B | Wraith Wranglers | 1929.72034133793
C | Mundanifying Mystics | 2862.35739392631
D | Demon Destroyers | 1807.30638053037 (next lowest: Wraith Wranglers, 1951.91410462716)
E | Wraith Wranglers | 2154.47901124028
F | Mundanifying Mystics | 2842.62070661731
G | Demon Destroyers | 1352.86163670857 (next lowest: Phantom Pummelers, 1688.45809434935)
H | Phantom Pummelers | 1923.30132492753
I | Wraith Wranglers | 2125.87216703498
J | Demon Destroyers | 1915.0299245701 (Next lowest: Wraith Wranglers, 2162.49691339282)
K | Mundanifying Mystics | 2842.16499046146
L | Mundanifying Mystics | 2783.55221244497
M | Spectre Slayers | 1849.71986735069
N | Phantom Pummelers | 1784.8259008802
O | Wraith Wranglers | 2269.45361189797
P | Mundanifying Mystics | 2775.89249612121
Q | Wraith Wranglers | 1748.56167086623
R | Mundanifying Mystics | 2940.5652346428
S | Spectre Slayers | 1666.64380523907
T | Mundanifying Mystics | 2821.89307084084
U | Phantom Pummelers | 1792.3319145455
V | Demon Destroyers | 1472.45641559628 (Next lowest: Spectre Slayers, 1670.68911559919)
W | Demon Destroyers | 1833.86462523462 (Next lowest: Wraith Wranglers, 2229.1901870478)
So that’s my provisional solution, and I will pay the extra 400sp one time fee so that Demon Destroyers can deal with ghosts D, G, J, V, W.
--Edit: whoops, missed most of this paragraph (other than the Demon Destroyers):
“Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Poltergeist Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up.”
will edit to fix! post edit: Actually my initial result is still compatible with that paragraph, it doesn’t involve the Entity Eliminators, and only uses the Phantom Pummelers 3 times. --
Not very confident in my solution (see things to verify above), and if it is indeed this simple it is an easier problem than I expected.
further edit (late July 15 2024): haven’t gotten around to checking those things and also my check of linearity, where I did check, binned the data and could be hiding all sorts of patterns.
Things about the dataset:
Each ghost statistic has a bimodal distribution, with one peak ~70 for ‘high’ stats and one ~30 for ‘low’ stats.
High stats correlate with other high stats: many ghosts have either all stats high or all stats low. This suggests a distinction between e.g. ‘Major’ spirits (which tend to have all stats high, but sometimes have a few low) and Minor spirits (vice versa).
Sliminess seems to be the stat most correlated with major/minor-ness: almost all Major spirits have high Sliminess, and almost all Minor spirits have low Sliminess. Hostility is the least correlated: Hostile Minor spirits, or non-Hostile Major spirits, both happen relatively often.
However, I haven’t yet been able to come up with anything clever to do with this, and ended up mostly just using a linear regression.
Results of my analysis:
Most exorcists have one particular ghost stat that seems to primarily govern the difficulty they face:
The Phantom Pummelers really do not like Sliminess.
The Spectre Slayers really do not like Intellect.
The Wraith Wranglers really do not like Hostility.
The Demon Destroyers really do not like Grotesqueness (and also do better with low Hostility).
while some behave differently:
The Entity Eliminators seems to dislike all stats, especially Sliminess: perhaps they have a hard time with Major spirits and a relatively easy time with Minor ones?
The Mundanifying Mystics have a very high base rate, but actually charge slightly less for all stats—they are expensive in general, and get extra annoyed when you waste their time with Minor spirits?
We handle the idiosyncracies of hiring the various exorcists:
Paying the Demon Destroyers to come seems worth it: they might actually save us 400sp in expectation just on spirit W alone.
The Spectre Slayers seem more valuable than the Entity Eliminators: while the Eliminators are all-around okay at minor spirits, with our knowledge of who is good against which stats we can always pick out a better exorcist to use, while the Spectre Slayers are a uniquely good bet for spirits like S that have very low INT but high other stats.
There are exactly three spirits where I think the Pummelers save us money (N, U, and a little bit on H), so we don’t need to fret about that constraint.
And we end up assigning (unless I find something else to do and change this):
A: Spectre Slayers
B: Wraith Wranglers
C: Mundanifying Mystics
D: Demon Destroyers
E: Wraith Wranglers
F: Mundanifying Mystics
G: Demon Destroyers
H: Phantom Pummelers
I: Wraith Wranglers
J: Demon Destroyers
K: Mundanifying Mystics
L: Mundanifying Mystics
M: Spectre Slayers
N: Phantom Pummelers
O: Wraith Wranglers
P: Mundanifying Mystics
Q: Wraith Wranglers
R: Mundanifying Mystics
S: Spectre Slayers
T: Mundanifying Mystics
U: Phantom Pummelers
V: Demon Destroyers
W: Demon Destroyers
Edit after seeing simon’s answer:
We appear to have done pretty much the exact same things—identified the major/minor spirit distinction, not found anything to do with it, just fed the stats into a linear regression—and gotten the exact same answer.
(And I wonder which ghost your great-uncle is...perhaps we can get away with sending no exorcist at all to that one?)
I think there’s a typo; the text refers to “Poltergeist Pummelers” but the input data says “Phantom Pummelers”.
My first pass was just to build a linear model for each exorcist based on the cases where they were hired, and assign each ghost the minimum cost exorcist according to the model. This happens to obey all the constraints, so no further adjustment is needed
My main concern with this is that the linear model is terrible (r2 of 0.12) for the “Mundanifying Mystics”. It’s somewhat surprising (but convenient!) that we never choose the Entity Eliminators.
A: Spectre Slayers (1926)
B: Wraith Wranglers (1930)
C: Mundanifying Mystics (2862)
D: Demon Destroyers (1807)
E: Wraith Wranglers (2154)
F: Mundanifying Mystics (2843)
G: Demon Destroyers (1353)
H: Phantom Pummelers (1923)
I: Wraith Wranglers (2126)
J: Demon Destroyers (1915)
K: Mundanifying Mystics (2842)
L: Mundanifying Mystics (2784)
M: Spectre Slayers (1850)
N: Phantom Pummelers (1785)
O: Wraith Wranglers (2269)
P: Mundanifying Mystics (2776)
Q: Wraith Wranglers (1749)
R: Mundanifying Mystics (2941)
S: Spectre Slayers (1667)
T: Mundanifying Mystics (2822)
U: Phantom Pummelers (1792)
V: Demon Destroyers (1472)
W: Demon Destroyers (1834)
Estimated total cost: 49822
Good catch; fixed now; thank you.
In general there seems to be a weak correlation between each of the attributes and costs,both when you consider every exorcist and the individual group.
- For corporeality the mystics seem to have a weak negative linear correlation. The rest look like they have a weak step function
- For sliminess the mystics seem to have a weak negative linear correlation. All except the Pummelers (Which look linear) seem to have a weak step function.
- For intellect the Mystics again appear to have a weak linear correlation. The Slayers have a linear positive correlation, the rest seem to have a weak positive correlation.
- For hostility the Mystics have a weak negative linear function, the Wraiths have a positive linear one, the rest appear to have a positive step function.
- For grotesueness the Mystic have a weak negative correlation, the destroyers have a positive linear correaltion, the rest seem to have a weak positive step function.
So one is always negative linear. One is a positive step function for everything, the other 4 are positive linear in one, but have a step function for the rest.
Fitting a model to this leaves the best results as:
A: Entity Eliminators (1737)
B: Spectre Slayers (1999)
C: Mundanifying Mystics (2862)
D: Entity Eliminators (1737)
E: Wraith Wranglers (1747)
F: Mundanifying Mystics (2842)
G: Demon Destroyers (1459)
H: Phantom Pummelers (1804)
I: Wraith Wranglers (1961)
J: Wraith Wranglers (1934)
K: Mundanifying Mystics (2842)
L: Mundanifying Mystics (2783)
M: Spectre Slayers (1857)
N: Phantom Pummelers (1778)
O: Wraith Wranglers (1747)
P: Mundanifying Mystics (2775)
Q: Wraith Wranglers (1513)
R: Mundanifying Mystics (2940)
S: Spectre Slayers (1686)
T: Mundanifying Mystics (2821)
U: Phantom Pummelers (1756)
V: Demon Destroyers (1567)
W: Demon Destroyers (1942)
Total Cost: 48089
But unfortunately we can’t have them all due to anoying guild rules.
When no slayers are present total cost is 48737
When no eliminators are present total cost is 48444
Max 3 pummeller restriction isn’t relevant yet, but destroyers are used. Re-running this when destroyers aren’t present yields the following results:
When no slayers or destroyers are present the total cost is 49934
When no eliminators or destroyers are present the total cost is 49162
So the money saved by the destroyer call out fee is worth it in both cases, and we don’t need to worry
about the pummeler restriction.
based on this model the best choice is therefore:
A: Spectre Slayers (1942)
B: Spectre Slayers (1999)
C: Mundanifying Mystics (2862)
D: Wraith Wranglers (1887)
E: Wraith Wranglers (1747)
F: Mundanifying Mystics (2842)
G: Demon Destroyers (1459)
H: Phantom Pummelers (1804)
I: Wraith Wranglers (1961)
J: Wraith Wranglers (1934)
K: Mundanifying Mystics (2842)
L: Mundanifying Mystics (2783)
M: Spectre Slayers (1857)
N: Phantom Pummelers (1778)
O: Wraith Wranglers (1747)
P: Mundanifying Mystics (2775)
Q: Wraith Wranglers (1513)
R: Mundanifying Mystics (2940)
S: Spectre Slayers (1686)
T: Mundanifying Mystics (2821)
U: Phantom Pummelers (1756)
V: Demon Destroyers (1567)
W: Demon Destroyers (1942)
Total cost is estimated to be 48444
There is still quite a bit of variation, and I don’t think all of this is random, and if I was doing this for real I would definitiely analyse the data further to svae more money. As it is lack of time means that I will go with the above for my entry.