“Mu”: Japanese word roughly translatable as ‘absence’.
“Kami”: Japanese word roughly translatable as ‘god’.
“-sama”: Japanese honorific for referring to someone whose status/position is much higher than yours.
I’m curious as to what exactly you found there.
Briefly: I told my learner “assume there are two sources of income for Light Forest forts; assume they are log-linked functions of the data provided with no interactions between features; characterize these income sources.”
The output graphs, properly interpreted, said back:
The larger source of income benefits greatly from Miners, benefits from the presence of every ore (especially Haematite), likes coal, and benefits from having one Smith.
The smaller source of income benefits from Woodcutters, benefits from having two (but not more) Warriors, hard-requires at least one Woodcutter and Warrior in order to be viable, actively dislikes Coal, doesn’t care about ores (except Copper for some reason), and strongly benefits from Crafters.
(In reviewing my graphs in retrospect I also see a small bump in performance for both sources associated with having exactly one Brewer; I missed that the first time because it looked like noise and I’d assumed Brewers only mattered to the survival half of the challenge.)
This wasn’t 100% right, and missed some important detail, but given the bad assumptions I built it on—an additive model with a lot of interactions sprinkled on top would have been a better match—I’m pleasantly surprised by how closely it matches (a valid interpretation of) ground truth.
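A minimal sketch of the model family described above, with made-up feature names and synthetic data (none of the coefficients below come from the actual scenario): if income is log-linked and additive with no interactions, regressing log-income on the features recovers each feature's multiplier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: counts of Miners, Woodcutters, and Smiths per fort.
n = 500
X = rng.integers(0, 5, size=(n, 3)).astype(float)

# Synthetic ground truth: log-income is additive in the features
# (equivalently, income is a product of per-feature multipliers).
true_beta = np.array([0.40, 0.15, 0.05])
income = np.exp(X @ true_beta + 1.0)

# Fitting is then just least squares on log(income).
A = np.column_stack([X, np.ones(n)])
beta_hat, *_ = np.linalg.lstsq(A, np.log(income), rcond=None)
print(beta_hat[:3])  # recovers approximately [0.40, 0.15, 0.05]
```

With real (noisy) data you'd fit something like a log-link GLM or GAM instead of exact least squares, but the additive-in-log structure being assumed is the same.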
I didn’t like this post, but I did very much like the “insight porn” post it linked to. (Unfortunately LW doesn’t let you simultaneously downvote and strong-upvote a post, so consider my weak-upvote as a sum-of-vibes.)
If someone says ‘What’s for supper?’ a beginner will desperately try to think up something original. He will carefully evaluate dozens of options in his mind.
“Is this funny?” “Will this not reveal something weird about myself?”
It will take him ages to come up with something, and eventually he will say something like “fried mermaid”.
An improv pro would simply respond “fish”.
Taken—almost verbatim, without attribution—from Impro, by Keith Johnstone. (I don’t know whether LW would consider this plagiarism, or consider that to be bad.)
I haven’t eaten meat in months.
Completely orthogonal to any of the more interesting points you were trying to make, but: it looks like you might be going vegan in an unsystematic way. I’ve heard this can give people severe permanent disabilities, in ways that are trivial to dodge once you know what they are. (I realize you’ve probably already addressed this, but thought I’d err on the side of caution and nag you anyway.)
Findings:
As noted by others, there are weird everything-but-infohazardous rows that don’t seem to make sense with the rest of the data (and sometimes just don’t make sense in general, such as by being acquired without a team being sent). I filtered these out.
Attempts by Paramilitary units to capture Virtual assets NEVER work.
(once you account for the above, success rates for Paramil/Infil/Legal teams start looking eerily similar)
My predecessors never sent a Legal team to capture a Humanoid, or an Infiltration team to capture a Location. This makes intuitive sense, so I’ll follow their lead.
Infiltration teams work best in Sites 2 and 6; Paramilitary work best in Sites 3 and 8; Legal work best in Sites 4 and 7 (Sites 1 and 5 are conspicuous by their absence).
Infiltration teams make most of the profit, Paramilitary make a little, Legal has actively been losing money.
The Safer (or, failing that, more Euclidean) objects are, the more profit they net. Aside from that, it’s pretty difficult to predict how much a given anomaly will be worth, though it being Organic and/or Humanoid seems to help a little.
There’s a weird tension between the last two points. Being extracted Legally seems to actively devalue an anomaly compared to being extracted by other means, in ways I can’t explain by correlations with success rates or other factors. Maybe Lawyers are skimming a lot of the good stuff off the top, while Infiltrators are much more loyal/terrified? Or maybe SCP is secretly really good at resisting legal challenges, and just lets MCD’s lawyers win when-and-only-when it’s over something they don’t mind losing? I can’t help but feel I’m missing something here.
Paramilitary and Infiltration efforts seem to have become less profitable over time (while in contrast, Legal has always sucked).
MCD’s lifetime profits (ignoring extra costs like employee salaries etc, which we know are huge) are a little over 120 billion in today’s money. That’s a lot, but it’s still kind of cute that a century of effort and the use of literal dark magic got them about halfway to Musk’s current net worth, even with insanely generous assumptions.
Final allocations:
Paramilitary units target SCP-3339, SCP-4625, and SCP-5136
Infiltration units target SCP-4390, SCP-2719, and SCP-537
I was debating having the legal teams stay home, but if I point them away from Keter stuff they may actually help the company’s bottom line for a change, so . . . they can target SCP-3850, SCP-3212, and SCP-4957.
Thoughts on the Long Game:
MCD’s legal department are a bunch of incompetent, possibly-corrupt clowns who keep losing their company money. I should probably retain some lawyers of my own and see what they have to say about this NDA I signed ASAP.
Revised model:
Noise is generated per-present, and combined additively for each child. Under ordinary circumstances:
Blum-Bloopers produce 6 Noise.
Fum-Foozlers produce 8 Noise for girls, 4 for boys.
Who-Whonkers produce 5 Noise for girls, 9 for boys.
Sloo-Slonkers produce 5 Noise, plus 1 extra unit for every two years since birth.
Gah-Ginkas produce 5 Noise, plus 1 extra unit for every two years until teenagerhood.
Trum-Troopas usually produce 10 Noise, but occasionally only produce 5. (Figuring out what if anything predicts a halving here is the main unsolved problem in my analysis.)
Each child has exactly one gift type which produces twice as much Noise as it would by default. This remains consistent over a child’s life (or at least until adolescence), so we can use past preferences to predict how they’ll react this year.
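A sketch of this revised model as code. The per-gift numbers are the ones listed above; the Gah-Ginka age cutoff (13) and the exact rounding are my interpretive assumptions, and the occasional Trum-Troopa halving is left unmodeled:

```python
def gift_noise(gift, *, girl, age, favorite=None):
    """Noise from one present. gift is one of:
    'B' Blum-Blooper, 'F' Fum-Foozler, 'W' Who-Whonker,
    'S' Sloo-Slonker, 'G' Gah-Ginka, 'T' Trum-Troopa."""
    if gift == 'B':
        noise = 6
    elif gift == 'F':
        noise = 8 if girl else 4
    elif gift == 'W':
        noise = 5 if girl else 9
    elif gift == 'S':
        noise = 5 + age // 2           # +1 per two years since birth
    elif gift == 'G':
        noise = 5 + min(age, 13) // 2  # bonus stops at teenagerhood (assumed age 13)
    elif gift == 'T':
        noise = 10                     # occasionally only 5; predictor unknown
    else:
        raise ValueError(f"unknown gift {gift!r}")
    # Each child's one favorite gift type produces double Noise.
    return noise * (2 if gift == favorite else 1)

def child_noise(gifts, **child):
    # Noise is generated per-present and combined additively.
    return sum(gift_noise(g, **child) for g in gifts)
```

For example, `child_noise('FG', girl=True, age=6, favorite='F')` scores a six-year-old girl receiving a Fum-Foozler (her favorite, so 16) and a Gah-Ginka (8), for 24 total.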
Allocations for minimizing Noise:
(I haven’t confirmed this is literally optimal given my model but I bet it’s pretty close)
Andy: F + G
Betty: W + G
Sally: W + T
Phoebe: W + T
Freddie: F + B
Eddie: F + B
Cindy: B + T
Mary: W + S
Ollie: S + B
Johnny: F + S
Allocations for maximizing Noise:
Andy: S + B
Betty: F + B
Sally: F + S
Phoebe: G + B
Freddie: S + T
Eddie: W + T
Cindy: W + T
Mary: G + F
Ollie: F + W
Johnny: B + W
The weird hipstery approach I’m actually going with, after way too much overthinking about the “presents are allocated randomly” rule:
I suspect the convention the Whos think is best,
Is a lesson, a puzzle, a trick and a test.
I doubt that they’re stupid, or that they don’t care:
It’s how they teach children to trade and to share.
As a rule, the traditions your ancestors used,
Should not be ignored. But they can be improved!
The dice may choose poorly. I know that I won’t.
So, each child gets gifts: one they like, one they don’t.
That way, I make sure that no Who is left out,
Of the joy or the dealing this day is about.
Andy: F + B
Betty: S + B
Sally: G + S
Phoebe: F + B
Freddie: W + T
Eddie: S + T
Cindy: G + T
Mary: W + F
Ollie: B + W
Johnny: F + W
And as for myself, well, it’s hard to believe
That I missed this solution: I’ll simply just . . . leave!
Flee south for the winter! Head back when it’s done!
Let them have all their joys, all their noise, all their fun!
(If I so disapprove of how Whos spend this day,
I shouldn’t own property here anyway!)
It helped me realize that, if you place a high premium on avoiding the direct fallout (metaphorical or literal) of this conflict’s worst-plausible-case scenario, it might be a good idea to spend the next few months holidaying and/or remote-working in an uninvolved nation.
Thank you for making this.
Misc. Insights:
An adventuring party has a success chance of ~64%. We need to get three of them to win in a row. This is worrying.
It looks like level has almost no impact on chance of success, but there’s a major confounder in that more expensive teams get sent on longer and more arduous journeys: length of a dungeon correlates very strongly with the total price of an expedition, and dungeons with multiple dragons attract a disproportionate number of >level 6 adventurers.
Success rates for dungeons with ‘Goblin’ in the name are much lower than average, though so is the average level of the party sent. I think this means the current market is pretty good at pricing in general, but systematically underestimates Goblins.
A (very) crude approximation is that the price of an expedition is about 2000gp times the number of encounters it contains. Our adventurers have to survive a total of 23 encounters (46,000gp at that rate), and we only have 36,000gp to play with. This is worrying.
Classes seem about evenly distributed, but there’s a bias towards diversity; there are far fewer teams with two or more of a given class than you’d expect if it were random. However, this bias is if anything not strong enough; success rates for parties with four unique classes are much higher than success rates for parties with three. I don’t know to what extent this is because more variety increases the odds that a party will have the right counter to an obstacle, and to what extent class diversity is Inherently Good.
Adventuring parties tend to have everyone be about the same level; this tendency is so strong that the sampling bias makes it hard to work out whether it’s a good idea. I guess I’ll trust convention here?
Literally all the parties with a gap of >3 between their max and min levels are like that because a high-level Rogue joined a low-level party. I’d suspect that this is Rogues faking being higher-levelled to get more gold, but actually teams like this have an above-average success rate, so I have no idea what’s going on. (Fortunately, I don’t have to, since my strategy makes no use of high-level Rogues.)
In general, Clerics are the most useful class, followed by Mages and Fighters.
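The arithmetic behind the two worries above, as a quick sanity check:

```python
# Three independent expeditions at ~64% success each:
p_all_three = 0.64 ** 3
print(f"{p_all_three:.1%}")  # ~26%: worrying indeed

# Crude cost estimate: ~2000gp per encounter, 23 encounters total:
estimated_cost = 2000 * 23   # 46,000gp
budget = 36000
print(estimated_cost - budget)  # ~10,000gp over budget
```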
The actual backbone of my strategy:
A dungeon is a marathon, not a series of sprints; probability of success in later stages is affected by how well a party handled earlier ones. This is shown by the fact that literally all parties managed to defeat their first encounter, and only <0.1% fall to their second (despite the fact that either of these can be Dragons!). The practical implication is that handling ‘easy’ encounters smoothly probably matters, since it means the party will be fresh for the real threats.
Specific encounters have specific counters. By finding what distinguishes the average party defeated by a thing from the average party that encounters a thing, I can determine what classes best combat which obstacles.
Measure has very cleverly inferred what encounters each dungeon is likely to contain, and I’m not shy about copying their homework. (Thank you, Measure.)
Different encounters have vastly different failure probabilities. Dragons are the most dangerous, and Goblin Chieftains are also pretty bad. Our parties will probably have to fight both. This is worrying.
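The second point (“specific encounters have specific counters”) can be sketched as a simple frequency comparison, here on made-up records rather than the real dataset:

```python
from collections import Counter

def counter_scores(records, encounter):
    """For one encounter type, compare class frequencies among parties it
    defeated vs. all parties that faced it. Scores well below 1.0 suggest
    the class counters the encounter (defeated parties underuse it).
    records: iterable of (party_classes, encounter_name, was_defeated)."""
    faced, defeated = Counter(), Counter()
    n_faced = n_defeated = 0
    for classes, name, was_defeated in records:
        if name != encounter:
            continue
        n_faced += 1
        faced.update(classes)
        if was_defeated:
            n_defeated += 1
            defeated.update(classes)
    if n_defeated == 0:
        return {}
    return {c: (defeated[c] / n_defeated) / (faced[c] / n_faced) for c in faced}

# Toy illustration: Rogue-less parties keep dying to Needletraps.
records = [
    (('Rogue', 'Mage'), 'Needletrap', False),
    (('Rogue', 'Cleric'), 'Needletrap', False),
    (('Mage', 'Cleric'), 'Needletrap', True),
    (('Fighter', 'Mage'), 'Needletrap', True),
]
scores = counter_scores(records, 'Needletrap')
print(scores)  # Rogue scores 0.0: no defeated party had one
```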
Decisions:
(I reserve the right to change all of these if I come up with a better idea or another commenter shares a new and relevant insight.)
For the Lost Temple of Lemarchand, I’ll send a level 2 Rogue to handle the needletraps*, a level 2 Druid to handle the snakepits, and a level 2 Cleric and a level 2 Mage to handle the various undead.
For the Infernal Den of Cheliax, I’ll send a level 5 Fighter to fight the orcs and the dragon, a level 3 Druid to keep everyone safe from the snakepits and wolves so they’re fresh for the boss fight, and a level 3 Ranger and level 3 Mage to help the Fighter with the dragon (dragons are scary!).
For the Goblin Warrens of Khaz-Gorond, I’ll send a level 4 Fighter to handle the goblin chieftain and the boulders, a level 4 Ranger to handle the rank-and-file goblins, a level 3 Cleric to help the Ranger out, and . . . I guess a level 3 Fighter to support the first one? (I hate to have doubles on a team but there’s no other class that does as well against chiefs and boulders.)
*This is the one place I feel confident Measure made a mistake: “Rogues help with needletraps” is the most reliable inference I ran into in my encounter-countering research, so I don’t get why they’d include a Mage and a Fighter but not a Rogue in Adventuring Party #1.
However:
The odds don’t seem great. The odds of all three adventures concluding successfully really don’t seem great. And that’s assuming all my inferences are correct, which they aren’t. I know my character is set on this path, but if I were faced with a prospect like this in real life, there’s no way I’d bet anything I’d be afraid to lose.
I suspect a large (though possibly not dominant) part of the ice cream effect is required prep time triggering myopic discounting. If eating ice cream at home, you need to take it out of the freezer at least a few minutes before eating it; this means that if your comfort food of choice is ice cream, you’ll only eat it if it seems like a legitimately good idea (‘a moment of weakness’ becomes ‘like 10min of weakness’, a higher bar for cravings to clear).
Thank you for making this.
Regular team:
Nullifying Nightmare, Blaze Boy, Greenery Giant, Tidehollow Tyrant, and . . . yeah, okay, Phoenix Paladin.
(I was on the fence about whether the last spot should go to Paladin or Ranger, but when I saw Measure’s answer I decided to let hipsterism be the deciding factor.)
Key Insights:
There seems to be a rock-paper-scissors thing going on here: Earthy fighters have an advantage over Watery fighters, Watery fighters have an advantage over Flamey fighters, and Flamey fighters—kinda, sorta, unreliably—have an advantage over Earthy fighters. (And the Nightmare has an advantage over everyone.)
This is relevant because 3⁄5 of the opposing team are Earthy fighters, including Greenery Giant, whose strength rivals the Nightmare’s, and whose presence on a team predicts a ~60% chance of victory.
Teams which are slanted too heavily towards a given element have an extremely low win rate. I can’t tell to what extent this is because losing the rock-paper-scissors game hurts you more than winning it helps, and to what extent balance is inherently valuable, so I’m playing it safe and not building an entire team of firestarters (also, there are only two Flamey fighters with non-terrible win/loss ratios).
Tangential insights:
I infer from the format of the alternative list that—absent an extremely tricky fakeout—position doesn’t matter: A+B+C+D+E is equivalent to E+D+C+B+A.
Different fighters are used with very different frequencies, but this sampling bias doesn’t seem to affect my analysis much.
Eyeballing the correlation matrix, it looks like teams are thrown together randomly; no pairs that always show up together, etc. This makes things much simpler, since I can be confident that (for example) GG’s apparent power isn’t just because people keep using him alongside NN (or vice versa).
There’s a random element here. Existence proof: A+B+C+S+V vs A+E+I+T+V happened twice with different outcomes. Given this, I’d want to push Cloud Lightning Gaming to have the match be best-of-five, to decrease randomness’ relevance to the outcome.
I appreciate the omission of letters that would let us (accidentally or otherwise) spell out common swearwords.
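To illustrate why best-of-five helps, assuming (purely hypothetically) that our team wins any single match with probability 0.6:

```python
from math import comb

def series_win_prob(n, p):
    """Probability of winning a best-of-n series given per-match win prob p
    (model: all n matches are played; the majority winner is the same)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

p = 0.6
print(series_win_prob(1, p))            # 0.6
print(round(series_win_prob(5, p), 3))  # 0.683: the stronger team's edge grows
```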
PVP team:
DM’d
Misc. prelim notes:
There’s a random element. (Existence proof: 16079 and 17759 were the same fight but we only lost 17759.)
There’s an implicit chrono effect: It looks like this war has been developing not necessarily to our advantage. (Luckily it seems like this is probably ‘just’ enemies outnumbering our troops more frequently in later rows, and not anyone actually getting better/worse at their job.)
The number of troops sent scales with the size of the enemy forces, making inference trickier; however, I haven’t seen anything contradicting the hypothesis that loadouts are decided by throwing darts at a board.
Specific weapons counter specific enemies: in particular, the Minigun is usually pretty lousy, but drops Scarabs like flies.
I expected to find synergies between weapons, and didn’t. I did, however, find some antisynergies: Miniguns and Flamethrowers are hella redundant (presumably because they’re both anti-Scarab bugspray), and the weapons in the [MPR] set all clash with each other (“Why do you need Gun? You already have Gun!”).
Guaranteed victories seem possible. (A single soldier with a minigun can perfectly-reliably survive 5 Scarabs, but not 6.)
Mukami-sama, the God of Atheism
I found this disproportionately charming.
Reflections on my performance:
This stings my pride a little; I console myself with the fact that my “optimize conditional on Space and Life” allocation got a 64.7% success rate.
If I’d allocated more time, I would have tried a wider range of ML algorithms on this dataset, instead of just throwing XGBoost at it. I’m . . . not actually sure if that would have helped; in hindsight, trying the same algorithms on different subsets (“what if I built a model on only the 4-player games?”) and/or doing more by-hand analysis (“is Princeliness like Voidliness, and if so, what does that mean?”) might have provided better results.
Reflections on the challenge:
I found this one hard to get started with because it had a de facto 144 explanatory columns (“does this party include a [Class] of [Aspect]?”) along with its 1.4m rows, and the effect of each column was mediated by the effects of every other column. This made it difficult (and computationally intensive!) to figure out anything about which classpect combinations affect the outcome.
That said, I appreciated this scenario. The premise was fun, the writing was well-executed, and the challenge was fair. Also, it served as a much-needed proof-by-example that “train one ML model, then optimize over inputs” isn’t a perfect skeleton key for solving problems shaped like this. If it was a little obtuse on top of that . . . well, I can chalk that up to realism.
Reflections on my attempt:
My PvE approach, as I mentioned, was to copy the plan that worked best in a comparable game: train a model to predict deck success, feed it the target deck, then optimize the opposing deck for maximum success chance. I feel pretty good about how well this worked. If I’d allocated more time, I would have tried to figure out analytically why the local maxima I found worked (my model noticed Lotus Ramp as well as Sword Aggro but couldn’t optimize it as competently for some reason), and/or try multiple model types to see what they agree on (I used a GBT, which has high performance but doesn’t extrapolate well).
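The plan described here, as a skeleton. The scoring function below is an arbitrary stand-in for the trained GBT (so the example runs self-contained); the card names and pool size are invented:

```python
import random

random.seed(0)
CARD_POOL = [f"card_{i}" for i in range(20)]  # hypothetical card names
DECK_SIZE = 8

def predicted_winrate(deck):
    # Stand-in for the trained model's predicted win probability against
    # the target deck; in the real attempt this was a GBT's output.
    return sum(int(c.split('_')[1]) for c in deck) / (19 * len(deck))

def hill_climb(deck, steps=300):
    """Greedy single-card swaps: accept a random replacement whenever it
    improves the predicted winrate. This finds a local maximum, not
    necessarily a global one -- which is why an archetype can get found
    but under-optimized."""
    deck = list(deck)
    for _ in range(steps):
        i = random.randrange(len(deck))
        candidate = random.choice(CARD_POOL)
        if candidate in deck:
            continue
        trial = deck[:i] + [candidate] + deck[i + 1:]
        if predicted_winrate(trial) > predicted_winrate(deck):
            deck = trial
    return deck

start = random.sample(CARD_POOL, DECK_SIZE)
best = hill_climb(start)
print(predicted_winrate(start), "->", predicted_winrate(best))
```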
My PvP approach was a lot more scattershot. After some hilariously bad attempts to get an uncounterable deck by having two decks repeatedly optimize against each other, I decided to just recycle my PvE deck and hope I happened to win the rock-paper-scissors game. As it happened, Fate smiled on me, but if there had been any Good Tribal decks in play I wouldn’t be looking quite so clever right now.
Reflections on the challenge:
This was fun. I particularly like that it was superficially similar to Defenders of the Storm, while having profoundly different mechanics: I came in expecting another game that’s mostly about counters, and instead got a game that’s mostly about synergy. And, as everyone (including me) has already said, the premise and writing are hilarious.
My only problem with this game was the extra difficulty associated with approaching it analytically if you don’t happen to know about mtg-style card games (I remember looking at the comments on the main post late last week and wondering what a ‘ramp’ was). However, this issue is mitigated by the facts that:
It (presumably) gave card game fans a chance to practice balancing-priors-against-new-evidence skills and not just ML/analysis skills.
It’s not unreasonable for card game knowledge to help pick cards in a game centered on card games.
I won despite lacking this background.
Reflections on my attempt:
It looks like I was basically right. Even in the place I came up short – figuring out Trum-Troopas – I knew I was probably missing something, since it would have been weird for that to be the only not-perfectly-predictable part of the problem.
Reflections on the challenge:
This is the first D&D.Sci which is a pure puzzle; that is, the first one without randomness in the linkage between explanatory and response variables. I think this would be unfair for something presented as a social science problem, except that a) the Seussian context was a pretty big hint that normal rules don’t apply and implausibly-tidy solutions are on the table, b) it was fairly obvious after an hour or two of looking at the data that there were unusually clean linkages between at least some toy choices and at least some noise levels (G+S equals exactly 16 more than half the time; many toy combos have a suspiciously low number of possible noise outputs), and c) using only ML can still net you a much-better-than-chance result. However, I suspect the anomalous neatness may have limited engagement: once someone posts a complete or near-complete solution, why bother investigating for yourself? And why bother writing about your analysis if it’s identical or near-identical to one already posted?
Others may have other opinions, but I really liked the 10-day length: giving players a week plus a choice of weekend provided a lot of breathing room. I also (continue to) think the problem introduction was the best-written one so far. And while I don’t know if the high puzzleishness was a good idea overall, it definitely enabled one heck of a eureka moment when I figured out (most of) the rules. Thank you very much for running this game.
This seems like a natural fit for D&D.Sci games. All the ones I made are public domain, so you can use them freely (and I bet the other people who made some would give you permission if you asked them nicely), they’ve been publicly played by clever humans with a variety of skill levels and associated outcomes, and they’re obscure enough that I doubt an LLM would have memorized the solutions (and if not you could tweak the names and data-generation hyperparameters to flatfoot them).
. . . I happen to have a completed-but-unreleased D&D.Sci game, which I was planning to put on LW early next month, after everyone got back from their holidays. Would it be helpful if I sent it to you and delayed the release until Feb, so you and yours could let LLMs try it first?
My main finding thus far:
There’s a single standard archetype which explains all the most successful teams. It goes like this: [someone powerful from the MPR cluster, ideally P], [a frontman, selected from GLS], [someone long-ranged, selected from CHJ]. In other words, this one is all about getting a good range of effective ranges in your team.
My tentative PVE submission is therefore:
Hurler, Legionary, Professor
However:
I’m pretty sure there’s some second-order rock-paper-scissors stuff going on that I’m not accounting for: Rangers seem better than Professors at beating Samurai in particular, Marauders seem to have a similar speciality when fighting Tyrants, and Duelists/Bandits beat Amazons/Wizards, which beat Legionaries/Golems, which beat Duelists/Bandits.
I haven’t looked into how a bunch of strong/sturdy/snipey trios behave facing off against each other, which is relevant both because the PVE enemy is that kind of trio and because the PVP arena will probably be full of them.
Based on my research so far, I can’t rule out that there’s some secondary archetype which sucks in general but acts as a magic bullet against strong/sturdy/snipey trios in particular.
I have a stupidly ambitious ML thing I want to use this challenge as an excuse to (try to) do.
So it’ll take me a while to decide on my PVP allocation, and I’m reserving the right to change my PVE one.
My provisional answer is:
Fireball, Levee, Hammer
This is supported by the reasoning that:
Levee (Fire/Earth) does a passably mediocre job protecting against Missiles (Earth/Water) and Fireball (Air/Fire); Fireball (Air/Fire) and Hammer (Light/Air) can both sneak past Solar (Fire/Light) by sharing an element.
And more prosaically by the fact that:
When I filtered the dataset to have Wizard A with the opponent’s spell list, the spells which most raised Wizard B’s winrate were those three.
However:
I’ve had a hard time figuring out how to weight “counter the opponent’s element choices!” vs “go with what has the most ambient mana!” vs “go with what blocks the opponent’s highest-mana attacks!”. It’s entirely possible that I should replace Hammer with Missiles, Rays or Vambrace; I hope to look into these possibilities later on.
Additionally:
The opponent picked a pretty good set of spells for the conditions in play; as such, I’m seriously questioning whether I can get my master even a >66% winrate.
I like the continued (slight) ambiguity about whether/to what extent Fay is cursed vs a figment vs just unpopular.
At the risk of being accused of flagrant self-promotion, I also have a few bad examples that don’t strike me as entirely wrong. My data science challenges are only tractable to players with the appropriate skillset, and resemble real-life problems the same way mystery novels resemble real-life detective work . . . but if you’re looking for novel ways to test for skill at Inferring The Truth And Then Using It, they’re probably relevant to your interests.