D&D.Sci 5E: Return of the League of Defenders
This is an entry in the ‘Dungeons & Data Science’ series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.
Note: this is a sequel to the original ‘League of Defenders of the Storm’ scenario, but uses a different ruleset. You can play that one first if you want, but it isn’t necessary in order to play this one.
You’ve been feeling pretty good about your past successes as an esports advisor for Cloud Liquid Gaming, using your Data Science skills to help them optimize their strategies against rival teams. But recently, you’ve gotten a very attractive offer from a North American team.
The one constant in the esports scene is that the US and European teams invariably lose to Korean and Chinese ones. In the recent Mongolian Summer Invitational, not a single Western team beat an Asian one.
The attempts of Western teams to hire away top Asian players have not helped. Recently, however, the ‘Silver Shielders’ team had the bright idea of hiring away the support staff rather than the players themselves. And that’s where you come in.
The sequel to the critically acclaimed ‘League of Defenders of the Storm’ is being released soon. While the full ruleset isn’t available, your new employer does have a dataset of results of beta plays of the game, and is hoping for advice. In the upcoming release-day tournament, they’re going to send a team of three of their best players to play against a foreign team—and they’re hoping that with your help this might not be North America’s 185th consecutive loss on the international stage.
If you can employ your Data Science skills again, perhaps you can reverse the decline of NA as a region. (And get paid a lot. That’s good too.)
DATA & OBJECTIVES
You need to select a team for your employers to play. You should choose three of the following 15 characters:
Your goal is to maximize your team’s winrate against an opposing team, which your employers believe will be playing the following three characters:
(Note that you are allowed to select characters your opponents are also playing, but you are not allowed to select the same character more than once).
To help you with this, you have a dataset of plays of the game. Each entry is a game that was played, the three characters on each team, and which team won.
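For concreteness, here’s a minimal sketch of one way to encode each game for analysis. The 15 single-letter initials match the abbreviations used in the comments below; the encoding itself is just one plausible choice, not anything specified by the scenario.

```python
# The 15 character initials used throughout the comments below.
CHARS = list("ABCDFGHJLMPRSTW")

def encode_game(team_a, team_b):
    """One-hot style encoding: +1 if a character is on team A,
    -1 if on team B, 0 if absent (a character on both teams
    cancels out to 0)."""
    return [(1 if c in team_a else 0) - (1 if c in team_b else 0)
            for c in CHARS]

# Example: team A fields FRS, team B fields HLP.
row = encode_game({"F", "R", "S"}, {"H", "L", "P"})
```

Any model that takes a fixed-length numeric vector per game can then be trained on rows like this, with the winner as the label.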
BONUS PVP OBJECTIVE
You may also submit a PVP team. I recommend sending it as a PM to me, but if you don’t mind other people seeing it you can just put it in your answer. The PVP team with the best overall record (sum of performances against all other submitted teams) will in theory win the right to specify the theme of an upcoming D&D.Sci scenario. In practice, I still owe the last winner abstractapplic their scenario first, and while I expect to finish that soon I’ve thought that for at least the past nine months, so maybe don’t count on it too much.
I don’t want the existence of a PVP objective to incentivize people too strongly against posting findings in the chat, so as an effort to reduce the risk of your findings being used against you: if multiple people submit the same PVP team, I will break the tie in favor of whoever submits it earlier.
I’ll aim to post the ruleset and results on June 5th (giving one week and both weekends for players). If you find yourself wanting extra time, comment below and I can push this deadline back.
As usual, working together is allowed, but for the sake of anyone who wants to work alone, please spoiler parts of your answers that contain information or questions about the dataset. To spoiler answers on a PC, type a ‘>’ followed by a ‘!’ at the start of a line to open a spoiler block—to spoiler answers on a mobile, type a ‘:::spoiler’ at the start of a line and then a ‘:::’ at the end to spoiler the line.
My main finding thus far:
There’s a single standard archetype which explains all the most successful teams. It goes like this: [someone powerful from the MPR cluster, ideally P], [a frontman, selected from GLS], [someone long-ranged, selected from CHJ]. In other words, this one is all about getting a good range of effective ranges in your team.
My tentative PVE submission is therefore:
Hurler, Legionary, Professor
I’m pretty sure there’s some second-order rock-paper-scissors stuff going on that I’m not accounting for: Rangers seem better than Professors at beating Samurai in particular, Marauders seem to have a similar speciality when fighting Tyrants, and there’s an apparent cycle where Duelists/Bandits beat Amazons/Wizards, Amazons/Wizards beat Legionaries/Golems, and Legionaries/Golems beat Duelists/Bandits.
I haven’t looked into how a bunch of strong/sturdy/snipey trios behave facing off against each other, which is relevant both because the PVE enemy is that kind of trio and because the PVP arena will probably be full of them.
Based on my research so far, I can’t rule out that there’s some secondary archetype which sucks in general but acts as a magic bullet against strong/sturdy/snipey trios in particular.
I have a stupidly ambitious ML thing I want to use this challenge as an excuse to (try to) do.
So it’ll take me a while to decide on my PVP allocation, and I’m reserving the right to change my PVE one.
Threw XGBoost at the problem and asked it about every possible matchup with FRS; it seems to think
my non-ML-based pick is either optimal or close-to-optimal for countering that lineup.
(I’m still wary of using ML on a problem instead of thinking things through, but if it confirms the answer I got by thinking things through, that’s pretty reassuring.)
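The brute-force part of this is cheap: there are only C(15, 3) = 455 possible triples to score against FRS. A sketch of that enumeration, assuming a fitted binary classifier (XGBoost or otherwise) whose `predict_proba` gives the probability that our side wins; the feature encoding is my assumption, not the one the commenter actually used.

```python
from itertools import combinations

CHARS = list("ABCDFGHJLMPRSTW")
ENEMY = frozenset("FRS")

def matchup_features(ours, theirs):
    return [(1 if c in ours else 0) - (1 if c in theirs else 0)
            for c in CHARS]

# All 455 candidate teams. Per the scenario rules we may pick characters
# the enemy is also playing, just not the same character twice ourselves.
candidates = [frozenset(t) for t in combinations(CHARS, 3)]

def rank_candidates(model):
    """Score every candidate triple against the enemy team and
    return them best-first by predicted win probability."""
    scored = [(model.predict_proba([matchup_features(t, ENEMY)])[0][1], t)
              for t in candidates]
    return sorted(scored, key=lambda s: s[0], reverse=True)
```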
Therefore, I’ve decided
to keep HLP as my PVE team.
And I’ve DM’d aphyer my PVP selection.
Just recording for posterity that yes, I have noticed that
Rangers are unusually good at handling Samurai, so it might make sense to have one on my PVE team.
However, I’ve also noticed that
Rangers are unusually BAD at handling Felons, to a similar or greater degree.
I think it makes more sense to keep Pyro Professor as my mid-range heavy-hitter in PVE.
(. . . to my surprise, this seems to be the only bit of hero-specific rock-paper-scissors that’s relevant to the PVE challenge. I suspect I’m missing something here.)
(Mostly pretty brainless exploration here. I have not particularly tried to work out the actual game rules.)
It looks as if
CFHJMPR all do pretty well against individual other cards (just looking at fraction of cases with, say, C on one side versus, say, G on the other where the side with C wins) but the triples that perform best mostly seem to be two of these plus L or S. Of course there may be weird selection effects that make this sort of statistical estimation misleading.
On the other hand, DLSBGW seem individually to do pretty poorly.
These kinda line up with some naive classification by weapon type: thrown weapons (chakram, hammer, javelin) seem to do well; fire/firearms (flamethrower, matchlock, pyro) do too; implicit or explicit bladed weapons (duelist, lamellar, samurai) don’t do so well, and nor do blunt-force weapons (bludgeon, golem). It seems like the wizard kinda belongs in that last set. This is all very handwavy. I vaguely imagine a ruleset where ranged attacks happen first and then short-range ones, or something, but as mentioned above I haven’t so far particularly tried to work out the rules.
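The “fraction of cases with C on one side versus G on the other” statistic can be computed directly. A sketch, assuming games are stored as `(team_a, team_b, a_won)` tuples with teams as sets of initials (my representation, not necessarily the dataset’s actual format):

```python
def head_to_head(games, x, y):
    """Win fraction of sides containing character x against sides
    containing character y, over games where x and y appear on
    strictly opposite teams. Returns None if no such games exist."""
    wins = total = 0
    for team_a, team_b, a_won in games:
        if x in team_a and y in team_b and y not in team_a and x not in team_b:
            total += 1
            wins += a_won
        elif x in team_b and y in team_a and x not in team_a and y not in team_b:
            total += 1
            wins += not a_won
    return wins / total if total else None
```

Note the symmetry: `head_to_head(games, y, x)` is one minus this value over the same set of games.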
I built some brute-force models using a gradient-boosting tree classifier from scikit-learn in Python. (I also tried some other things that performed worse.) I found that I got a small improvement in fit by including not just the presence/absence of each card from each team but also the count of cards from each of the five sets of three implied by the previous paragraph.
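A minimal sketch of that feature construction with scikit-learn’s `GradientBoostingClassifier`, fit here on synthetic games purely so the example runs; the real dataset, and the exact membership of the five weapon-type sets, are my assumptions from the paragraph above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

CHARS = list("ABCDFGHJLMPRSTW")
# My reading of the five weapon-type sets of three from the paragraph
# above; A/R/T fall into the remaining set.
GROUPS = ["CHJ", "FMP", "DLS", "BGW", "ART"]

def features(team_a, team_b):
    """Presence/absence of each character plus per-set count differentials."""
    f = [(1 if c in team_a else 0) - (1 if c in team_b else 0) for c in CHARS]
    f += [sum(c in team_a for c in g) - sum(c in team_b for c in g)
          for g in GROUPS]
    return f

# Synthetic games purely for illustration: random disjoint teams, with
# the winner biased toward whichever side has more thrown weapons.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(400):
    picks = rng.choice(15, size=6, replace=False)
    team_a = {CHARS[i] for i in picks[:3]}
    team_b = {CHARS[i] for i in picks[3:]}
    X.append(features(team_a, team_b))
    y.append(int(sum(c in "CHJ" for c in team_a)
                 >= sum(c in "CHJ" for c in team_b)))

model = GradientBoostingClassifier(random_state=0).fit(X, y)
```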
Asking the models for the best-performing triples against FRS suggested several with a >= 80% win-rate (though I wouldn’t trust the actual numbers much). Different triples in different randomized runs, but FGP is almost always near the top, and brute-force counting shows FGP doing well against all pairs of cards from FRS and getting 2 wins out of 2 against FRS itself.
I also looked for pairs of cards that seem to show notable interactions (e.g., having both in your hand does notably better or worse than you’d predict from the stats from having each one in your hand separately) and apparently found some, but adding these pairs as features before doing model-fitting apparently made the models worse so I haven’t used them.
My currently-proposed team to play against FRS is
I haven’t looked at PVP at all so far.
Avid readers of the D&D.Sci series will remember that in the previous “League of …” episode I initially pessimized instead of optimizing. I did that here too, but I think I’ve fixed that before posting anything here. (But I haven’t re-checked all the heuristic handwaving bits and there may be debris from my screwup in there.)
At the time of writing this I have not looked at anyone else’s comments.
my findings so far:
I confirm abstractapplic’s finding of three groups. However I have also classified ATW into the MPR group, BD into the GLS group, and F into the CHJ group.
I’ve mostly looked at the dataset restricted to teams (on both sides) that have one character from each group. These teams generally do better than teams with other arrangements, but I could be missing some more narrow counter using a different arrangement.
With this restriction, most winrate variation seems to me to be related to the strength of individual characters, though I could be missing more complicated interactions since I’ve mostly been looking at two-character interactions only. I do note that Lamellar Legionary (already the highest-winrate melee) seems to counter Flamethrower Felon, who is on the enemy team.
I also note that not all characters are equally common but this doesn’t seem to be skewing the results all that much (at least in the restricted set of games).
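This restriction can be expressed as a simple filter. A sketch, using the three enlarged groups described above (MPR+ATW, GLS+BD, CHJ+F) and assuming games as `(team_a, team_b, a_won)` tuples with teams as sets of initials:

```python
# The three groups, enlarged as described above.
GROUPS = [set("MPRATW"), set("GLSBD"), set("CHJF")]

def one_per_group(team):
    """True if a three-character team takes exactly one character
    from each of the three groups."""
    return all(len(team & g) == 1 for g in GROUPS)

def restricted(games):
    """Keep only games where both sides are one-per-group teams."""
    return [g for g in games
            if one_per_group(g[0]) and one_per_group(g[1])]
```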
Conveniently, CLP is the highest-winrate team when games are restricted to those where both teams have one character from each group, and the L should counter the enemy F, so I’ll go with that for PVE, though it seems a somewhat bland answer. Edit: oops, C seems to be countered by enemy S; I’ll switch to J instead (which also does poorly against S, but not unexpectedly so given raw winrates), as JLP is the second-highest-winrate team in the restricted set of games. I’ll keep my PVP pick the same for now. Flamethrower Felon would counter S, but doesn’t form as high-winrate a full-team combo with L (FLM is the highest such, in twelfth place).
abstractapplic’s PVE team is the eighth highest winrate with this restriction, and could well be a superior pick if exploiting some interaction that I didn’t notice.
Thus my PVE pick (for now):
Jaunty Javelineer, Lamellar Legionary, Professor Pyro
Further edit: I looked at who beats FSR, and it looks like it actually does fairly well against one-from-each-group teams in general. The best comp type against it seems to be 2 melee + one from the CFHJ group; second is 1 melee plus two long range; third is two melee, one long range. In particular, Bludgeon Bandit + Daring Duelist + one from CFHJ have never lost to FSR (out of, like, 8 examples, so I’m really risking randomness here) despite both B and D being “bad” picks usually. Thus, I’ve gone mad and am switching to:
Jaunty Javelineer, Bludgeon Bandit, Daring Duelist
retaining for PVP:
Captain Chakram, Lamellar Legionary, Professor Pyro
For PVP—for now I’m just going to use my (pre-edit) PVE pick as my tentative PVP pick (retained above) and challenge others to counter it. But I may later swap out to a secret pick with more analysis. If I do, I’ll cross out my PVP pick declaration in this comment.
I checked out what happens if you remove games that include any “trash picks” (A,B,D,T,W), in addition to requiring teams to include one character from each group. This further reduces the dataset significantly, but I noticed that in this set of games, the opposing team FSR has the highest winrate, which suggests it is a very strong team against other conventionally strong teams, even if it doesn’t exploit weaker teams that well.
In this further reduced set, the second highest winrate is JLM, then CLP, then JLP.
Given the low amount of data points, however, these winrate variations between the top teams in the further restricted set could easily be random, so I don’t think there’s all that strong a case to change my picks, and my choices above are unchanged for now. However, this does suggest JLM as an alternate candidate against FSR, and the opposing team FSR itself as a possible PVP pick (if people don’t just submit their PVE picks, or you think people will fail to counter it).
oh wait. For the top teams, the wins are higher if you include trash picks, but the losses often aren’t. This means that these teams are basically always winning against trash picks, and the apparent higher number of data points is effectively an illusion, and the trash-pick-including win rates are distorted by how often teams were matched against bad teams.
examples (strong = has one character from each group, no trash picks, weak = has one character from each group, but at least one trash pick)
| team | wins against strong | losses against strong | wins against weak | losses against weak |
|------|--------------------:|----------------------:|------------------:|--------------------:|
| CLP  | 24 | 14 | 118 | 0 |
| JLP  | 20 | 12 | 92  | 0 |
| CSP  | 23 | 17 | 102 | 0 |
but on the other hand:
| team | wins against strong | losses against strong | wins against weak | losses against weak |
|------|--------------------:|----------------------:|------------------:|--------------------:|
| HLP  | 21 | 19 | 96  | 3  |
| JLM  | 28 | 15 | 100 | 7  |
| FSR  | 26 | 12 | 99  | 10 |
I don’t know to what extent failing to defeat all the weak teams should be taken as evidence that a team isn’t good in general (so that the good numbers against strong teams are more likely to be a fluke).
Takeaways: my data is really thin even in the larger restricted set and I should pay little attention to these winrate variations between full teams; I should try to find more general patterns. I should also maybe look at what particular “trash” picks can beat FSR, in case it is losing reliably to some narrow counter as opposed to just not reliably beating weaker teams in general.
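The strong/weak record split in the table above can be recomputed for any team with a small helper. A sketch, with the strong/weak criterion passed in as a predicate since it depends on the group and trash-pick definitions above:

```python
def record_split(games, team, is_strong):
    """Return {(bucket, outcome): count} for `team`'s games, bucketing
    opponents with the caller-supplied is_strong predicate.
    games: iterable of (team_a, team_b, a_won) with teams as sets."""
    counts = {("strong", "win"): 0, ("strong", "loss"): 0,
              ("weak", "win"): 0, ("weak", "loss"): 0}
    for team_a, team_b, a_won in games:
        if team == team_a:
            opp, won = team_b, a_won
        elif team == team_b:
            opp, won = team_a, not a_won
        else:
            continue
        bucket = "strong" if is_strong(opp) else "weak"
        counts[(bucket, "win" if won else "loss")] += 1
    return counts
```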
Update in view of the answer likely being soon to be posted:
I got sidetracked among other (non-D&DSci) things by trying to semi-automatically categorize the team compositions in the games with only the restricted team compositions (one character from each group, no trash picks) into similarity clusters. This was tricky because there is a lot of noise in this much smaller dataset, and I didn’t take into account games outside this restricted set at all.
Ultimately, I did get three clusters which seemed to have a rock-paper-scissors interaction. One cluster is Felon-heavy (indeed seems to maybe have all Felon teams) and FLR seems to be a fairly archetypal example. Another cluster is Samurai-heavy and Golem-light; HSM seems to be a fairly archetypal example. The third cluster is Pyro-heavy and JGP seems to be a fairly archetypal example.
Anyway, the FLR cluster tends to beat the HSM cluster which tends to beat the JGP cluster which tends to beat the FLR cluster.
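A sketch of how the cluster-vs-cluster records behind that cycle can be tallied, given some labeling function from teams to clusters (the labeling itself came out of my semi-automatic clustering and isn’t reproduced here):

```python
from collections import defaultdict

def cluster_matrix(games, label):
    """wins[x][y] = number of wins by cluster-x teams over cluster-y
    teams. `label` maps a team to its cluster name, or None for
    unclustered teams, whose games are skipped."""
    wins = defaultdict(lambda: defaultdict(int))
    for team_a, team_b, a_won in games:
        la, lb = label(team_a), label(team_b)
        if la is None or lb is None:
            continue
        if a_won:
            wins[la][lb] += 1
        else:
            wins[lb][la] += 1
    return wins
```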
The PVE opposing team, FSR, mostly seems to be in the FLR cluster but is not very central, leaning a bit to the HSM cluster. It hasn’t faced the JGP cluster a lot (maybe 5-6 games depending on cluster definition) and has won maybe 3 or 4 of those, atypical for an FLR cluster member, but that could easily be random due to the low number of games.
Notably, my current PVP pick, CLP, seems to be in the JGP cluster and, as is typical for members of this cluster, tends to lose to members of the HSM cluster. In the absence of reasons to believe that other players have picked teams from the HSM cluster (hmm, but yonge picked HMP, which isn’t in this restricted dataset since it has two characters from the same group; would that behave like HSM?) I don’t see a compelling reason to switch, though I might change my mind if I post this comment and then the answer isn’t posted for a long time.
Anyway, I’m not sure whether the rock-paper-scissors effect seen in the clustering derives from some collective interaction or is just a result of character pair interactions. Some apparent counters in this restricted dataset:
I’ve now gone and looked at what FSR wins against and adjusted my PVE pick accordingly. I’ll likely adjust my PVP pick as well if I end up having time to check what sort of things candidate PVP picks (and other players’ PVP picks where posted) do well against.
edit: looks like this comment was after aphyer posted the answer, but I checked for any new posts after my PVE edit above and didn’t see aphyer’s post of the answer.
Sorry, wasn’t expecting anything today! I’ll update the wrapup doc to reflect your PVE answer: sadly, even if you had an updated PVP answer, I won’t let you change that now :P
I found this problem late. Could I have an extra day or two please?
Sure, no objections. In the absence of further requests I’ll aim to post the wrapup doc Friday the 9th: I’m fairly busy midweek and might not get around to posting things then.
My best estimate is:
This is also my entry for the PvP contest