Reflections on my performance:
This stings my pride a little; I console myself with the fact that my “optimize conditional on Space and Life” allocation got a 64.7% success rate.
If I’d allocated more time, I would have tried a wider range of ML algorithms on this dataset, instead of just throwing XGBoost at it. I’m . . . not actually sure if that would have helped; in hindsight, trying the same algorithms on different subsets (“what if I built a model on only the 4-player games?”) and/or doing more by-hand analysis (“is Princeliness like Voidliness, and if so, what does that mean?”) might have provided better results.
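For concreteness, here’s a minimal sketch of that “same model, narrower subset” idea in Python, assuming a hypothetical flattened table with 0/1 feature columns, a party_size column, and a binary won label (none of these names are from the real dataset):

```python
import pandas as pd
import xgboost as xgb

# Hypothetical schema: one row per game, 0/1 classpect indicator
# columns, plus "party_size" and a binary "won" label.
df = pd.read_csv("games_features.csv")

# Fit on 4-player games only, instead of one model over everything.
four_player = df[df["party_size"] == 4]
X = four_player.drop(columns=["won", "party_size"])
y = four_player["won"]

model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X, y)
```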
Reflections on the challenge:
I found this one hard to get started with because it had 144 de facto explanatory columns (“does this party include a [Class] of [Aspect]?”) on top of its 1.4m rows, and the effect of each column was mediated by the effects of every other column. This made it difficult (and computationally intensive!) to figure out which classpect combinations affected the outcome.
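To illustrate the shape of the problem, here’s a sketch of building those de facto columns, assuming a hypothetical long-format table with one row per (game, party member) pair; the real dataset’s layout may well differ:

```python
import pandas as pd

# Hypothetical long format: one row per (game_id, classpect) pair.
long = pd.DataFrame({
    "game_id":   [0, 0, 1, 1],
    "classpect": ["Knight of Blood", "Mage of Time",
                  "Knight of Blood", "Maid of Void"],
})

# One row per game, one 0/1 column per classpect present
# (12 classes x 12 aspects = 144 possible columns).
features = pd.crosstab(long["game_id"], long["classpect"]).clip(upper=1)
print(features)
```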
That said, I appreciated this scenario. The premise was fun, the writing was well-executed, and the challenge was fair. Also, it served as a much-needed proof-by-example that “train one ML model, then optimize over inputs” isn’t a perfect skeleton key for solving problems shaped like this. If it was a little obtuse on top of that . . . well, I can chalk that up to realism.
Good to know, thank you! I think my main takeaway is that I am really bad at judging difficulty levels on these: I actually expected this scenario to be easier than the previous Dwarves & D.Sci scenario, but that one had three different near-perfect solutions while this one only had one noticeably-better-than-random solution.
Long-winded and empirically incorrect argument that led me to that expectation follows:
I was aware of the large number of possible characters; that’s why the dataset ended up being so big, since I wanted to be sure it was large enough to let simple analyses work in spite of that. One sample approach I tried out on my end as part of designing the scenario was this:
Take only teams that contained a Knight of Blood and a Mage of Time (but of any size).
For each possible classpect, find its winrate on those teams.
This would have given you ~4k teams, with ~120 containing each possible other classpect. That wasn’t enough to get an optimal solution, but it would have been an excellent first step (a rough pandas sketch of this procedure follows the winrate list below):
Page of Heart has a 59.46% winrate
Maid of Heart has a 57.01% winrate
Maid of Breath has a 51.55% winrate
...
...
Heir of Hope has a 27.10% winrate
Heir of Rage has a 26.85% winrate
Maid of Void has a 22.64% winrate
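Here’s the promised sketch of that procedure in pandas, assuming a hypothetical wide layout with hero1…hero12 classpect columns and a boolean won column; the real schema may differ:

```python
import pandas as pd

df = pd.read_csv("games.csv")
hero_cols = [c for c in df.columns if c.startswith("hero")]
fixed = {"Knight of Blood", "Mage of Time"}

# Step 1: keep only teams (of any size) containing both starting characters.
members = df[hero_cols].apply(lambda row: set(row.dropna()), axis=1)
sub = df[members.apply(fixed.issubset)]

# Step 2: winrate of each other classpect on those teams.
rows = []
for cp in pd.unique(df[hero_cols].values.ravel()):
    if pd.isna(cp) or cp in fixed:
        continue
    has = sub[hero_cols].eq(cp).any(axis=1)
    if has.any():
        rows.append((cp, sub.loc[has, "won"].mean(), int(has.sum())))

winrates = (pd.DataFrame(rows, columns=["classpect", "winrate", "n"])
              .sort_values("winrate", ascending=False))
print(winrates.head(3))   # e.g. Page of Heart near the top
print(winrates.tail(3))   # e.g. Maid of Void near the bottom
```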
As I envisioned things playing out:
Just running this approach and grabbing the two highest-winrate characters you could:
You would have picked a Page of Heart (3-9-3) and a Maid of Breath (2-12-1).
This would have given you stats of 18-25-17, for a lowest stat of 17 and a 64% winrate.
This isn’t optimal (it over-invests in Friendship, since you’ve picked two different high-Friendship characters), but it’s noticeably better than random; the stat arithmetic is sketched below.
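A quick check of that arithmetic, with the starting pair’s combined stats back-solved from the quoted 18-25-17 total (an assumption on my part, not a figure from the scenario):

```python
# Stat triples as quoted above; Friendship is presumably the middle
# number, given the over-investment noted in the text.
page_of_heart  = (3, 9, 3)
maid_of_breath = (2, 12, 1)
# Back-solved as 18-25-17 minus the two picks -- an assumption.
starting_pair  = (13, 4, 13)

team = tuple(map(sum, zip(page_of_heart, maid_of_breath, starting_pair)))
print(team, "lowest stat:", min(team))   # (18, 25, 17) lowest stat: 17
```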
Additionally, looking at the high/low scores might point you further in useful directions:
For instance, Heart/Breath/Life showed up an awful lot near the top across a variety of different classes.
This might have pointed you in the direction of ‘there’s a specific thing I’m missing’ and gotten you to bring only one Heart-like hero.
Sadly, it seems I overestimated how obvious a thing to try that was. Based on the answers, it looks like:
simon did something fairly similar to this, requiring 4-person teams but only requiring one of your two starting characters on the team, and ended up with a similar outcome of ‘generally good, but overinvested a bit in Friendship’.
Yonge ran some analysis that did a good job of finding ‘generally strong characters’ but wasn’t specific to the two characters you started with.
You did some kind of ML thing I didn’t understand.
Reflections x3 combo:
Just realized this could have been a perfect opportunity to show off that modelling library I built, except:
A) I didn’t have access to the processing power I’d need to make it work well on a dataset of this size.
B) I was still thinking in terms of “what party archetype predicts success”, when “what party archetype predicts failure” would have been more enlightening. Or in other words . . .
. . . I forgot to flip the problem turn-ways.