Reflections on my performance
I took this game as an opportunity to demonstrate how my modelling library works. This was in its own way a resounding success: I can’t think of a better demonstration of my methodology’s strengths and weaknesses than getting a uniquely deep and justifiable view of the underlying systems (I really wasn’t expecting to be right about everything in this post, minus arguably the last part) and then significantly underperforming more mainstream approaches. Still, I think I did acceptably, and a combination of decent judgement and excellent luck happened to let my character get a Perfect Feast, so I’m content.
(I really shouldn’t have skipped the “…and then use all the insight you got from playing with interpretable models to make best use of uninterpretable tree-based models” step[1] at the end. I’ll know better next time.)
Reflections on the challenge
I know I liked this one, but I’m uncertain as to how much: there were two major aspects I find myself deeply ambivalent about.
First: the relatively low number of rows, high number of columns, and nonzero randomness in the output made this look like a data-starvation problem, but there turned out to be a pretty tight linkage between predictors and responses, such that it was actually more-or-less fair; in other words, the scenario pretended to be harder and jankier than it was. The effect of this for me was kind of videogamey: I’m not sure whether to consider it impeccable design (“good job mechanically playing into the scrappy-underdog-who-keeps-winning-anyway fantasy!”), lowkey disquieting (“should a game about epistemology be doing that, though?”), or a legitimate extra layer of challenge (“I need to git gud at knowing how gud I need to git.”[2]).
Second: the challenge was in retrospect pretty cheeseable, but none of the players actually cheesed it. I could net a low-variance high-Quality Feast just by ordering historical Feasts by Quality and mimicking one of the ones that managed to get Quality=20[3]; that said, afaict the only person to make use of an approach like this was me, and I still didn’t lean on it anywhere near as hard as I could have[4]. (This could have been fixed with a trivial extension of the existing ruleset: one or more foods with high and high-variance Sweetness and/or Spiciness, such that they could sometimes fill one or more of the quotas by themselves; these would be disproportionately represented in the highest-Quality rows, but in expectation a terrible choice for players.)
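For concreteness, the “cheese” just described amounts to a few lines of code. The dish codes and Quality values below are invented stand-ins, not the actual dataset:

```python
# Toy stand-in for the historical Feast table; the real schema and dish
# codes are assumptions made for illustration only.
feasts = [
    {"dishes": "ABCDEF", "quality": 12},
    {"dishes": "CDEFMW", "quality": 20},
    {"dishes": "EMOPSV", "quality": 18},
    {"dishes": "ABMOPS", "quality": 20},
]

# The "cheese": sort history by Quality and mimic any menu that hit 20.
feasts.sort(key=lambda f: f["quality"], reverse=True)
perfect_menus = [f["dishes"] for f in feasts if f["quality"] == 20]
print(perfect_menus)
```

Serving any menu from `perfect_menus` inherits whatever luck produced that row, which is exactly why it is lower-effort but not variance-free.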
There were also plenty of things I just straightforwardly enjoyed. The writing and the premise were both fun, and the underlying mechanics were conceptually beautiful and impeccably implemented. Also, I like a game which lets me show off, and a game which kicks my ass; this one did the former qualitatively and the latter quantitatively, which imo counts as two reasons to think it’s good. All told, I’d award this a conflicted-but-approving [almost-certainly-at-least-three-and-plausibly-more-than-that-but-I-don’t-know-how-much]/5 for Quality.
Or, potentially, the “construct an entire modelling paradigm around the shape of the problem” step.
I’d have been much less likely to skip that final step if I hadn’t thought I had too little data to justify higher model complexity.
Two of the four dish combinations with Quality=20 had >16 Quality in expectation.
If a tree falls in the forest, and the only person to hear it is over a mile away, is the sound it makes loud?
This is true, but it’s actually something I looked into while making the scenario. The average score of ‘pick a random 20-Quality feast from the dataset’ was 15.38, which players did successfully beat.
There was a related writing constraint that came from me pushing to simplify the ruleset a bit. The originally-envisioned ruleset would have given players an additional rule limiting which dishes they were allowed to include.[1]
This would have let me provide a substantially larger dataset without worrying that grabbing the best-scoring thing out of it would trivially solve the scenario: if 6-7 dishes were banned, it would be easy for none of the top-scoring feasts to be allowed for you, and/or for the best-scoring feast you were allowed to be a single stroke of random luck that would betray you if you repeated it.
When I removed that rule, I needed to cut down the dataset size to keep that from being a trivial solution, which is what led to the data starvation. Overall I think something like that wouldn’t have been worth the complexity: just telling players they can include whatever dishes they want is simpler, and also feels more realistic in context. Open to other views on that, though.
I had a whole bunch of excuses lined up for this too! One of your companions gets seasick and doesn’t want to go hunt Kraken... another is Good-aligned and will be angry if you kill a Pegasus...
Fewer rows might not give interpretable/rules-based solutions an advantage. I tried training on only the first 100 or 20 rows, and I got CDEFMW (15.66) and EMOPSV (15.34) as the predicted best meals. Admittedly CDEFMW shows up in the first 100 rows scoring 18 points, but not EMOPSV. Maybe a human with 20 rows could do better by coming up with a lot of hypothetical rules, but it seems tough to beat the black box.
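The truncated-training experiment above can be sketched as follows, on synthetic data rather than the real dataset. The dish count, the 0/1 dish-inclusion encoding, and the quality rule are all made up for illustration, and `GradientBoostingRegressor` merely stands in for whatever black-box tree model was actually used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic feasts: each row is a 0/1 vector saying which dishes are served.
# The "quality rule" below is invented; it just gives the model a learnable
# signal plus some noise, mimicking the scenario's nonzero randomness.
rng = np.random.default_rng(0)
n_dishes = 20
X = rng.integers(0, 2, size=(300, n_dishes))
y = 2.0 * X[:, :5].sum(axis=1) + rng.normal(0, 0.5, size=300)

# Fit on only the first n rows, then ask the model which observed feast
# it predicts to score highest -- the same workflow as the 100/20-row test.
for n_rows in (100, 20):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[:n_rows], y[:n_rows])
    preds = model.predict(X)
    best = int(np.argmax(preds))
    print(f"{n_rows} rows -> best predicted feast is row {best} "
          f"({preds[best]:.2f})")
```

Even at 20 rows the tree ensemble gets a usable ranking out of this kind of signal, which matches the comment’s point that a human rule-guesser would struggle to beat it.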