It’s looking like I won’t have figured this out before the time limit, despite the extra time. Here’s what I have so far:
I’m modeling this as follows, but I haven’t fully worked it out, and I’m getting complications/hard-to-explain dungeons that suggest it might not be exactly correct (a rough code sketch of this model follows the list below):
the adventurers go through the dungeons using rightwards and downwards moves only, thus going through 5 rooms in total.
at each room they choose the next room based on a preference order (which I am assuming is deterministic, but possibly dependent on, e.g. what the current room is)
the score is dependent only on the rooms they pass through (but again, am getting complications)
I’m assuming a simple addition of scores to start with, but then adding epicycles (which so far have been based on the previous room, generally)
there is some randomness in the individual score contributions from each encounter.
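To make that concrete, here is a minimal sketch of the model as code (illustrative Python, not my actual analysis code; the prefer/room_score arguments stand in for whatever preference and score model is currently being tested):

```python
# Minimal sketch of the model: a 3x3 dungeon, adventurers enter at the
# top-left and leave at the bottom-right using only rightwards/downwards
# moves (5 rooms total). At each room they pick the next room via a
# deterministic preference that may depend on the current room; the score
# is a sum of per-room contributions that may depend on the previous room.

def simulate(grid, prefer, room_score):
    """grid: 3x3 list of lists of encounter names.
    prefer(current, a, b): which of encounters a, b the adventurers move to.
    room_score(encounter, previous): expected score contribution."""
    r, c = 0, 0
    path = [grid[0][0]]
    while (r, c) != (2, 2):
        options = [(r, c + 1), (r + 1, c)]                 # rightwards, downwards
        options = [(i, j) for i, j in options if i < 3 and j < 3]
        if len(options) == 2:
            a = grid[options[0][0]][options[0][1]]
            b = grid[options[1][0]][options[1][1]]
            nxt = options[0] if prefer(grid[r][c], a, b) == a else options[1]
        else:
            nxt = options[0]                               # edge of the grid
        r, c = nxt
        path.append(grid[r][c])
    previous = [None] + path[:-1]
    total = sum(room_score(enc, prev) for enc, prev in zip(path, previous))
    return path, total
```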
For the dungeon generation: it seems to treat rooms 1-8 equally (room 9 is different and tends to have harder encounters). Encounters of the same type (and some related “themes”) tend to be correlated. Scores in each tournament seem to be whole numbers from each judge, averaged across 3 or 4 judges; I’m not sure whether any tournaments are judged by 1 or 2, but if so they’re relatively uncommon.
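A concrete version of that judge-count observation: if each judge awards a whole number and the reported score is a plain average, then a reported score is only consistent with k judges when k times the score is (up to rounding) an integer. A minimal sketch of that check (the function name and tolerance are just illustrative):

```python
# Which judge counts are consistent with a reported (averaged) score,
# assuming each judge awards a whole number and the report is the plain mean.
def consistent_judge_counts(avg, counts=(1, 2, 3, 4), tol=0.02):
    # tol allows for the reported averages themselves being rounded
    return [k for k in counts if abs(avg * k - round(avg * k)) <= tol]

print(consistent_judge_counts(6.75))   # [4]
print(consistent_judge_counts(6.67))   # [3]
print(consistent_judge_counts(7.0))    # [1, 2, 3, 4] -- uninformative
```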
In theory, I’d like to plug a preference model and a score model into a simulator and iterate to refine, but I’m not there yet; I’m still working out plausible scores and preferences.
One possibility for the scores and preference order:
baseline average scores:
Nothing: 0; Goblins: 1.5 (1d2?); Whirling Blade Trap: 3; Orcs: 3; Hag: 4; Boulder Trap: 4.5; Clay Golem: 6; Dragon: 6?; Steel Golem: 7.5 (edit: <--- numbers estimated from small, atypical samples (including many Nothing rooms, which is problematic for reasons that become obvious with the edit below))
With Goblins and Orcs being increased (doubled?) if following Goblins/Orcs/any trap? (edit—or golems?) (edit—looking now like it’s probably anything but an empty room?) (a code sketch of this score model follows below)
Plus the adventurers seemingly avoid Orcs and Hags more than their difficulty warrants? (I found them relatively late in the preference order, then found that they were in practice lower in score, so I have to adjust ad hoc if I keep the assumption that score contribution and preference order are related. A 1.5x multiplier? 2x? A fixed addition?) (I’m assuming a 1.5x multiplier atm, since I initially had the Hag avoided over anything but Orcs but found one dungeon that looks suspiciously like, though doesn’t prove, the Hag being chosen over the Dragon (edit: see below for an update).) (I suppose +2 would also work.) (edit—it looks like the Orc difficulty increase for following a non-empty room only applies to adventurer preference if the current room is also Orcs, violating the assumption that preference is tied to expected difficulty. For Goblins, though, the preference does seem to depend only on following a non-empty room; in practice this doesn’t matter much, since it only affects the order relative to the Whirling Blade Trap.)
(edit—see update to preference order below)
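A sketch of the above as a score function, assuming the baseline numbers are roughly right and the “anything but an empty room” doubling guess holds (all of this is provisional), and meant to plug into the simulator sketch earlier:

```python
# Provisional score model: baseline averages from the rough estimates above,
# plus the guess that Goblins/Orcs are doubled when the previous room was
# anything other than an empty room. All numbers are small-sample estimates.
BASELINE = {
    "Nothing": 0.0, "Goblins": 1.5, "Whirling Blade Trap": 3.0, "Orcs": 3.0,
    "Hag": 4.0, "Boulder Trap": 4.5, "Clay Golem": 6.0, "Dragon": 6.0,
    "Steel Golem": 7.5,
}

def room_score(encounter, previous):
    score = BASELINE[encounter]
    if encounter in ("Goblins", "Orcs") and previous not in (None, "Nothing"):
        score *= 2          # the "doubled after a non-empty room" guess
    return score
```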
Assuming the above is correct (and I’m pretty sure it isn’t, but hopefully it has some relationship with reality), one strategy might be:
CHN/WON/BOD <---obsolete answer
where the idea is to use the encounters the adventurers avoid too much relative to their actual score contributions (Hag, Orcs) to herd them away from the Nothing rooms. One of the Orcs is placed after a Boulder Trap in the belief that this will make it score higher than the Hag. The WBT is left in the preferred path to lead the adventurers along; I don’t immediately see a way to avoid this.
EV if the above model is correct: 6 + 3 + 4.5 + 6 + 6 = 25.5
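Spelling out the route behind that number (assuming the three letter-groups are the rows of the 3x3 grid, entrance at the top-left, exit at the bottom-right):

```python
# Predicted route through C H N / W O N / B O D, if the guesses above hold:
#   Clay Golem (6) -> down to Whirling Blade Trap (3; preferred over the Hag)
#   -> down to Boulder Trap (4.5; preferred over Orcs-following-a-trap)
#   -> right to Orcs (3 doubled to 6, since it follows the Boulder Trap)
#   -> right to Dragon (6)
assert 6 + 3 + 4.5 + 6 + 6 == 25.5
```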
How I’ve gotten here (mainly using Claude and Claude-written code, including the analysis tool, which is good for prototyping if you don’t mind JavaScript):
found initial basic encounter score contribution estimates from linear regression on the whole dungeon (sketched in code after this list)
after determining that rooms 1-8 were interchangeable as far as dungeon generation is concerned, looked at each room’s importance to the score and guessed the basic model based on that, iirc (it might have been more complicated than this) (I do remember considering and rejecting a model where each room is selected one at a time from the full set of available rooms, and rejecting any “symmetrical” model based on working out the full path in advance)
initially assumed that adventurers preferred easier encounters, based on the initial score estimates
refined the preference order by minimizing variance between dungeons with the same predicted sequence of encounters
tried to work out how scores actually work by filtering for specific predicted sequences of encounters and finding their scores
found epicycles from that and started refining model, including preference order adjustments
haven’t really finished the above step; the epicycles might be because the model is wrong/incomplete?
hypothetical todo: apply the model to the entire dataset, also develop a model for the variation in score from each encounter, compare to known 3-judge and 4-judge tournaments for a full Bayesian assessment, and refine further with this as feedback
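For reference, the first step above (the whole-dungeon regression) amounts to something like this; my real code was Claude-written JavaScript in the analysis tool, so this Python version with made-up variable names is just a sketch of the idea:

```python
# Sketch: estimate per-encounter score contributions by regressing tournament
# score on the count of each encounter type in the whole 3x3 dungeon.
# Data loading is omitted; dungeons/scores are placeholders.
import numpy as np

ENCOUNTERS = ["Nothing", "Goblins", "Whirling Blade Trap", "Orcs", "Hag",
              "Boulder Trap", "Clay Golem", "Dragon", "Steel Golem"]

def fit_baseline(dungeons, scores):
    """dungeons: list of 9-element lists of encounter names; scores: floats."""
    X = np.array([[rooms.count(e) for e in ENCOUNTERS] for rooms in dungeons],
                 dtype=float)
    y = np.array(scores, dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(ENCOUNTERS, coefs))
```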
edit: I’ve now read other people’s comments. I did not notice any 1-point jump in scores (I didn’t check for it); I’m not sure whether I would have noticed it if it’s a judging difference as opposed to a strategy change (I wouldn’t have noticed a pure strategy change). I also didn’t notice anything special about Steel Golems at the entrance vs. other spots, didn’t check for any change in the distribution of 3- vs 4-judge tournaments, etc.
further analysis after the above:
I’ve looked at the root mean square deviation of predictions from the data for the full dataset (full Bayes seems a bit intimidating to code atm even with AI help). From this it seems the preference order is (there remains a likely possibility of more complications I haven’t checked):
Nothing > Goblins (current encounter null or Nothing) > Goblins (otherwise) = Whirling Blade Trap > Boulder Trap = Clay Golem = Orcs (current encounter not Orcs) > Dragon > Steel Golem >= Orcs (current encounter Orcs) > Hag
(superseding an earlier estimate: Nothing > Goblins (current encounter null or Nothing) > Goblins (otherwise) = Whirling Blade Trap > Boulder Trap > Clay Golem = Orcs (current encounter not Orcs) > Dragon > Orcs (current encounter Orcs) > Hag = Steel Golem)
where I can’t distinguish between Steel Golem being preferred over, or equal to, Orcs when the current encounter is Orcs.
Soo, if Orcs are avoided equally to a Boulder Trap when the current encounter is not Orcs, I need to improve the herding. But it also seems Orcs get doubled by many other encounter types? This could work:
CHN/OBN/WOD <---- current solution
The predicted value is now 6 + 6 + 3 + 6 + 6 = 27 (route spelled out below).
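The route behind that number, under the same grid-reading assumptions as before:

```python
# Predicted route through C H N / O B N / W O D under the updated preference
# order and score guesses:
#   Clay Golem (6) -> down to Orcs (3 doubled to 6; still preferred over Hag)
#   -> down to Whirling Blade Trap (3; preferred over the Boulder Trap)
#   -> right to Orcs (3 doubled to 6, again following a non-empty room)
#   -> right to Dragon (6)
assert 6 + 6 + 3 + 6 + 6 == 27
```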
further edit: I’m also refining the scores and getting what is probably nonsense (probably due to missing some dependency of something on something else), but it’s looking like maybe every encounter’s score depends on whether the previous encounter was Nothing/null. Except traps/golems? Which would explain why Steel Golems are being reported as better in the first slot.
I’m also getting remarkably higher numbers for the Hag than with my earlier method, but I don’t immediately see a way to exploit this profitably.