simon comments on D&D.Sci Release Day: Topple the Tower!

simon 16 Mar 2026 6:53 UTC

2 points

Requesting an extension for reason: wanting extra time to see if I can crack the mechanics with Claude. I think Claude suggested some possible answers waaay back but I didn’t pay much attention, telling it to focus on the mechanics first. (This might be an abuse of extension requests; it’s not that I haven’t spent time on this.)

simon 29 Mar 2026 7:50 UTC

2 points

Parent

Thanks aphyer for the extra time, though I haven’t figured out the mechanics. Specific findings not necessarily endorsed (Claude tends to be, in my view greatly, insufficiently skeptical). Indeed, it’s not necessarily right about what we actually found, since it’s trying to summarize results from multiple conversations. I had to manually add spoiler tags (Claude could not figure out how to do that in an llm block using the api) and put notes at some (not necessarily all) of the issues I found while doing so. However, I guess I’m going with Claude’s conclusion, albeit with the paths edited to actually be valid paths:

Main Tower: Warrior LLRRRLLR

Bonus Tower: Warrior RRRLLLLR

The Tower: What We Found

Analysis of ~140,000 runs from the D&D.Sci Tower challenge dataset. Covers what the data reveals about game mechanics, what remains unknown, and how the recommended paths were derived.

Executive Summary

The Tower is a 7-floor dungeon (F2-F8) followed by a boss fight (F9). Heroes choose a class (Warrior/Rogue/Mage, each ~1/3 of runs) and traverse a fixed sequence of encounters: enemies, treasures, and campfires. Both pre-boss survival and boss topple have a stochastic component, but most paths are effectively deterministic: randomness only matters near thresholds. For topple, ~18% of exact-sequence repeats show mixed outcomes.

Best paths found:

Main Tower: Warrior LLRRRLLL — ~100% success (135/135, CI [97.2%, 100%])
Bonus Tower: Warrior RRRLLLLL — 92.7% success (140/151, CI [87.4%, 95.9%])

The three classes play fundamentally different games. Warriors accumulate resources from treasures and lose them to enemies (1D survival state). Rogues have a 2D state with a unique small-enemy differentiation mechanic. Mages gain offensive power from combat — enemies help their boss fight but can kill them on the way — creating a survival-vs-topple tension that makes Mage the worst survivor but best boss-killer.

1. The Tower Structure

Encounter Layout

Floor	Enemies	Notes
F2	Gremlin, Acid Slime, Cultist	Small enemies only
F3	Jaw Worm, Cultist, Acid Slime	First lethal floor (Mage)
F4	Slaver, Jaw Worm, Cultist	First lethal floor (Rogue)
F5	Sentries, Slaver, Jaw Worm	First lethal floor (Warrior)
F6	Gremlin Nob, Sentries, Slaver
F7	Chosen, Gremlin Nob, Sentries
F8	Shelled Parasite, Chosen, Gremlin Nob
F9	Bronze Automaton / Collector / Champion	Boss floor

Each floor also offers Campfire (~5% on F2, increasing ~5pp/floor to ~35% on F8) and 10 treasures (~2.3-2.6% each). No path ever has duplicate treasure names.

Hard Invariants

Only enemies kill. Zero deaths on treasure/campfire floors across 400k+ observations.
Both survival and topple are stochastic, but most paths are effectively deterministic — randomness only matters near thresholds. For survival, same F2-Fk prefixes give mixed die/survive outcomes at every death floor (F3-F8).
No serial correlation in topple outcomes — each run is a fresh draw.
Runs are independent. No temporal trends, no clustering, class drawn uniformly at 1/3.
Every path is exactly 7 encounters (F2-F8), creating a fundamental epistemological limit: only relative effects between encounter types can be measured, never absolute effects. “Campfire heals 2” and “campfire deals 0 damage” are indistinguishable.

Methodological note: Paths are right-censored — heroes who die early have NaN for later floors. Any filter on total encounter counts implicitly selects survivors. Several early findings were artifacts of this. All results here use methods that account for it (prefix-controlled insertion tests, exact multiset stratification, or per-floor analysis).
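As an illustration of the per-floor approach mentioned above, here is a minimal sketch that computes the death rate on each floor only among heroes who actually reached that floor. The toy runs and the function name `per_floor_death_rates` are hypothetical and are not taken from the dataset or from the analysis code.

```python
from collections import defaultdict

# HYPOTHETICAL toy runs: (encounters actually reached from F2 onward, died on the last one?).
# These are made up for illustration and do not come from the real dataset.
RUNS = [
    (["Gremlin", "Campfire", "Slaver", "Sentries"], True),   # died on F5
    (["Gremlin", "Cloak", "Slaver", "Sentries", "Campfire", "Chosen", "Potion"], False),
    (["Cultist", "Jaw Worm", "Slaver"], True),                # died on F4
    (["Cultist", "Ring", "Campfire", "Sentries", "Gremlin Nob", "Campfire", "Boots"], False),
]

def per_floor_death_rates(runs, first_floor=2):
    """Death rate per floor, conditioned on the hero having reached that floor."""
    reached = defaultdict(int)
    died = defaultdict(int)
    for encounters, died_on_last in runs:
        # Every listed encounter is a floor the hero reached; later floors are censored.
        for offset in range(len(encounters)):
            reached[first_floor + offset] += 1
        if died_on_last:
            died[first_floor + len(encounters) - 1] += 1
    return {floor: died[floor] / reached[floor] for floor in sorted(reached)}

rates = per_floor_death_rates(RUNS)
for floor, rate in rates.items():
    print(f"F{floor}: {rate:.2f}")
```

Because the denominator for each floor counts only runs that got there, dead heroes do not silently drop out of later-floor statistics the way they do when filtering on total encounter counts.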

2. Warrior Mechanics

Resource Accumulation

Warriors follow a resource-accumulation pattern: treasures build up a survival resource, enemies deplete it. Within additive models, relative treasure values are roughly strong ~ 8, near-strong ~ 7, medium ~ 4, campfire ~ 2, wrong-class ~ 0 (arbitrary units).

Critical caveat: These come from an additive framework that overpredicts deaths by 73% — runs given <5% predicted survival actually survive 39% of the time. The relative ordering is robust across model families, but specific numbers should be treated as approximate rankings, not confirmed parameters.

Enemy danger ordering (consistent across all models): Shelled Parasite >> Chosen >> Gremlin Nob >> Sentries > Slaver > small enemies.

Overall survival: 84.7%. No deaths before F5.

Model-Free Findings

Adding weak enemies can reduce death rate: Inserting a non-combat encounter before a lethal enemy drops death from 24% to 0-7%. Impossible under simple additive damage.
Treasures reduce death beyond simple healing: controlled for composition, the effect exceeds what any fixed-value model predicts.
The {4E, 0T, 0C} death trap is structural: ALL such paths map to {Small, Small, Slaver, Sentries}. 100% die on F5. Not a special interaction — just 4 consecutive enemies with no healing.

State Dimensionality

Survival: Genuinely 1D. Adding a second power dimension gives only ΔLL=+19, well below BIC breakeven.

Topple: A second dimension matters. The Two-Resource Model (HP + Power → topple) improves predictions for all classes including Warrior. The “power” variable loads differently from HP — e.g. Chosen is dangerous for survival but contributes positively to topple power. Consistent with combat experience helping the boss fight despite costing HP to acquire. The exact nature of this resource is unknown.
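For readers unfamiliar with the "BIC breakeven" comparison used for survival, the following sketch shows the standard arithmetic: BIC = k·ln(n) − 2·LL, so a richer model needs a log-likelihood gain of at least half the added parameter count times ln(n) just to tie. The number of added parameters and the sample size below are assumptions for illustration; the report states neither.

```python
import math

def delta_bic(delta_ll, extra_params, n_obs):
    """Change in BIC when a richer model gains delta_ll in log-likelihood.

    BIC = k * ln(n) - 2 * LL, so a negative result favors the richer model.
    """
    return extra_params * math.log(n_obs) - 2.0 * delta_ll

def breakeven_delta_ll(extra_params, n_obs):
    """Log-likelihood gain at which the richer model exactly ties on BIC."""
    return 0.5 * extra_params * math.log(n_obs)

# ASSUMPTIONS (not from the report): 10 added parameters, and a per-class
# sample of roughly one third of ~140,000 runs.
EXTRA_PARAMS = 10
N_OBS = 46_000

needed = breakeven_delta_ll(EXTRA_PARAMS, N_OBS)
warrior_delta = delta_bic(19, EXTRA_PARAMS, N_OBS)
print(f"breakeven delta-LL: {needed:.1f}")
print(f"delta-BIC at delta-LL = +19: {warrior_delta:+.1f}")
```

Under these assumed values a gain of +19 leaves ΔBIC positive, which is the sense in which a small log-likelihood improvement does not justify an extra dimension.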

Order Effects

Survival: Significant (z=13.81 for adjacent swaps) but small — multiset composition dominates.
Topple: Moderate. Treasures closer to boss help more (concordance 0.58).
Same-enemy consecutive is harmful for Warriors (e.g. Sentries→Sentries shows elevated death).

3. Mage Mechanics

The Enemy-Power Paradox

Raw data shows an apparent survival U-shape: 43% at 2 enemies, 84% at 6. This is largely a composition confound — many-enemy paths have stronger treasure compositions. But even within controlled compositions, enemies genuinely benefit Mages.

Prefix-controlled insertion test (avoids censoring):

Encounter inserted	Survival improvement
Strong treasure	+10.6pp
Sentries (lethal!)	+17.4pp
Slaver (lethal!)	+15.5pp
GNob (lethal!)	+13.2pp
Campfire	+9.7pp
Small enemy	+3.1pp

Even lethal enemies help Mages (compare Warriors: −13 to −24pp).

Mages accumulate a resource from combat — the clearest model-free evidence for their enemy-driven resource system.

2-Dimensional State

SVD of the (F2,F3)→F4 death matrix: rank-2 explains 82.8-99.5% of variance. Six treasures on F2 give 0% Jaw Worm death but span 0-33% Slaver death — a single resource cannot produce this. 2D beats 1D decisively (ΔBIC = −2,905).

The Mage Dilemma

Enemies	Survival%	Topple% (given survived)
0	100.0	29.9
2	42.9	55.9
5	81.5	68.0
7	80.6	68.6

Mage needs enemies for topple power but enemies kill Mages. Survival-topple correlation is near-zero.

Mage Topple: Inverted From Warrior

Enemies help topple (66%→99% vs BA as count rises)
Campfires devastating (91%→0% at 6 campfires vs BA)
Wrong-class treasures give zero topple benefit
Mage is the best boss-killer at every boss (BA 90.5%, Collector 66.9%, Champion 33.7%) despite worst survival

4. Rogue Mechanics

2-Dimensional State

2D beats 1D by ΔLL=+534, ΔBIC=-897. Surprising — Rogues were expected to behave like Warriors.

The Slaver Anomaly (Strongest Signal in the Dataset)

Cultist protects Rogues from Slaver at a 30.2x ratio in exact multiset swaps:

Small enemies encountered	Rogue death rate vs Slaver
None	1.1%
1 Cultist	0.4%
2 Cultists	0.0%
1 Gremlin	9.3%
1 Acid Slime	4.2%
1 Gremlin + 1 Acid Slime	16.8%
1 Cultist + 1 Acid Slime	0.6%

One Cultist appears to completely negate the Gremlin/Acid Slime penalty. Gremlin actively increases vulnerability. Survives all confound checks. Note: this is relative ranking only — whether Cultist actively helps or Gremlin actively hurts vs a neutral baseline is undetermined.

Cross-Class Small Enemy Pattern

Small enemies deal identical effective damage across all classes. But they provide different class-specific resources:

Cultist most valuable for Rogues (against Slaver)
Acid Slime most valuable for Mages (against Jaw Worm)
All three interchangeable for Warriors (chi-sq p=0.44)

This is the strongest evidence for class-specific resource mechanics beyond damage/health.

Rogue Topple

Intermediate between Warrior (treasure-driven) and Mage (enemy-powered)
Position matters most (concordance 0.71, strongest of all classes)
Cultist >> Acid Slime ≥ Gremlin for topple (+7-11pp multiset-controlled)

5. Treasures

Universal 4-Tier Structure

Tier	Warrior	Rogue	Mage
Right-class strong	Adamant Armor, Enchanted Shield	Dagger of Poison, Vanishing Powder	Staff of the Magi, Tome of Knowledge
Shared strong	Boots, Cloak	Boots, Cloak	Boots, Cloak
Medium neutral	Potion, Ring	Potion, Ring	Potion, Ring
Wrong-class	Dagger, Staff, Tome, VP	AA, ES, Staff, Tome	AA, ES, Dagger, VP

Key Findings

Cloak is universal (good for all classes). Shield is Warrior-specific (harmful to Rogue/Mage). For Warriors alone, they’re indistinguishable (z=0.71).
Wrong-class treasures are actively harmful for topple (~3-6pp cost each). (Simon note: I think this is not correct. Claude often got confused about baselines, comparing to the average and assuming that less-than-average is harm, and I think this is an example.)
Within-pair asymmetry: AA > ES for Warrior, Dagger > VP for Rogue, Staff > Tome for Mage. Right-class adds ~1.5x the log-odds value of a neutral (ratio 1.41-1.54).
No encounter is a null-op: all non-combat types reduce death rate vs baseline. (Simon note: while this methodology is not valid, we did find no null-ops using a more plausibly valid method (see section 8).)
Benefits decay with distance from the protected encounter; Mage decay is slower than Warrior/Rogue.

Unresolved: Campfire vs Weak Treasure

Two tests give conflicting rankings — insertion test (early-game): campfire > weak; pairwise swap (full-path): campfire ≈ weak. We don’t know which is confounded. The disagreement suggests mechanics more complex than simple healing.

6. Boss Mechanics

Difficulty: BA < Collector < Champion

Boss	Warrior	Rogue	Mage
BA	76.9%	83.7%	90.5%
Collector	49.4%	53.4%	66.9%
Champion	20.6%	21.0%	33.7%

Mage > Rogue > Warrior for topple at every boss, despite worst survival.

F8 Campfire Is Boss-Specific

BA: Massive (OR=3-6). Step function — only F8 position matters.
Collector: Moderate benefit (Rogue/Mage).
Champion: Null (Warrior/Rogue).

Treasure on F8 hurts vs BA (OR≈0.47) — F8 is better used by campfire.

Topple Stochasticity

17.9% of exact-sequence repeats show mixed outcomes. Most paths are near-deterministic. Variance is wider than a fair coin predicts: different sequences have genuinely different topple probabilities. Multisets explain 86-96% of variance; ordering 4-13%. Warrior×Champion: 0/175 mixed (far from boundary or deterministic).
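A statistic like "share of exact-sequence repeats with mixed outcomes" can be computed by grouping runs on (class, full encounter sequence) and checking which groups with at least two runs contain both a topple and a failure. This is a minimal sketch on hypothetical toy records, not the code that produced the figure above.

```python
from collections import defaultdict

# HYPOTHETICAL toy records: (class, encounter sequence including boss, toppled?).
RECORDS = [
    ("Warrior", ("Cloak", "Campfire", "Champion"), False),
    ("Warrior", ("Cloak", "Campfire", "Champion"), False),
    ("Mage", ("Slaver", "Campfire", "Collector"), True),
    ("Mage", ("Slaver", "Campfire", "Collector"), False),
    ("Rogue", ("Cultist", "Boots", "Bronze Automaton"), True),  # singleton, ignored
]

def mixed_outcome_counts(records):
    """Return (number of mixed groups, number of repeated groups)."""
    outcomes = defaultdict(list)
    for hero_class, sequence, toppled in records:
        outcomes[(hero_class, sequence)].append(toppled)
    # Only sequences seen at least twice can reveal stochasticity.
    repeated = [results for results in outcomes.values() if len(results) >= 2]
    mixed = [results for results in repeated if len(set(results)) == 2]
    return len(mixed), len(repeated)

mixed, repeated = mixed_outcome_counts(RECORDS)
print(f"{mixed}/{repeated} repeated sequences show mixed outcomes")
```

Note that a group with zero mixed outcomes is only evidence of determinism in proportion to how many repeats it contains; two identical results say very little.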

7. Cross-Class Architecture

Property	Warrior	Rogue	Mage
Survival state dimensions	1D	2D	2D
Topple uses 2nd resource?	Yes (modest)	Yes	Yes (dominant)
First death floor	F5	F4	F3
Overall survival	84.7%	77.8%	64.4%
Small enemy differentiation	None	Yes (30x ratio)	Yes (reversed)
Enemies help topple?	No	Mixed	Dramatically
Same-enemy consecutive	Harmful	Mixed	Protective

The Two-Resource Model (HP + Power)

Topple follows: logit(topple) = boss_intercept + a×HP + c_boss×Power.

HP coefficient consistent across bosses (a ≈ 0.60 for all three).
Power scaling boss-dependent: c_BA=0.57, c_Collector=0.90, c_Champion=1.00. Harder bosses weight Power more.
Power is class-specific (cross-class r=0.01-0.17). Structure is universal; content differs.
Decomposition quality varies: Warrior R²=0.66-0.96 (clean), Mage R²=0.10-0.26 (near-noise).
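The topple formula above can be sketched as a small function. The coefficients are the ones listed in this section; the boss intercepts are not reported, so `boss_intercept` is left as a caller-supplied, hypothetical parameter, and the HP and Power inputs are in the model's own arbitrary units.

```python
import math

# Coefficients quoted in the section above.
HP_COEF = 0.60
POWER_COEF = {"BA": 0.57, "Collector": 0.90, "Champion": 1.00}

def topple_probability(boss, hp, power, boss_intercept):
    """logit(topple) = boss_intercept + a*HP + c_boss*Power, mapped to a probability.

    boss_intercept is NOT given in the report; any value passed here is an assumption.
    """
    logit = boss_intercept + HP_COEF * hp + POWER_COEF[boss] * power
    return 1.0 / (1.0 + math.exp(-logit))

# Illustration with a made-up intercept of -1.0: one extra unit of Power
# moves the log-odds by c_boss, so it matters more against Champion than BA.
for boss in POWER_COEF:
    low = topple_probability(boss, hp=1.0, power=0.0, boss_intercept=-1.0)
    high = topple_probability(boss, hp=1.0, power=1.0, boss_intercept=-1.0)
    print(f"{boss}: {low:.3f} -> {high:.3f}")
```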

Resource Alignment

~3 distinct resource dimensions with partial sharing: Warrior and Rogue share one (cos ≈ 0.87), Rogue and Mage share another (cos ≈ 0.76), Mage has a unique combat-power dimension. No pair shares both.

Commutativity

Enemy encounters commute ~94.3% of the time (violations near survival boundary). Treasure order mostly doesn’t affect topple (16% disagreement = stochastic noise), except for Champion specifically (8 deterministic treasure-swap violations, p=0.008).

8. Data Generation

Runs are independent, no temporal trends, class drawn uniformly at ~1/3.
Encounters across floors consistent with independence (max deviation ~0.026).
No duplicate treasure names in any path.
Healing vs not-damaging is fundamentally confounded: fixed path length means “1 heal = 1 fewer enemy.” (Simon note: can somewhat unconfound by looking at partial-run deaths for different length runs that have 1 extra encounter and otherwise the same, assuming that there aren’t length specific and floor-specific effects)

9. Modeling Approaches

Additive HP (Warrior)

Each encounter adds/subtracts from running HP; death when HP ≤ 0. Gets broad strokes right (95.3% multiset classification) but overpredicts deaths by 73%. Useful for ranking encounters; specific parameter values are not mechanically meaningful.
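A minimal sketch of the additive bookkeeping described above, for ranking purposes only. The gain values are the relative treasure values from section 2; the starting HP and the enemy damage numbers are hypothetical, chosen so that four consecutive enemies die on F5, and none of them are fitted parameters.

```python
# Relative gains from section 2 (arbitrary units).
GAINS = {"strong": 8, "near_strong": 7, "medium": 4, "campfire": 2, "wrong_class": 0}

# HYPOTHETICAL values, picked only to make the example readable.
DAMAGE = {"small": 3, "slaver": 8, "sentries": 10}
START_HP = 20

def first_death_floor(path, start_hp=START_HP, first_floor=2):
    """Walk the path with a running HP total; return the death floor or None."""
    hp = start_hp
    for offset, encounter in enumerate(path):
        hp += GAINS.get(encounter, 0) - DAMAGE.get(encounter, 0)
        if hp <= 0:  # death when HP <= 0
            return first_floor + offset
    return None

print(first_death_floor(["small", "small", "slaver", "sentries"]))     # four enemies in a row
print(first_death_floor(["campfire", "small", "slaver", "sentries"]))  # one enemy swapped out
```

A hard threshold like this is fully deterministic, which is one reason such a model overpredicts deaths: it has no way to represent runs that survive near the boundary.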

State-Dependent Beta Models (All Classes)

Beta-distributed survival probabilities parameterized by running state. Captures non-linearity better. Warrior 1D, Rogue 2D, Mage 2D with regen. Mage regen model beats pure HP by ~11,500 LL.

Card-Deck Model (StS-Inspired)

Encounters add typed cards to a deck; combat involves random draws. Deck accumulation + random draw beats 1D HP by 5-28pp on held-out test data. But specific combat mechanics don’t matter: abstract power-vs-threshold tied fully mechanistic block/attack/heal/rounds on test NLL. Training advantage of the richer model was pure overfitting. This suggests resource accumulation with stochastic outcomes, but the exact resolution isn’t StS-style combat.

Kitchen-Sink + L1 Regularization

Hundreds of parameters with increasing L1 penalty. Warrior shows clean phase transition where all state-dependent terms die, leaving ~70 additive parameters. Confirms additive structure is the dominant Warrior signal.

Key Methodological Lessons

Right-censored paths: Filtering by encounter counts selects survivors. Multiple early findings were artifacts.
Exact multiset stratification is essential — count-profile controls conflate within-category differences.
Model parameters absorb unmodeled effects: only relative rankings survive across model families.
Circular reasoning: (a) Mage count-logistic (AUC=0.992) — counts proxy survival depth. (b) “All deaths on F8” / “deterministic pre-boss survival” — survivorship bias; dead heroes have NaN for later floors, trivially splitting them into unique groups. (c) Multiset invariant (0 violations) — boss in key separates outcomes tautologically. (d) “175 sequences far from boundary” — inferred distance from outcomes.
Controlled-pair methodology is broken for lingering effects: matching subsequent encounters makes the target floor redundant.
Any claim of “zero X” or “all Y” should be checked for whether grouping makes it tautological.

10. Recommendations

Main Tower: Warrior LLRRRLLL

Route: Enchanted Shield → Campfire → Jaw Worm → Campfire × 4 → The Collector

P(survive) = 100%. Only enemy is Jaw Worm on F4. 0/2,687 deaths.
P(topple) = 100% (135/135). Wilson 95% CI [97.2%, 100%].
Broader match (1 enemy, any treasure): 381/403 = 94.5%.
#1 of 70 paths by 5pp. Warrior >> Rogue (0.685) >> Mage (0.621).

Bonus Tower: Warrior RRRLLLLL

Route: Gremlin → Jaw Worm → Adamant Armor → Enchanted Shield → Sentries → Campfire → Vanishing Powder → The Champion

P(survive) = 100% — Three enemy floors, all 0% death with prior encounters.
P(topple) = 92.7% (140/151). Wilson 95% CI [87.4%, 95.9%].
#1 of 70 paths by 31pp (!). Warrior >> Rogue (0.800) >> Mage (0.326).
The ~7% failure is irreducible stochastic risk against Champion.
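The Wilson 95% intervals quoted for both towers can be reproduced from the raw counts with the standard Wilson score formula. This is a sketch using z = 1.96; the function name `wilson_interval` is just a label for this example.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)

for label, k, n in [("Main Tower", 135, 135), ("Bonus Tower", 140, 151)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} -> [{lo:.1%}, {hi:.1%}]")
```

These intervals only describe sampling uncertainty in the matched runs. They say nothing about whether the matched runs are the right comparison set, which matters most for the Bonus Tower figure.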

Why Warrior

Tower layouts are treasure/campfire-rich, perfectly suiting Warrior’s treasure-driven mechanics. Best Mage path (RLLLLLLL, 5E + Tome + campfire F8) reaches ~67% combined — 28pp behind. Mage’s enemy-driven topple can’t compensate when the map doesn’t provide enough enemies.

Confidence

Main Tower: Very high. Pessimistic assumptions still give >91.9%.
Bonus Tower: High. 7.3% failure is genuine stochastic risk, not modeling error. Zero exact path matches in dataset (relies on composition-matched analysis), but gap to #2 is enormous.

11. Major Remaining Unknowns

The actual game mechanics are unknown. We can characterize the state space (1D/2D, resource accumulation, stochastic resolution) but not the rules. Multiple model families fit roughly equally well. (Simon note: 1D, 2D, etc. are relative to particular wrong models and do not necessarily reflect the actual state space.)
Mage mechanics remain poorest understood. 2D power model reaches 95.5% accuracy but the mechanistic basis is unclear.
Stochastic component uncharacterized. Can’t distinguish coin flip from dice roll from continuous distribution.
Bonus Tower prediction is unvalidated (zero exact path matches).
Same-enemy protection/harm — distinct mechanic or missing state variable?
Campfire vs weak treasure disagreement — genuinely different measurements or confound?
Positional decay vs step function — BA is clearly step-function at F8; other bosses unclear.

aphyer 16 Mar 2026 8:29 UTC
2 points
0
Parent
No worries, granted. I’ll aim to post the wrapup doc on the 23rd.
- simon 23 Mar 2026 5:50 UTC
  2 points
  0
  Parent
  So… still haven’t figured the mechanics out, but actively beating around the bush with Claude finding things of little importance. Second extension?
  - aphyer 23 Mar 2026 14:14 UTC
    2 points
    0
    Parent
    You can have ~~until I figure out how to play the Necrobinder without dying~~ until Sunday, after that I’ll post the solution anyway to not keep other players waiting too long.