So this is the 2-of-2 exploding Nash equilibrium technique applied to multiple parties/transactions? What’s this generalized kind called?
(On a side note, it now strikes me that there’s a parallel to black-box optimization in RL: setting up a large penalty for any divergence from the golden path creates an unbiased but high-variance estimator for credit assignment. When pirates participate in enough rollouts with enough different assortments of other pirates, they receive approximately their honesty-weighted return. You can try to pry open the black box and reduce variance by taking pirate baselines etc. into account, but at the risk of losing unbiasedness if you do it wrong.)
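
To make the parallel concrete, here is a toy simulation of what I have in mind. It is only a sketch under my own assumptions, not anything from the original scheme: each pirate has a fixed probability of behaving honestly (the `honesty` list is made up), crews are drawn uniformly at random, and a single defection zeroes the payoff for the whole crew. The function name `estimate_returns` and all parameters are hypothetical.

```python
import random


def estimate_returns(honesty, crew_size, n_rollouts, seed=0):
    """Average all-or-nothing payoff each pirate sees across random crews."""
    rng = random.Random(seed)
    totals = [0.0] * len(honesty)
    counts = [0] * len(honesty)
    for _ in range(n_rollouts):
        # Draw a random assortment of pirates for this rollout.
        crew = rng.sample(range(len(honesty)), crew_size)
        # Any single defection "explodes" the deal: payoff 0 for everyone.
        payoff = 1.0 if all(rng.random() < honesty[i] for i in crew) else 0.0
        for i in crew:
            totals[i] += payoff
            counts[i] += 1
    # Pirates who never joined a crew have no estimate yet.
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


honesty = [0.5, 0.6, 0.7, 0.8, 0.9, 0.99]  # assumed per-pirate honesty
noisy = estimate_returns(honesty, crew_size=3, n_rollouts=30)
stable = estimate_returns(honesty, crew_size=3, n_rollouts=20000)
```

With few rollouts the per-pirate averages in `noisy` jump around a lot, since each one is built from a handful of 0-or-1 outcomes. With many rollouts, `stable` should settle near each pirate's own honesty multiplied by the average reliability of the partners they get paired with, which is roughly what I meant by an honesty-weighted return. No baseline is subtracted here, so nothing in this sketch can introduce bias.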