gwern comments on When Hindsight Isn’t 20/20: Incentive Design With Imperfect Credit Allocation

gwern 10 Nov 2020 14:46 UTC
7 points
So this is the 2-of-2 exploding Nash equilibrium technique applied to multiple parties/transactions? What is the generalized version called?

(On a side note, it now strikes me that there’s a parallel to RL blackbox optimization: by setting up a large penalty for any divergence from the golden path, it creates an unbiased, but high variance estimator of credit assignment. When pirates participate in enough rollouts with enough different assortments of pirates, they receive their approximate honesty-weighted return. You can try to pry open the blackbox and reduce variance by taking into account pirate baselines etc, but at the risk of losing unbiasedness if you do it wrong.)
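
A rough sketch of the rollout analogy, not anything from the post itself: the pirate honesty probabilities, the crew size, the all-or-nothing payoff, and the function name `estimate_returns` are all assumptions made for illustration. Each rollout samples a random crew, pays every member 1 if the whole crew behaved honestly and 0 otherwise (the "exploding" penalty), and each pirate's average over the rollouts they joined serves as the no-baseline credit estimate.

```python
import random


def estimate_returns(honesty, crew_size=2, rollouts=20000, seed=0):
    """Average all-or-nothing payoff per pirate over random crews.

    honesty[i] is the (hypothetical) probability that pirate i acts
    honestly in any given rollout.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(honesty)
    counts = [0] * len(honesty)
    for _ in range(rollouts):
        # Sample a random assortment of pirates for this rollout.
        crew = rng.sample(range(len(honesty)), crew_size)
        # Any single defection destroys the payoff for the whole crew.
        all_honest = all(rng.random() < honesty[i] for i in crew)
        payoff = 1.0 if all_honest else 0.0
        for i in crew:
            totals[i] += payoff
            counts[i] += 1
    # Per-pirate mean return over the rollouts they took part in.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


honesty = [0.3, 0.5, 0.7, 0.9]  # illustrative values only
estimates = estimate_returns(honesty)
for h, r in zip(honesty, estimates):
    print(f"honesty={h:.1f}  mean return={r:.3f}")
```

Under these assumptions, pirate i's expected return is their own honesty times the average honesty of a random crewmate, so the averages rank pirates by honesty once there are enough rollouts. Any single rollout says almost nothing about an individual, which is the high-variance part; no baseline is subtracted here.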
