Meta: I’d have appreciated a version with less math, because extra formalization can hide the contribution. Or, first explain colloquially why you believe X, and then show the math that shows X.
I don’t see your claim. It looks heavily incentivized to steer state sequences to be desirable to its utility mixture. How do the evaluators even enter the picture?
Meta: I’d have appreciated a version with less math, because extra formalization can hide the contribution. Or, first explain colloquially why you believe X, and then show the math that shows X.
I don’t see your claim. It looks heavily incentivized to steer state sequences to be desirable to its utility mixture. How do the evaluators even enter the picture?
Thanks for the meta-comment; see Wei’s and my response to Rohin.