Usman Anwar


Usman Anwar 18 May 2026 15:18 UTC
in reply to: Daniel Tan’s comment on: Daniel Tan’s Shortform
https://arxiv.org/abs/2209.13085 is probably a good place to start.

> So one naive notion is “% of rollouts on which they get the same score.”
Rollouts from which policy? In RL lingo, you could instead talk about something like the number of (s, a) pairs on which the rewards differ, which marginalizes out the policy (though I am not sure how instructive such a formulation would be for LLMs, given that the dynamics are kind of implicitly specified by the LLM itself). You could then talk about differences in the occupancy measures induced by the different reward functions, as a “policy space” dual of “differences in reward functions”.
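
To make the two notions concrete, here is a minimal sketch on a toy tabular MDP. Everything in it is an assumption made up for illustration: the three states, the deterministic dynamics `NEXT`, the reward tables `R_A`, `R_B`, `R_C`, and the choice to compare the occupancy measures of each reward's optimal policy using total variation distance. It is not meant as the right formalization for LLMs, only as a way to see that the two quantities can come apart.

```python
from itertools import product

GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = [0, 1]
START = 0
# Hypothetical deterministic dynamics: NEXT[s][a] is the successor state.
NEXT = {0: {0: 1, 1: 2}, 1: {0: 1, 1: 0}, 2: {0: 2, 1: 0}}
PAIRS = list(product(STATES, ACTIONS))


def frac_pairs_differ(r1, r2):
    """Policy-free notion: fraction of (s, a) pairs where the rewards differ."""
    return sum(r1[p] != r2[p] for p in PAIRS) / len(PAIRS)


def optimal_policy(r, iters=500):
    """Value iteration, then a greedy deterministic policy {state: action}."""
    def q(s, a, v):
        return r[(s, a)] + GAMMA * v[NEXT[s][a]]

    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        v = {s: max(q(s, a, v) for a in ACTIONS) for s in STATES}
    return {s: max(ACTIONS, key=lambda a: q(s, a, v)) for s in STATES}


def occupancy(policy, iters=500):
    """Normalized discounted state-action occupancy measure from START."""
    d = {s: 0.0 for s in STATES}
    for _ in range(iters):
        new = {s: (1 - GAMMA) * (s == START) for s in STATES}
        for s in STATES:
            new[NEXT[s][policy[s]]] += GAMMA * d[s]
        d = new
    return {(s, a): d[s] * (a == policy[s]) for s, a in PAIRS}


def tv_distance(rho1, rho2):
    """Total variation distance between two occupancy measures."""
    return 0.5 * sum(abs(rho1[k] - rho2[k]) for k in rho1)


def occupancy_gap(r1, r2):
    """'Policy space' notion: distance between the induced occupancies."""
    return tv_distance(occupancy(optimal_policy(r1)),
                       occupancy(optimal_policy(r2)))


# R_A rewards staying in state 1; R_B is R_A shifted by a constant;
# R_C changes a single entry of R_A, making state 2 the place to stay.
R_A = {p: 0.0 for p in PAIRS}
R_A[(1, 0)] = 1.0
R_B = {p: x + 5.0 for p, x in R_A.items()}
R_C = dict(R_A)
R_C[(2, 0)] = 3.0

for name, other in [("R_B", R_B), ("R_C", R_C)]:
    print(name, frac_pairs_differ(R_A, other), occupancy_gap(R_A, other))
```

In this toy setup, `R_B` differs from `R_A` on every (s, a) pair yet induces the same optimal policy, so the occupancy gap is zero, while `R_C` differs on a single pair and moves essentially all of the occupancy mass elsewhere.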