Great work! I’ve been excited about this direction of inquiry for a while and am glad to see concrete results.
Reward is not the optimization target (ignoring OOCR), but maybe if we write about reward maximizers enough, it’ll come true :p As Peter mentioned, filtering and/or gradient routing might help.