When you care about average-case performance, you can always distill the behavior back into the model. So, in my view, average-case usage is always equivalent to generating training or reward data.