It’s not clear why the model θ would come to be optimizing a reward function R in the first place.
(Not a real comment, I’m just also testing the latex)
Still works for me; I think you don’t have the correct markdown manual latex mode enabled on your account.