I’m confused, because I wasn’t that surprised when I read the paper. My take was that generative models are not-terrible at simulating agents performing some task, including tasks that require something we might call optimization. That would imply that modelling a low-fidelity RL algorithm isn’t really beyond their simulation capabilities. So independent of whether the paper actually did show models learning good RL algorithms, it feels like even if it had, I wouldn’t take that as much evidence to update my priors one way or the other.