I don’t know what Steve would say, but I know that some folks from DeepMind and Stanford recently used an LLM to generate rewards for training another LLM on specific tasks, like negotiation, which I think is exactly what you’ve described. It seems to work really well.
Reward Design with Language Models