I agree.
An even simpler example: if the agents are reward learners, each will optimize its own reward signal, and those signals are two distinct things in the physical world.
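A minimal sketch of what I mean (my own toy example, not anyone's actual setup): two bandit-style reward learners share the same action space, but each one's "reward" is a different physical measurement, so they converge to different policies.

```python
import random

ACTIONS = ["a", "b"]

# Two physically distinct reward signals over the same world.
def reward_signal_1(action):
    # Agent 1's sensor happens to fire on action "a".
    return 1.0 if action == "a" else 0.0

def reward_signal_2(action):
    # Agent 2's sensor happens to fire on action "b".
    return 1.0 if action == "b" else 0.0

def train(reward_signal, episodes=1000, epsilon=0.1, alpha=0.1):
    """Epsilon-greedy bandit learner maximizing the given signal."""
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(q, key=q.get)
        q[action] += alpha * (reward_signal(action) - q[action])
    return q

q1 = train(reward_signal_1)
q2 = train(reward_signal_2)
print("Agent 1 prefers:", max(q1, key=q1.get))  # -> "a"
print("Agent 2 prefers:", max(q2, key=q2.get))  # -> "b"
```

Both agents are "reward maximizers" in the abstract, yet because the signals they maximize are different objects in the world, they end up optimizing for different outcomes.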