If it hasn’t been done already, it might be useful to test existing RL algorithms with simple grid-world domains in which “extortion dynamics” might emerge.
For example, imagine a grid-world domain with two agents, M and T, where M has an action that reduces the reward of both agents by 100, while T has an action that increases M's reward by 100 and reduces its own reward by 10.
It might be useful to publish something that includes simulation videos in which M succeeds in extorting T. Also, perhaps something interesting could be said about the algorithms or conditions under which T succeeds in defending itself from extortion.
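To make the payoff structure concrete, here is a minimal sketch of the reward arithmetic as a one-step game, stripped of any grid-world movement. The function names, the sequential ordering (T acts first, then M responds after observing T), and the "punish unless paid" threat policy are all assumptions introduced for illustration; only the numbers 100, 100, and 10 come from the example above. It is not an RL experiment, just a check of which action is myopically best for each agent.

```python
# Payoff constants taken from the example above.
PUNISH_COST = 100    # M's action: both agents lose this much
TRANSFER_TO_M = 100  # T's action: M gains this much
TRANSFER_COST = 10   # T's action: T loses this much


def step_rewards(m_punishes, t_pays):
    """Return (reward_M, reward_T) for one joint action."""
    r_m = r_t = 0
    if m_punishes:
        r_m -= PUNISH_COST
        r_t -= PUNISH_COST
    if t_pays:
        r_m += TRANSFER_TO_M
        r_t -= TRANSFER_COST
    return r_m, r_t


def threat_policy(t_pays):
    """Hypothetical extortion strategy for M: punish exactly when T did not pay."""
    return not t_pays


def t_best_response_to_threat():
    """T's reward-maximizing action if M is committed to threat_policy."""
    return max((False, True),
               key=lambda pay: step_rewards(threat_policy(pay), pay)[1])


def m_best_response_to_refusal():
    """M's reward-maximizing action if T is committed to never paying."""
    return max((False, True),
               key=lambda punish: step_rewards(punish, False)[0])


if __name__ == "__main__":
    print("T pays under threat:", t_best_response_to_threat())
    print("M punishes a committed refuser:", m_best_response_to_refusal())
```

Under these assumptions, paying costs T 10 while refusing costs it 100, so a T that takes M's threat as given pays. Conversely, if T is credibly committed to never paying, punishing only costs M 100, so M's myopic best response is not to punish. Which of these two commitments a given learning algorithm ends up at is the open question.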