There’s going to be a similar assumption in a good multipolar trap approach.
If Alice yelled at you with a megaphone that everyone in your house / city / whatever must now (1) obey Alice, and (2) torture anyone who doesn’t obey Alice, that’s not going to cause you to start obeying Alice.
You need some feedback signal that actually causes the agents to care about the rules. In debate this comes from the human judge + gradient descent, and I expect you’ll need something analogous in any multipolar trap approach.
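To make the megaphone point concrete, here is a toy sketch (my own illustration, not the actual debate setup). It assumes a single agent whose probability of obeying is `sigmoid(theta)`, and a scalar reward standing in for the human judge. All names here (`announce_rule`, `train_with_judge`, the reward values) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def announce_rule(theta: float) -> float:
    # Alice's megaphone: the rule is stated, but nothing touches the
    # agent's parameters, so its behavior cannot change.
    return theta


def train_with_judge(
    theta: float,
    reward_obey: float = 1.0,     # assumed judge reward for obeying
    reward_disobey: float = 0.0,  # assumed judge reward for disobeying
    lr: float = 0.5,
    steps: int = 200,
) -> float:
    # Gradient ascent on expected reward:
    #   E[R] = p * reward_obey + (1 - p) * reward_disobey, with p = sigmoid(theta)
    #   dE[R]/dtheta = p * (1 - p) * (reward_obey - reward_disobey)
    for _ in range(steps):
        p = sigmoid(theta)
        grad = p * (1.0 - p) * (reward_obey - reward_disobey)
        theta += lr * grad
    return theta


theta0 = -2.0  # agent starts out mostly disobeying
p_before = sigmoid(theta0)
p_after_announcement = sigmoid(announce_rule(theta0))
p_after_training = sigmoid(train_with_judge(theta0))

print(p_before, p_after_announcement, p_after_training)
```

In this toy, the announcement leaves the obey probability exactly where it started, while the judge-plus-gradient loop pushes it up. The only thing doing any work is the feedback signal that reaches the parameters.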