Rohin Shah comments on [missing post]

Rohin Shah 17 Jan 2022 15:48 UTC
3 points
There’s going to be a similar assumption in a good multipolar trap approach.
If Alice yelled at you with a megaphone that everyone in your house / city / whatever must now (1) obey Alice, and (2) torture anyone who doesn’t obey Alice, that’s not going to cause you to start obeying Alice.
You need some feedback signal that actually causes the agents to care about the rules. In debate this comes from the human judge + gradient descent, and I expect you’ll need something analogous in any multipolar trap approach.
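
A toy sketch of the point, not a model of debate or of any real training setup: a one-parameter policy whose logit `theta` controls the probability of obeying, updated by exact gradient ascent on expected reward. The function name `train`, the reward values, the step count, and the learning rate are all made up for illustration. When the announcement changes no rewards, the gradient is zero and the policy never moves; when something judge-like rewards obeying, the policy shifts toward obeying.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(reward_obey, reward_disobey, steps=200, lr=0.5):
    """Gradient ascent on expected reward for a one-parameter policy.

    theta is the logit of P(obey). Returns the final P(obey).
    All names and numbers here are hypothetical.
    """
    theta = 0.0  # start indifferent: P(obey) = 0.5
    for _ in range(steps):
        p = sigmoid(theta)
        # Expected reward is p * reward_obey + (1 - p) * reward_disobey,
        # so its derivative with respect to theta is:
        grad = p * (1.0 - p) * (reward_obey - reward_disobey)
        theta += lr * grad
    return sigmoid(theta)


# Megaphone only: the rule is announced but nothing rewards following it.
announced_only = train(reward_obey=0.0, reward_disobey=0.0)

# Judge-like feedback: obeying is actually rewarded during training.
with_feedback = train(reward_obey=1.0, reward_disobey=0.0)

print(f"P(obey), announcement only: {announced_only:.3f}")
print(f"P(obey), with feedback:     {with_feedback:.3f}")
```

In the first case the reward difference is zero, so `theta` never leaves its starting value; in the second the same update rule pushes P(obey) toward 1. The announcement only matters to the extent that it shows up in whatever signal drives the updates.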