Thomas Larsen comments on Inner Alignment via Superpowers

Thomas Larsen 1 Sep 2022 6:56 UTC
7 points
3
In the model-based RL setup, we are planning to give the agent actions that can directly modify the game state in any way it likes. This is sort of like an arbitrarily powerful superpower, because the agent can change anything it wants about the world, except of course that this is a Cartesian environment, so it can’t, e.g., recursively self-improve.
With model-free RL, this strategy doesn’t obviously carry over, so I agree that we are limited to easily codeable superpowers.
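
To make the model-based case concrete, here is a minimal sketch of what a "modify the game state" action might look like. Everything in it is an assumption for illustration: the `ToyGridWorld` class, the `"move"` and `"set_state"` action names, and the reward rule are hypothetical and not taken from any actual implementation. The point is only that the superpower action can overwrite any part of the world state, while the agent itself lives outside that state, which is the Cartesian boundary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Pos = Tuple[int, int]


def _default_state() -> Dict[str, Pos]:
    # Hypothetical starting layout for the toy world.
    return {"agent": (0, 0), "goal": (3, 3)}


@dataclass
class ToyGridWorld:
    """Hypothetical Cartesian environment: the agent's policy is not part of `state`."""

    size: int = 4
    state: Dict[str, Pos] = field(default_factory=_default_state)

    def step(self, action: Tuple[str, Any]) -> Tuple[Dict[str, Pos], float]:
        kind, payload = action
        if kind == "move":
            # Ordinary action: move the agent one step, clipped to the grid.
            dx, dy = payload
            x, y = self.state["agent"]
            nx = min(max(x + dx, 0), self.size - 1)
            ny = min(max(y + dy, 0), self.size - 1)
            self.state["agent"] = (nx, ny)
        elif kind == "set_state":
            # Superpower action: overwrite any entries of the game state directly.
            # It can only touch `self.state`, never the agent's own code or weights.
            self.state.update(payload)
        else:
            raise ValueError(f"unknown action kind: {kind}")
        reward = 1.0 if self.state["agent"] == self.state["goal"] else 0.0
        return dict(self.state), reward


env = ToyGridWorld()
obs_after_move, reward_after_move = env.step(("move", (1, 0)))
# Using the superpower: relocate the goal onto the agent's current square.
obs_after_edit, reward_after_edit = env.step(("set_state", {"goal": (1, 0)}))
```

In a model-free setup there is no explicit state object like this for the action to edit, which is one way to read why the strategy doesn’t obviously carry over.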