Interesting idea. If this works out in simulations, in theory you could do experiments in the real world by having a nerfed robot (deliberately impaired beyond the actual constraints of its hardware) which occasionally gained the superpower of not being impaired (increased strength/agility/visual resolution/reliability-of-motor-response/etc).
The reason this method might be useful is that it allows the agent to “fantasize” about actions it would take if it could. We don’t want it to take these actions in the real world. For example, it could explore bizarre hypotheticals like: “turn the whole world into gold/computronium/hedonium”.
If we had a sub-human physical robot, and we were confident that it wouldn’t do any damage during unconstrained real-world training, then I can’t see any additional benefit to using our method — you could just do normal RL training.
And if it’s super-human, we wouldn’t want to use our technique in the real world, because it would turn us into gold/computronium/hedonium during training.
Yes, it wouldn’t be able to go as far as those things, but you could potentially identify more down-to-earth, closer-to-reality problems in a robot. Again, I was just imagining a down-the-line scenario after you have shown this works well in simulations. For instance, this sort of setup (a weak robot that occasionally gets strong enough to be dangerous) in a controlled training environment (a specially designed obstacle course) could surface and prevent problems before deployment (shipping to consumers).
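The intermittent-unlocking scheme described above could be sketched as an environment wrapper. This is a minimal toy illustration, not anyone's actual implementation: all names (`NerfedEnv`, `nerf_limit`, `p_unlock`) are hypothetical, and the 1-D "robot" is a stand-in for real hardware. The key idea is that actions are clamped to a weakened range by default, and with small probability an episode runs at full capability.

```python
import random


class NerfedEnv:
    """Toy 1-D stand-in for a deliberately impaired robot (illustrative only).

    Actions are force commands in [-1, 1]. The "nerf" clamps applied actions
    to a weaker range; with probability p_unlock, an episode grants the
    "superpower" of the unconstrained hardware instead.
    """

    def __init__(self, nerf_limit=0.2, p_unlock=0.05, seed=0):
        self.nerf_limit = nerf_limit  # capability cap while impaired
        self.p_unlock = p_unlock      # chance an episode runs at full power
        self.rng = random.Random(seed)
        self.unlocked = False
        self.position = 0.0

    def reset(self):
        # Decide once per episode whether the impairment is lifted.
        self.unlocked = self.rng.random() < self.p_unlock
        self.position = 0.0
        return self.position

    def step(self, action):
        limit = 1.0 if self.unlocked else self.nerf_limit
        applied = max(-limit, min(limit, action))  # clamp to current capability
        self.position += applied
        reward = -abs(self.position - 1.0)  # toy objective: reach position 1.0
        return self.position, reward
```

Because most episodes run under the nerf, the agent's ordinary behavior stays within safe bounds, while the rare unlocked episodes let capability-dependent failure modes show up inside the obstacle course rather than after deployment.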