Yeah, it might depend on the details, but it shouldn’t learn to always use or never use superpowers, because it would still be trained on some normal non-superpower rollouts. So the strategy it learns should be to use superpowers when they’re available and not use them when they aren’t; otherwise it’d get high training loss.
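
To make the loss argument concrete, here's a toy sketch. Everything in it is an assumption for illustration: the 0/1 loss, the 3-to-7 mix of superpower and normal rollouts, and the three hypothetical policies. It isn't meant to describe any actual training setup, just to show why an unconditional strategy pays a cost on mixed data.

```python
# Toy illustration only: the loss, the rollout mix, and the policy names
# are all assumptions, not details from the discussion above.

def loss(policy, superpowers_available):
    """0/1 loss: the target behaviour is to use superpowers exactly when available."""
    action = policy(superpowers_available)
    return 0.0 if action == superpowers_available else 1.0

def mean_loss(policy, rollouts):
    """Average loss of a policy over a list of rollouts."""
    return sum(loss(policy, r) for r in rollouts) / len(rollouts)

# Each policy maps "are superpowers available?" to "do I use them?"
policies = {
    "always_use": lambda available: True,
    "never_use": lambda available: False,
    "conditional": lambda available: available,
}

# Assumed training mix: 3 superpower rollouts and 7 normal rollouts.
rollouts = [True] * 3 + [False] * 7

losses = {name: mean_loss(p, rollouts) for name, p in policies.items()}
print(losses)
```

As long as both kinds of rollout show up in training, only the conditional policy gets zero loss here; each unconditional one is wrong on whichever kind of rollout it ignores.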