Yepp, this is a good point. I agree that there won’t be a sharp distinction, and that ML systems will continue to do online learning throughout deployment. Maybe I should edit the post to point this out. But three reasons why I think the training/deployment distinction is still underrated:
In addition to the clarifications from this post, I think there are a bunch of other concepts (in particular recursive self-improvement and reward hacking) which weren’t originally conceived in the context of modern ML, but which are very important to understand in that context.
Most ML and safety research doesn’t yet take transfer learning very seriously; that is, it’s still in the paradigm where you train in (roughly) the environment that you measure performance on. Emphasising the difference between training and deployment helps address this. For example, I’ve pointed out in various places that there may be no clear concept of “good behaviour” during the vast majority of training, potentially undermining efforts to produce aligned reward functions during training.
It seems reasonable to expect that early AGIs will become generally intelligent before being deployed on real-world tasks, and that their goals will also be largely determined before deployment. Therefore, insofar as what we care about is giving them the right underlying goals, the relatively small amount of additional supervision they’ll gain during deployment isn’t a primary concern.