[Epistemic status: read the intro, skimmed the rest, think my point is still valid]
I appreciate the clarity of thinking that comes from being concrete about how AIs get trained and used, and from noting that there are differences between what goes on in different phases of the process. That being said, I’m skeptical of a sharp distinction between ‘training’ and ‘deployment’. My understanding is that ML systems in productive use keep being continually trained—the case I’m most familiar with is KataGo, to my knowledge the strongest Go engine, which continues to be trained. It also seems likely to me that future smart agents will be stateful and do some kind of learning online, similarly to how humans or recurrent systems do—or perhaps they will be static, but will have learned to use ‘external state’ (e.g. writing things down to remember them), just because that seems super useful for building competency and learning from mistakes that didn’t occur during training (see e.g. this recent failure of a top Go system). My guess is that imagining a ‘training phase’ where the system does nothing of consequence and a ‘deployment phase’ where the system does consequential things but is entirely frozen and not changing in interesting ways is likely to be misleading, despite fitting academic ML research well.
Yepp, this is a good point. I agree that there won’t be a sharp distinction, and that ML systems will continue to do online learning throughout deployment. Maybe I should edit the post to point this out. But three reasons why I think the training/deployment distinction is still underrated:
In addition to the clarifications from this post, I think there are a bunch of other concepts (in particular recursive self-improvement and reward hacking) which weren’t originally conceived in the context of modern ML, but which it’s very important to understand in that context.
Most ML and safety research doesn’t yet take transfer learning very seriously; that is, it’s still in the paradigm where you train in (roughly) the environment that you measure performance on. Emphasising the difference between training and deployment helps address this. For example, I’ve pointed out in various places that there may be no clear concept of “good behaviour” during the vast majority of training, potentially undermining efforts to produce aligned reward functions during training.
It seems reasonable to expect that early AGIs will become generally intelligent before being deployed on real-world tasks, and that their goals will also be largely determined before deployment. So insofar as what we care about is giving them the right underlying goals, the relatively small amount of additional supervision they’ll receive during deployment isn’t a primary concern.