Sure—another way of phrasing what I’m saying is that I’m not super interested (as alignment research, at least) in adversarial training that involves looking at difficult subsets of the training distribution, or adversarial training where the proposed solution is to give the AI more labeled examples that effectively extend the training distribution to include the difficult cases.
It would be bad if we built an AI that wasn't robust on the training distribution, of course, but I think of that as a problem already being addressed by the field of ML without any need to look ahead to AGI.