I think the assumption here that AIs are “learning to train themselves” is important. In this scenario they’re producing the bulk of the data.
I also take your point that this is probably correctable with good training data. One premise of the story here seems to be that the org simply didn’t try very hard to align the model. Unfortunately, I find this premise all too plausible. Fortunately, this may be a leverage point for shifting the odds. “Bother to align it” is a pretty simple and compelling message.
Even with the data-based alignment you're suggesting, I don't think it's clear that weird chains of thought couldn't still take it off track.