While it’s nice to have empirical testbeds for alignment research, I worry that companies using alignment to help train extremely conservative and inoffensive systems could lead to backlash against the idea of AI alignment itself.
On the margin, this is already happening.
Stability.ai delayed the release of Stable Diffusion 2.0 to retrain the entire system on a dataset filtered to remove all NSFW content. There was a strong backlash against this, and it seems to have pushed a lot of people toward the idea that they have to train their own models. (SD2.0 appeared to perform worse on humans, presumably because the filter pruned out a large chunk of pictures containing people; the team apparently misunderstood the range of the LAION punsafe classifier, and the evidence for this is in the SD2.1 model card, where they fine-tuned 2.0 with a radically different punsafe value.)
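To make the failure mode concrete: the punsafe score is a per-image probability in [0, 1], so the choice of cutoff determines how much of the dataset survives. A minimal sketch of this kind of threshold filtering, assuming a hypothetical record format and illustrative threshold values (not the actual LAION pipeline):

```python
# Hypothetical sketch of punsafe-threshold dataset filtering.
# Record format, URLs, and threshold values are assumptions for illustration;
# the point is that a misread threshold on a [0, 1] score can silently
# discard a huge slice of the training data.

def filter_by_punsafe(records, threshold):
    """Keep only records whose punsafe score is below the threshold.

    A very low threshold (e.g. 0.1) drops anything the classifier is even
    mildly unsure about -- including many ordinary photos of people --
    while a high threshold (e.g. 0.98) removes only confident NSFW hits.
    """
    return [r for r in records if r["punsafe"] < threshold]

records = [
    {"url": "a", "punsafe": 0.02},  # clearly safe
    {"url": "b", "punsafe": 0.30},  # classifier mildly unsure (e.g. a portrait)
    {"url": "c", "punsafe": 0.99},  # confident NSFW
]

strict = filter_by_punsafe(records, 0.1)    # keeps 1 of 3
lenient = filter_by_punsafe(records, 0.98)  # keeps 2 of 3
```

The two calls differ only in the cutoff, yet the strict one throws away the borderline portrait along with the genuinely unsafe image, which is the shape of the problem the SD2.1 model card seems to correct for.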
I know of at least one 4x A100 machine that someone purchased for fine-tuning because of just that incident, and I have heard rumors of a second. We should expect censored and deliberately biased models to drive further proliferation of differently trained models, compute capacity, and the expertise to fine-tune and train models.