[Question] Should we push for requiring AI training data to be licensed?

We frequently speak about AI capability gain being bad because it shortens the timeframe for AI safety research. In that logic, taking steps to decrease AI capability would be worthwhile.

At the moment the large language models are trained with a lot of data without the company, that trains the language model, licensing the data. If there would be a requirement to license the required data, that would severely reduce the available data for language models and reduce their capabilities.

It’s expensive to fight lawsuits in the United States. Currently, there are artists who feel like their rights are violated by Dalle 2 using their art as training data. Similar to how Thiel funded the Gawker lawsuits, it would be possible to support artists in a suit against OpenAI to require OpenAI to license images for training Dalle 2. If such a lawsuit is well-funded it will be much more likely that a precedent for requiring data licensing gets set which would slow down AI development.

I’m curious about what people who think more about AI safety than myself think about such a move. Would it be helpful?