In practice, one can think of it this way: ChatGPT commits copyright infringement on a passage if and only if everyone else is already infringing on that exact passage, duplicating it so often that the model learned it is something people reproduce.
Definitely. Currently, I am of the opinion that there's nothing LLMs do with their training data that is fundamentally different from what we normally describe with the word "reading" when it happens in a human mind instead of an LLM. IDK if you could convince a court of that, but if you could, it would seem to be a pretty strong defense against copyright claims.