NYT is suing OpenAI & Microsoft for alleged copyright infringement; some quick thoughts
Unpaywalled article, the lawsuit.
(I don’t have a law degree and this is not legal advice; my only background is a US copyright law course I took many years ago.) I’ve read most of the lawsuit and skimmed the rest. Some quick thoughts on the allegations:
Memorisation: when ChatGPT outputs text that closely copies original NYT content, this is clearly copyright infringement. I think it’s clear that OpenAI & Microsoft should be paying everyone whose work their LLMs reproduce.
Training: it’s not clear to me whether training LLMs on copyrighted content is copyright infringement under current US copyright law. I think lawmakers should introduce regulations to make it an infringement, but I don’t think the courts should treat it as one under the current laws (although I might not be familiar with all the relevant case law).
Summarising news articles found on the internet: copyright protects expression, not facts (if you read about something in a NYT article, the knowledge you gained isn’t protected by copyright, and you’re free to share it). I think that if an LLM summarises text it has lawful access to and just talks about the same facts, this either doesn’t violate copyright or might be fair use. The damage NYT alleges from Bing is damage that Wikipedia also causes when it cites facts and links to the source. I think that to the extent LLMs don’t preserve the wording or the creative structure, copyright doesn’t provide protection, and some preservation of the structure might be fair use.
Hallucinations: ChatGPT hallucinating false info and attributing it to NYT is outside copyright law, but seems bad and damaging. I’m not sure what the existing law around that sort of thing is, but even if existing law doesn’t cover it, it’d be great to see regulations making AI companies liable for all sorts of damage from their products, including attributing statements to people who never made them.