Kaj_Sotala comments on Anthropic Faces Potentially “Business-Ending” Copyright Lawsuit

Kaj_Sotala 26 Jul 2025 21:06 UTC
8 points
- In another case (“the Meta case”), a judge ruled that “getting books illegally and training on them” was fine, actually.
  This doesn’t count for much, but my legal intuitions are saying “wait what”. Is this more reasonable than it sounds?
  Why doesn’t it count as precedent in this case?
The ruling in the Meta case says explicitly that it is not holding that Meta’s actions were fine in general, only that the authors who sued Meta failed to make a good argument that those actions were unlawful in their particular case. From the ruling:
Because the performance of a generative AI model depends on the amount and quality of data it absorbs as part of its training, companies have been unable to resist the temptation to feed copyright-protected materials into their models—without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal. Although the devil is in the details, in most cases the answer will likely be yes. [...]
… in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission. Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials.
But that brings us to this particular case. The above discussion is based in significant part on this Court’s general understanding of generative AI models and their capabilities. Courts can’t decide cases based on general understandings. They must decide cases based on the evidence presented by the parties.
In this case, thirteen authors—mostly famous fiction writers—have sued Meta for downloading their books from online “shadow libraries” and using the books to train Meta’s generative AI models (specifically, its large language models, called Llama). The parties have filed cross-motions for partial summary judgment, with the plaintiffs arguing that Meta’s conduct cannot possibly be fair use, and with Meta responding that its conduct must be considered fair use as a matter of law. In connection with these fair use arguments, the plaintiffs offer two primary theories for how the markets for their works are affected by Meta’s copying. They contend that Llama is capable of reproducing small snippets of text from their books. And they contend that Meta, by using their works for training without permission, has diminished the authors’ ability to license their works for the purpose of training large language models. As explained below, both of these arguments are clear losers. Llama is not capable of generating enough text from the plaintiffs’ books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data. As for the potentially winning argument—that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution—the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta’s models would dilute the market for their own works.
Given the state of the record, the Court has no choice but to grant summary judgment to Meta on the plaintiffs’ claim that the company violated copyright law by training its models with their books. But in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these thirteen authors—not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.
- philh 27 Jul 2025 10:25 UTC
  2 points
  Ah, so this is only talking about whether training on the material was fair use. I was surprised by the “getting books illegally and then training on them” thing.
  
  Skimming the pdf (mostly ctrl+f “shadow”), it sounds like “downloading a book might not be illegal, depending on what you do with it”, which I hadn’t realized. p. 36:
  
  as already discussed, even though that downloading is a separate use, it must be considered in light of its overall purpose. For instance, imagine a researcher who downloaded books from a shadow library in the process of writing an article on shadow libraries, and only did so for their research. That downloading would almost certainly be a fair use. Of course, in that example, the downloader has less ability to procure the books elsewhere than Meta did. But the point is that downloading from a shadow library, which the plaintiffs refer to as “unmitigated piracy,” must be viewed in light of its ultimate end.
  
  (My previous understanding was that the rule was something like “if you already own something legally, you’re allowed to download it; but if not, you’re not”.)
  
  ...but now I don’t understand how this differs from the Anthropic case, which I summarized as:
  This judge ruled that training on books you acquired legally is fair use.
  
  But Anthropic didn’t acquire the books they trained on legally. This judge also ruled that they’re liable for the copyright infringement involved in getting them illegally.
  Was that a bad summary?
  - Kaj_Sotala 27 Jul 2025 10:42 UTC
    6 points
    Your summary seems correct to me. Apparently the Meta judge disagrees with the reasoning in the Anthropic case; the Meta ruling has a brief comment on it:
    Speaking of which, in a recent ruling on this topic, Judge Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on. Such harm would be no different, he reasoned, than the harm caused by using the works for “training schoolchildren to write well,” which could “result in an explosion of competing works.” Order on Fair Use at 28, Bartz v. Anthropic PBC, No. 24-cv-5417 (N.D. Cal. June 23, 2025), Dkt. No. 231. According to Judge Alsup, this “is not the kind of competitive or creative displacement that concerns the Copyright Act.” Id. But when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.
    I think the Anthropic ruling doesn’t count as precedent here because both are District Court judges, who are allowed to disagree with each other’s decisions. A decision by their Court of Appeals or the Supreme Court would establish binding precedent.
    Claude’s explanation
    District court judges—even within the same district—are not bound by each other’s decisions. Each district judge has independent authority to interpret the law, which explains why you saw the second judge cite the first ruling only to disagree with it.
    Here’s how precedent actually works in the US system:
    Binding precedent comes from higher courts. California district courts must follow precedents set by the Ninth Circuit Court of Appeals (which covers California) and the Supreme Court. These are called “controlling authorities.”
    Persuasive precedent includes decisions from other district courts, even within the same district. A judge might consider these rulings, cite them, and explain why they agree or disagree—exactly what you witnessed. The second judge was essentially saying “I’ve looked at how my colleague handled this issue, but I think they got it wrong for these reasons.”
    This happens frequently in emerging legal areas like AI and copyright, where there’s limited appellate guidance. District courts become testing grounds for different legal theories. Eventually, if these cases get appealed, the Ninth Circuit might resolve the split by establishing binding precedent for all California district courts.
    The fair use question you mentioned is particularly ripe for this kind of disagreement since it involves a four-factor balancing test that different judges can reasonably weigh differently, especially when applying established doctrine to novel technology.
    This disagreement between district courts actually serves a useful function—it creates a record of different approaches that appellate courts can consider when they eventually do establish binding precedent.