I will go further than what you are arguing here: if the maximalist interpretations are upheld, this would fundamentally break the viability of LLMs as a paradigm for AGI, because their heavy data dependence, combined with an unusual degree of memorization that leads to fairly extreme generalization failures, all but necessitates copyright violations on an extensive scale.
And notably, any future paradigm that wouldn't violate extensive/maximalist interpretations of copyright would have to be far more data-efficient than current models, and depending on how strictly copyright is enforced, this could make AGI/ASI outright infeasible.
Yes, it's a bit of a long shot, but this is a case to watch, because if it goes badly for Anthropic, the consequences could be very big for the AI industry as a whole, especially if they have to delete their models or their training sets entirely.
Unfortunately, the scenario where the copyright holders win out and fundamentally break the back of Anthropic/the AI industry is probably a bad thing from an existential risk perspective, since capabilities could keep increasing in ways that society won't react to, so most of my hope here is that the copyright holders don't get the maximalist interpretation of damages they seek.
Training on copyrighted data wasn't ruled infringing by itself, though; only pirating the books was. So even if the maximalist interpretation of damages were upheld, companies could still legally purchase books to train on.