Major Book Publishers File Class Action Against Meta Over Llama AI Training
Five of the world’s biggest book publishers (Macmillan, McGraw Hill, Elsevier, Hachette and Cengage) joined forces with bestselling author Scott Turow to launch a class‑action lawsuit against Meta Platforms. The complaint accuses the company of “one of the most massive infringements of copyrighted materials in history” by using their books and journal articles without permission to train the Llama family of artificial‑intelligence models.
Publishers allege massive copyright infringement
The suit says Meta deliberately scraped content from “notorious pirate sites” such as Library Genesis, Anna’s Archive, Sci‑Hub and Sci‑Mag, then incorporated those files into the Common Crawl dataset that feeds Llama. Plaintiffs argue the dataset is riddled with unauthorized copies, making Meta’s training process a direct violation of copyright law.
According to the filing, Llama can reproduce large blocks of text almost word‑for‑word. The complaint cites an example where the model, when prompted with two sentences from Cengage’s best‑selling textbook *Calculus: Early Transcendentals* (9th ed.), continued the passage verbatim, effectively recreating the copyrighted material.
The publishers are seeking a court order that would force Meta to halt the disputed training activities and provide a comprehensive list of every book, journal article and other copyrighted work used to develop Llama. They also demand monetary damages for the alleged infringement.
Meta’s response, delivered through spokesperson Dave Arnold, frames the lawsuit as an attack on legitimate AI innovation. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” Arnold said in an emailed statement. “We will fight this lawsuit aggressively.”
The case arrives amid a growing wave of litigation targeting AI developers. Earlier this year, a federal judge ruled in Meta’s favor in a separate copyright suit, though he cautioned that the decision “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.” In a parallel matter, Anthropic settled a class‑action claim for $1.5 billion after being accused of training its models on pirated books.
Legal experts note that the outcome could set a precedent for how AI companies handle copyrighted data. If the court sides with the publishers, Meta may be compelled to overhaul its data‑gathering practices, potentially reshaping the AI training landscape. For now, the lawsuit adds another high‑profile chapter to the ongoing debate over the balance between technological advancement and intellectual‑property rights.