Reddit Sues Perplexity and Data Scrapers Over Alleged Illegal Content Harvesting
Background
Reddit, one of the internet’s largest collections of human conversation, has become a coveted source of training material for artificial‑intelligence models. Seeking compensation for the use of its data, Reddit changed its API in 2023 and has since signed licensing agreements with major AI companies, including OpenAI and Google.
Allegations Against Perplexity and Scrapers
The lawsuit asserts that Perplexity and three data‑scraping service providers—SerpApi, Oxylabs and AWMProxy—engaged in “industrial‑scale, unlawful circumvention of data protections.” Reddit likens the scrapers to “would‑be bank robbers” who, unable to break into a vault, target the armored truck carrying cash. According to the complaint, Perplexity is a customer of at least one of these scrapers and chose to obtain Reddit content through them rather than negotiate a direct agreement.
Reddit sent a cease‑and‑desist letter to Perplexity in May 2024, demanding that the company stop scraping Reddit data. Perplexity responded that it did not use Reddit content to train AI models and would respect Reddit’s robots.txt file. Despite that response, Reddit says the volume of citations to its content on Perplexity’s platform increased after the letter was sent.
In one illustrative incident, Reddit created a test post that could be crawled only by Google’s search engine and was not otherwise accessible on the internet. Within hours, Perplexity reproduced the exact contents of that post. Reddit concluded that the company must have scraped Google search results to obtain the Reddit material and then incorporated it into its answer engine.
Legal Claims and Context
The complaint characterizes the defendants’ conduct as part of a broader “data laundering” economy, where scrapers bypass technological protections, steal data, and sell it to AI developers hungry for high‑quality human content. Reddit’s chief legal officer, Ben Lee, called the defendants “textbook examples” of illegal behavior, noting that they mask their identities, hide locations, and disguise web scrapers to steal Reddit content from Google search results.
Reddit’s legal action follows earlier litigation, including a suit against Anthropic for alleged unauthorized access to Reddit’s platform. The company emphasizes that its user‑generated posts are valuable assets that should be accessed through lawful agreements, not through covert scraping operations.
Perplexity’s Response
Perplexity has not yet been served with the lawsuit. A company spokesperson, Jesse Dwyer, said Perplexity had not received the complaint and reiterated the firm’s commitment to a “principled and responsible” approach to AI. Dwyer added that Perplexity aims to provide factual answers with accurate AI while upholding openness and the public interest.
Reddit’s lawsuit seeks to halt the alleged illegal data harvesting and to hold the defendants accountable for what it describes as an industrial‑scale effort to steal copyrighted content.