Reddit Sues Perplexity and Three Other Firms Over Unauthorized Data Scraping
Background
Reddit, a major online community platform, has increasingly sought to monetize its vast repository of user‑generated posts by licensing the data to technology companies. The platform has entered agreements with prominent AI developers and has also experimented with its own AI answer tool that draws on Reddit content. To protect its intellectual property, Reddit has taken steps to limit unauthorized crawling and scraping of its site.
The Lawsuit
In a new legal action, Reddit alleges that four companies (Perplexity, SerpApi, Oxylabs and AWMProxy) scraped Reddit posts from search‑engine results and incorporated that material into AI services without obtaining a license. The complaint claims the defendants bypassed Reddit’s licensing system, thereby depriving the platform of revenue and violating its terms of use. Reddit is seeking financial damages and a permanent injunction to prevent the defendants from selling or using the scraped content in the future.
Companies Involved
Perplexity, an AI answer‑engine startup, relies on large datasets to train its models. The lawsuit asserts that Perplexity quickly reproduced a test Reddit post that was deliberately placed on the web to be indexed only by search engines, which Reddit says demonstrates that the content was obtained through scraping. The other three defendants, SerpApi, Oxylabs and AWMProxy, are described as firms whose business models center on collecting data from search results and reselling it to clients, including AI developers.
Reddit’s Response
Reddit says it sent Perplexity a cease‑and‑desist notice. Perplexity responded that it did not use Reddit data, yet its answers continued to cite the platform. Reddit’s legal team presented evidence that the test post was reproduced by the defendants’ systems shortly after it was indexed, supporting the claim of unauthorized scraping. The company has also taken technical measures such as rate‑limiting unknown bots and restricting access by certain web archives.
Implications for the AI Industry
The lawsuit highlights a broader conflict between online platforms that generate large volumes of user content and AI companies that need that content to train models. As platforms like Reddit move toward licensing agreements, they are asserting greater control over how their data is used. The outcome of this case could set precedents for how AI developers must obtain and pay for data, and may encourage stricter compliance with robots.txt and other web‑crawling standards.