What Happened
On October 21st, Rupert Murdoch’s NewsCorp, through its subsidiaries Dow Jones & Company (owner of The Wall Street Journal and Barron’s, among others) and NYP Holdings, Inc. (owner of the New York Post), filed a lawsuit in the U.S. District Court for the Southern District of New York against startup Perplexity.ai for alleged copyright infringement and trademark violations.
They accuse Perplexity of using copyrighted works (e.g., articles from The Wall Street Journal or the New York Post) to populate its “RAG” (retrieval-augmented generation) database and thereby generate responses to user queries. The publishers’ concern is that these quick answers act as substitutes for visiting their ad-supported sites, thereby diverting revenue.
For ongoing legal updates, this CourtListener page will be helpful.
From the suit: “[their] AI ‘answer engine’ copies on a massive scale, among other things, copyrighted news content, analysis, and opinion as inputs into its internal database. It then uses that copyrighted content to generate responses to users’ queries that are intended to and do act as a substitute for news and other information websites.”
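For readers unfamiliar with the term, the retrieve-then-generate pattern behind “RAG” can be sketched in a few lines. This is only an illustrative toy, not Perplexity’s actual system: the ARTICLES corpus, the word-overlap scoring, and the function names retrieve and build_prompt are all assumptions made for the example, and the final step of sending the prompt to an LLM is left out.

```python
import re
from collections import Counter

# Hypothetical stand-in for an internal database of ingested articles.
ARTICLES = {
    "markets": "Stocks rallied on Tuesday after the central bank held interest rates steady.",
    "sports": "The home team won the championship game in overtime on Sunday.",
    "tech": "A chip maker reported record quarterly revenue driven by demand for AI hardware.",
}


def tokenize(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, articles, top_k=1):
    """Return ids of the stored articles sharing the most words with the query."""
    query_tokens = tokenize(query)
    scored = []
    for doc_id, text in articles.items():
        # Counter intersection keeps the minimum count of each shared word.
        overlap = sum((query_tokens & tokenize(text)).values())
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in scored[:top_k] if overlap > 0]


def build_prompt(query, articles, top_k=1):
    """Assemble the text a language model would be asked to answer from."""
    doc_ids = retrieve(query, articles, top_k)
    context = "\n".join(f"[{doc_id}] {articles[doc_id]}" for doc_id in doc_ids)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("Why did stocks rally after the interest rates decision?", ARTICLES))
```

The point of the sketch is that the stored article text itself is placed in front of the model when it writes an answer, which is the copying the complaint describes. Production systems typically use far more sophisticated retrieval than word overlap.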
The lawsuit seeks compensation for that alleged revenue loss, as well as damages caused by Perplexity’s false attributions of fabricated content to the plaintiffs; i.e., the occasional hallucinations which serve up “facts” that were not in the original article. This is the trademark violation claim; namely, that Perplexity.ai’s snippets dilute their trademarks and mislead users about the origin of the content.
Plaintiffs are demanding a permanent injunction against Perplexity’s use of their copyrighted works, plus damages of $150,000 per copyright violation.
Perplexity, for its part, was initially mum on the accusation, but three days after the NewsCorp filing it published its version of events on its blog.
Why It Happened
The tension between tech and media is nothing new. Google broke much of this legal ground starting in the mid-2000s, when The Authors Guild sued over its book-scanning project, a case Google ultimately won. A quote from that case could be used almost word-for-word in the current round of tech versus publishing cases:
“Plaintiffs, authors of published books under copyright, filed suit against Google for copyright infringement. Google, acting without permission of rights holders, has made digital copies of tens of millions of books, including plaintiffs’, through its Library Project and its Google books project.”
That established something of a precedent; namely, that showing users “snippets” of copyrighted content was covered by the “fair use” doctrine, even if Google ingested millions of books into its internal systems without permission.
In the GenAI era, tech companies like OpenAI, Perplexity, Runway, etc. are coming under fire for doing almost the same thing Google does with its search. But these new firms (Google included, with its Gemini platform) ingest massive amounts of copyrighted content without permission and then replay the text to users as snippets of a sort, in a much less transparent way: they show users summaries transformed through the LLM (large language model). Sometimes, interestingly, the LLM will regurgitate large chunks of an author’s work verbatim, which is why NewsCorp calls it plagiarism. Other times, the LLM poorly rewrites or misrepresents the author’s work, yet cites it. NewsCorp calls that the trademark violation.
Perplexity cannot be surprised by any of this. In the past six months, both Forbes and Wired magazine (owned by Condé Nast) raised legal and technical concerns about Perplexity’s use of their content. Forbes alleged in a piece that Perplexity had reproduced sections of one of its important breaking stories and, further, that Perplexity made a knock-off story, sent a push notification to its users, then created a YouTube video that outranked Forbes’ original in search rankings. Randall Lane, chief content officer at Forbes, said at the time, “Perplexity had taken our work, without our permission, and republished it across multiple platforms — web, video, mobile — as though it were itself a media outlet.”
About the same time, a reporting team at Wired published a piece titled “Perplexity Is a Bullshit Machine.” Their technical forensics determined that Perplexity was, among other things, circumventing the long-standing Robots Exclusion Protocol. The IP addresses of Perplexity’s web crawlers had been publicly listed on its site, yet the Wired team uncovered secret, unpublished IP addresses being used to reach Wired’s content. And in terms of representation (or misrepresentation) of their content, they concluded, “Perplexity is summarizing not actual news articles, but reconstructions of what they say based on URLs and traces of them left in search engines like extracts and metadata.”
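To make the Robots Exclusion Protocol concrete, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python’s standard urllib.robotparser module. The robots.txt rules, the crawler names ExampleBot and OtherBot, and the news.example.com URLs are invented for illustration and do not reflect any real publisher’s file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is barred from the whole site,
# and every other crawler is barred only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this user agent to fetch this URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleBot", "https://news.example.com/story"),
        ("OtherBot", "https://news.example.com/story"),
        ("OtherBot", "https://news.example.com/private/draft"),
    ]:
        print(agent, url, may_fetch(parser, agent, url))
```

Nothing in the protocol enforces the answer: the check is keyed to the user-agent name a crawler chooses to announce, so compliance is voluntary. That is why a crawler arriving from unlisted IP addresses is hard for a publisher to block.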
The third and most recent precursor event was when The New York Times, on October 15th, issued a cease-and-desist letter, demanding Perplexity cease utilizing its content without permission. A week later came NewsCorp’s suit.
Perhaps a final motivation for NewsCorp to sue was the fact that in May of this year, it received a much warmer reception from OpenAI, inking a five-year, $250 million deal for use of NewsCorp content. And NewsCorp is not the only media organization that has made deals with OpenAI. Others include the Associated Press, Le Monde, Politico, Axel Springer, the Financial Times, and Dotdash Meredith, which owns magazines such as People and Better Homes & Gardens.
What’s really at play here is not just the money. NewsCorp certainly would like to extract a pound of flesh from Perplexity for access to its content trove, the same way it did with OpenAI. What’s at stake is a new legal standard (and precedents) related to fair use in the age of generative AI, and indeed, the future of incumbent media itself. Consider the current GenAI trend lines: in a few years’ time, multimodal LLMs (highly capable at ingesting audio, video, and text, and outputting transformed versions in as many formats) will be further blurring the lines of authorship not just in text media, but in audio and video as well.
Indeed, Google’s current experimental prototype NotebookLM has been a public sensation for one killer feature: Audio Overviews, which are engaging, dare I say addictive, synthetic conversations between two convincingly human-sounding AI personas. Anyone in the legacy media generating revenue from content, already barely managing the disintermediation of social media and the web, must want to get ahead of what could be an even more disruptive GenAI threat to their legacy business models. A suit against Perplexity is perhaps the perfect place to “stand their ground” right now, as governments themselves will be too slow to react to this fast-moving tech wave.
What Happens Next
If history is any guide, the two sides will settle, and NewsCorp might extract the same kind of cash deal with Perplexity as it got from OpenAI. However, the NVIDIA/Bezos-backed Perplexity is in the midst of raising additional funding and has limited revenue, so it may not be able to pony up what NewsCorp is looking for, not to mention what other publishers following the same path might ask. According to The Information, Perplexity has annual revenue of about $50 million, and its valuation has increased from roughly $500 million one year ago to $3 billion in April of this year.
Perplexity CEO Aravind Srinivas has already said the startup will be looking to increase its revenue through an ad model, and that it will share that ad revenue through its Publishers’ Program. According to Srinivas, Time, Der Spiegel, Fortune, Entrepreneur, and others have already committed to being part of the revenue-sharing deal.
If Perplexity’s revenue-sharing program succeeds, the company might well establish a new business model and standard for aligning the interests of media and tech. In their own words, the company’s leaders acknowledge that Perplexity’s value is predicated on “trusted, accurate sources covering the topics people care about most.” Their promise to be a credible “answer engine” wouldn’t be possible in a world without credible, trusted media outlets. So what is the upside if they “bite the hand,” as the saying goes?
Meanwhile, the authors themselves (the reporters, writers, and journalists who work for media companies, along with the associations and unions representing them) are not sitting idly by either. If the Statement on AI Training is any indication, the war between media and tech will be fought by those players as well. And if legacy media is further disrupted and goes the way of most local newspapers, all that will be left are independent journalists and bloggers. They may be the last stand against the complete synthetic recreation of media.
We are witnessing yet another massive quake in the media landscape. Two great industries — tech and publishing — are wrestling with the implications of technology that not only further distances the creator from the consumer, but also redefines the act of creation itself. All we can say for certain: court rulings on cases like this one will shape the future of these industries, as well as the future of human creation.