In an important copyright decision released earlier this week, Third Circuit Judge Stephanos Bibas, sitting by designation in the United States District Court for the District of Delaware, granted summary judgment in favour of Thomson Reuters. He found that its Westlaw headnotes and Key Number System were protected by copyright and that the use of the headnotes to train a competing legal research tool was not a fair use. The decision in Thomson Reuters Enterprise Centre GmbH v Ross Intelligence Inc, No. 1:20-cv-613-SB (D. Del. Feb. 11, 2025) examined three essential questions: 1) whether the Westlaw headnotes and Key Number System taxonomy were protected by copyright; 2) whether Ross’s copying of the headnotes to create “legal memos” used to train Ross’s AI system infringed copyright, subject to the defense of fair use; and 3) whether Ross’s copying was fair use. The court found in favour of Thomson Reuters on all three questions.
The case is a very important one for companies seeking to use copyrighted content to train AI models. However, the decision will not be directly applicable to the numerous generative AI cases currently before the courts in the United States.
Background to the litigation
Ross, a competitor to Westlaw, made a legal-research search engine that uses artificial intelligence. To train its AI search tool, Ross wanted to use a database of legal questions and answers. Ross asked to license Westlaw’s content. But because Ross was its competitor, Thomson Reuters refused.
To train its AI search engine, Ross made a deal with LegalEase to get training data in the form of “Bulk Memos.” Bulk Memos are lawyers’ compilations of legal questions with good and bad answers. LegalEase gave those lawyers a guide explaining how to create those questions using Westlaw headnotes, while clarifying that the lawyers should not just copy and paste headnotes directly into the questions. LegalEase sold Ross roughly 25,000 Bulk Memos, which Ross used to train its AI search tool. In essence, Ross built its competing product using Bulk Memos, which in turn were built from Westlaw headnotes. When Thomson Reuters found out, it sued Ross for copyright infringement.
The court’s decision
The court had little difficulty finding that thousands of Westlaw headnotes met the relatively low threshold of creativity required for a work to be “original” and were therefore protected by copyright. It also found that thousands of Bulk Memos were infringing. The judge made this finding after a deep dive comparing the Bulk Memos to the Westlaw headnotes. He found thousands of them to be substantially similar as a result of copying, and rejected the contention that the similarities resulted from copying the underlying judicial decisions. He also rejected arguments that the copying was saved by the merger or scènes à faire doctrines. The judge further found that Westlaw’s Key Number System, used to organize its materials, was original and protected by copyright.
The court also rejected Ross’s fair use defense.
Under U.S. copyright law, courts must consider four non-exclusive fair use factors, namely: (1) the use’s purpose and character, including whether it is commercial or nonprofit; (2) the copyrighted work’s nature; (3) how much of the work was used and how substantial a part it was relative to the copyrighted work’s whole; and (4) how Ross’s use affected the copyrighted work’s value or potential market. The first and fourth factors weigh most heavily in the analysis. Authors Guild v. Google, Inc., 804 F.3d 202, 220 (2d Cir. 2015).
Factor 1 went to Thomson Reuters. Under this factor, the court examined the purpose and character of Ross’s use, explaining: “I look mainly at whether it was commercial and whether it was transformative. Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508, 529–31 (2023). If Ross and Thomson Reuters use copyrighted material like the headnotes for very similar purposes and Ross’s use is commercial, this factor likely disfavors fair use.”
The court found Ross’s use to be commercial and not transformative because Westlaw’s headnotes and Ross’s copies shared the same or highly similar purposes: “Ross’s use is not transformative because it does not have a ‘further purpose or different character’ from Thomson Reuters’s.”
Ross was using Thomson Reuters’s headnotes as AI data to create a legal research tool to compete with Westlaw. It is undisputed that Ross’s AI is not generative AI (AI that writes new content itself). Rather, when a user enters a legal question, Ross spits back relevant judicial opinions that have already been written. That process resembles how Westlaw uses headnotes and key numbers to return a list of cases with fitting headnotes. Thomson Reuters uses its headnotes and Key Number System primarily to help legal researchers navigate Westlaw and (possibly, as the parties dispute this) to improve Westlaw’s internal search tool.
Ross argued that the headnotes did not appear in the final product that Ross offered to consumers and that, consequently, any intermediate copying could not be infringing. It claimed that it merely “turned the headnotes into numerical data about the relationships among legal words to feed into its AI.” The court acknowledged that “[t]hat makes this factor much trickier,” but rejected the argument that the intermediate copying was not infringing. It distinguished the cases relied on by Ross, which it characterized as “all about copying computer code,” where the intermediate copying depended “in part on the need to copy to reach the underlying ideas.”
On the issue of transformative use, the court concluded:
Neither is true here. Because of that, this case fits more neatly into the newer framework advanced by Warhol. I thus look to the broad purpose and character of Ross’s use. Ross took the headnotes to make it easier to develop a competing legal research tool. So Ross’s use is not transformative. Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today.
Factors 2 and 3 went to Ross, but the fourth and most important factor, the effect on the market, favoured Thomson Reuters. The court reasoned:
For this factor, I consider the “likely effect [of Ross’s copying] on the market for the original.” Campbell, 510 U.S. at 590. I must consider not only current markets but also potential derivative ones “that creators of original works would in general develop or license others to develop.” Id. at 592. I also consider any “public benefits the copying will likely produce.” Google, 593 U.S. at 35. The original market is obvious: legal-research platforms. And at least one potential derivative market is also obvious: data to train legal AIs…
My prior opinion left this factor for the jury. I thought that “Ross’s use might be transformative, creating a brand-new research platform that serves a different purpose than Westlaw.” 694 F. Supp. 3d at 486. If that were true, then Ross would not be a market substitute for Westlaw. Plus, I worried whether there was a relevant, genuine issue of material fact about whether Thomson Reuters would use its data to train AI tools or sell its headnotes as training data. And I thought a jury ought to sort out “whether the public’s interest is better served by protecting a creator or a copier.”
In hindsight, those concerns are unpersuasive. Even taking all facts in favor of Ross, it meant to compete with Westlaw by developing a market substitute. And it does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough. Ross bears the burden of proof. It has not put forward enough facts to show that these markets do not exist and would not be affected.
Nor does a possible benefit to the public save Ross. Yes, there is a public interest in accessing the law. But legal opinions are freely available, and “the public’s interest in the subject matter” alone is not enough. Harper & Row, 471 U.S. at 569. The public has no right to Thomson Reuters’s parsing of the law. Copyrights encourage people to develop things that help society, like good legal-research tools. Their builders earn the right to be paid accordingly. This case is distinguishable from Google, where the API was valuable “because users, including programmers, [were] just used to it.” 593 U.S. at 38. There is nothing that Thomson Reuters created that Ross could not have created for itself or hired LegalEase to create for it without infringing Thomson Reuters’s copyrights.
Balancing the factors, the court rejected Ross’s fair-use defense.
Comments
The Thomson Reuters case is the first decision of a U.S. court to consider whether intermediate copying of works to train an AI system is covered by fair use. As such, it will be closely reviewed by the many observers following the numerous U.S. cases alleging copyright infringement in relation to generative AI systems. However, the facts and issues considered in the Thomson Reuters case are much more straightforward than those in the generative AI cases. For example:
- The analysis of the first factor, which includes whether the use is “transformative”, will be based on significantly different facts, especially given the significant differences between LLMs and diffusion models, on the one hand, and the smaller model at issue in the Ross case, on the other, as well as the different purpose and character of generative AI systems.
- The decision did not consider whether the outputs of Ross’s trained AI system were infringing or substantially similar to any inputs, something that will be important in the generative AI cases.
- The training materials used were the narrow set of bulk memos copied from Westlaw and potentially some other materials, not the billions of pieces of content used to train generative AI systems.
- The decision did not analyze at length how Ross’s AI system was actually trained or discuss whether the trained models contained copies of Westlaw content. The analysis focused only on whether the intermediate copying was a fair use. The generative AI cases will have a wider focus.
- Ross’s AI model was not designed to provide generative AI responses. Rather, it was designed to provide a list of judicial decisions in response to user questions. In fact, the judge was at pains to point out that the case involved “non-generative” AI, not a generative AI tool like a large language model (AI that writes new content itself).
- Ross’s AI search engine was found to be a competing market substitute for Westlaw. The market analysis in the generative AI cases is much more complicated.
- The court’s findings on public benefit will need to weigh considerably different facts in the generative AI cases.