OpenAI and Meta came out swinging against several suits claiming their AI products infringe copyright and other rights. While these defendants are leaving for another day the legality of using copyrighted books to train their AI systems, they have clearly telegraphed the thrust of their defenses.
In its motion to dismiss the Tremblay class action, OpenAI unsurprisingly highlights the fair use defense.
At the heart of Plaintiffs’ Complaints are copyright claims. Those claims, however, misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence. The constitutional purpose of copyright is “[t]o promote the Progress of Science and useful Arts.” U.S. CONST. Art. 1, § 8, cl. 8. As the Supreme Court has recognized, “[t]he more artistic protection is favored, the more technological innovation may be discouraged; the administration of copyright law is an exercise in managing the tradeoff.” Metro-Goldwyn-Mayer Studios Inc. v. Grokster, Ltd., 545 U.S. 913, 928 (2005). Numerous courts have applied the fair use doctrine to strike that balance, recognizing that the use of copyrighted materials by innovators in transformative ways does not violate copyright. See Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992) (videogame development); Sony Computer Ent., Inc. v. Connectix Corp., 203 F.3d 596 (9th Cir. 2000) (videogame emulators); Kelly v. Arriba Soft Corp., 336 F.3d 811 (9th Cir. 2003) (image search engines); Field v. Google Inc., 412 F. Supp. 2d 1106 (D. Nev. 2006) (web search engines); A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630 (4th Cir. 2009) (plagiarism detection tool); Authors Guild v. Google, Inc. (Google Books), 804 F.3d 202 (2d Cir. 2015) (Google Books Project); Google LLC v. Oracle Am., Inc., 141 S. Ct. 1183 (2021) (interfaces for Android operating system); see generally Mark A. Lemley & Bryan Casey, Fair Learning, 99 TEX. L. REV. 743 (2021). These are the key legal principles upon which countless artificial intelligence products have been developed by a wide array of technology companies.
OpenAI also contests the claim that every AI output is an infringing derivative work.
With respect to the sole copyright theory challenged here, Plaintiffs’ claims for vicarious infringement are based on the erroneous legal conclusion that every single ChatGPT output is necessarily an infringing “derivative work”—which is a very specific term in copyright law—because those outputs are, in only a remote and colloquial sense, “based on” an enormous training dataset that allegedly included Plaintiffs’ books. The Ninth Circuit has rejected such an expansive conception of the “derivative work” right as “frivolous,” holding that a derivative work claim requires a showing that the accused work shares copyright-protected, expressive elements with the original. Plaintiffs’ contrary theory is simply incorrect, and would be unworkable were it not. According to the Complaints, every single ChatGPT output—from a simple response to a question (e.g., “Yes”), to the name of the President of the United States, to a paragraph describing the plot, themes, and significance of Homer’s The Iliad—is necessarily an infringing “derivative work” of Plaintiffs’ books. Worse still, each of those outputs would simultaneously be an infringing derivative of each of the millions of other individual works contained in the training corpus—regardless of whether there are any similarities between the output and the training works. That is not how copyright law works.
In defending the Kadrey class action, Meta likewise focuses on the derivative work theory, arguing that its LLaMA language models are not themselves infringing derivative works.
Plaintiffs’ claim for direct copyright infringement is based on two theories: (1) Meta created unauthorized copies of Plaintiffs’ books in the process of training LLaMA (¶ 40); and (2) “[b]ecause the LLaMA language models cannot function without the expressive information extracted from Plaintiffs’ Infringed Works and retained inside [LLaMA],” the models “are themselves infringing derivative works” (¶ 41). Both theories are without merit, but this Motion addresses only the latter theory, which rests on a fundamental misunderstanding of copyright law.
The fact/expression dichotomy was further elucidated in Authors Guild, in which the Second Circuit rejected an argument that the Google Books project—for which Google made digital copies of millions of books without permission to create a tool allowing Internet users to search for certain words or terms within them—constituted an infringing derivative work. 804 F.3d at 227. The court reasoned that plaintiffs had no “supposed derivative right to supply information about their books,” such as “word frequencies, syntactic patterns, and thematic markers.” Id. at 209, 227. This “statistical information,” the court found, does not constitute “copyrighted expression,” and its use by Google did “not support Plaintiffs’ derivative works argument.” Id.