In a landmark decision released yesterday in Getty Images (US) Inc v Stability AI Limited [2025] EWHC 2863 (Ch), a United Kingdom court ruled that Stability AI was not liable for secondary copyright infringement by importing or distributing models that were partly trained using images allegedly owned or exclusively licensed by Getty. Central to the decision was the finding of Justice Joanna Smith that Stability AI’s trained models did not store copies of Getty images as required by the UK Copyright, Designs and Patents Act 1988 (the “CDPA”), and hence were not infringing copies, even though they may have been trained without copyright holder authorization.
The Getty decision addressed many issues, including whether Stability AI was liable for trade-mark infringement or passing off and whether Getty held exclusive licences to the images it purported to control. Getty had also claimed that the training of Stability AI’s models infringed the CDPA. However, early in the trial it dropped that claim because it lacked evidence that Stability AI had infringed any copyrights within the territorial scope of the CDPA in training its models. It also dropped its output claims in light of actions taken by Stability AI to block prompts that allegedly resulted in primary infringement.
This post examines the UK High Court’s decision as it relates to claims of copyright infringement arising from the training and output of generative AI systems. It analyzes the court’s approach to secondary infringement and the implications of the decision for AI developers and rights-holders whose works are used in large-scale datasets.
Summary of findings on secondary infringement
As in Canada, the CDPA differentiates between primary and secondary acts of infringement. Secondary acts of infringement are broadly addressed to downstream dealings or involvement, as opposed to acts which originate reproductions of copyright works.
Under the CDPA, persons can be liable for secondary infringement if they import an infringing copy of a work or commit other acts such as distributing an infringing copy of a work, and satisfy other statutory conditions, including that they knew or had reason to believe that they were dealing with an infringing copy of a work.[i]
Getty argued that the model in issue (in its various versions), known as Stable Diffusion, was an “infringing copy” because it had been imported into the UK and its making in the UK would have constituted an infringement of copyright under the CDPA. It also contended that because the model was trained using infringing copies of Getty images, the model itself had to be regarded as an infringing copy. The court rejected these contentions, finding that Stable Diffusion did not itself store the data on which it was trained and could not therefore be considered an infringing copy.
The court summarized the evidence adduced at trial to support these findings. For example, Justice Smith quoted from unchallenged evidence in a report from Professor Brox which described why the training process did not result in the training data being stored in the model:
“8.36…in order for a diffusion model to successfully generate new images, that model must learn patterns in the existing training data so that it can generate entirely new content without reference to that training data.
8.37 Rather than storing their training data, diffusion models learn the statistics of patterns which are associated with certain concepts found in the text labels applied to their training data, i.e. they learn a probability distribution associated with certain concepts. This process of learning the statistics of the data is a desired characteristic of the model and allows the model to generate new images by sampling from the distribution…
8.40 …For models such as Stable Diffusion, trained on very large datasets, it is simply not possible for the models to encode and store their training data as a formula…. It is impossible to store all training images in the weights. This can be seen by way of a simple (example) calculation. As I explained in paragraph 6.28 above, the LAION-5B dataset is around 220TB when downloaded. In contrast, the model weights for Stable Diffusion 1.1-1.4 can be downloaded as a 3.44GB binary file. The model weights are therefore around five orders of magnitude smaller than a dataset which was used in training those weights”.
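The scale comparison in Professor Brox’s evidence can be checked with simple arithmetic. The sketch below uses only the two figures quoted in the judgment (a ~220TB dataset and a 3.44GB weights file); everything else is illustrative:

```python
import math

# Figures quoted in Professor Brox's evidence.
dataset_bytes = 220e12   # LAION-5B dataset: ~220 TB when downloaded
weights_bytes = 3.44e9   # Stable Diffusion 1.1-1.4 weights: 3.44 GB binary file

ratio = dataset_bytes / weights_bytes  # roughly 64,000x
print(f"Dataset is ~{ratio:,.0f}x larger than the model weights")
print(f"That is ~{math.log10(ratio):.1f} orders of magnitude")
```

The ratio comes out a little under five orders of magnitude, consistent with the expert’s point that the weights are far too small to encode the training images themselves.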
Justice Smith also found this evidence to be consistent with the evidence in an Expert Joint Statement that “the model weights do not directly store the pixel values associated with billions of training images” – i.e. digital images each consisting of what Professor Farid describes in his report as “an array of pixels”. “During training, images are converted from pixel space into “latent space” using an autoencoder. Latent space is a compressed, representative form of the pixel space image that is more memory and computationally efficient.” [ii]
Getty did not assert that the various versions of Stable Diffusion (or, more accurately, the relevant model weights) included or comprised a reproduction of any Copyright Work. Getty’s case was that it did not need to show this; it argued it was enough that the making of the model weights (had it been carried out within the UK) would have constituted an infringement of copyright. The court rejected this, finding that the relevant provisions of the CDPA were directed not to “the act of reproduction of a copy, but to the downstream dealings in an article which is ‘an infringing copy’”.
Thus, it seems to me to be clear that an infringing copy must be a copy, as Stability submits; the essence of the infringement is that there has been an infringement of copyright by the reproduction of the work (including by its storage in any medium by electronic means) in any material form. Consistent with the decision in Sony v Ball, I cannot see how an article can be an infringing copy if it has never consisted of/stored/contained a copy…
As Sony v Ball also makes plain, an article becomes an infringing copy when the act of reproduction occurs. From that moment the article is an infringing copy – but it ceases to be an infringing copy once it no longer contains the copy…
In reality therefore, the dispute between the parties as it finally emerged in closing, really turns on whether an article whose making involves the use of infringing copies, but which never contains or stores those copies, is itself an infringing copy such that its making in the UK would have constituted an infringement. Taking the specific facts with which I am concerned, is an AI model which derives or results from a training process involving the exposure of model weights to infringing copies itself an infringing copy?
In my judgment, it is not. It is not enough, as it seems to me, that (in Getty Images’ words) “the time of making of the copies of the Copyright Works coincides with the making of the Model” (emphasis added). While it is true that the model weights are altered during training by exposure to Copyright Works, by the end of that process the Model itself does not store any of those Copyright Works; the model weights are not themselves an infringing copy and they do not store an infringing copy. They are purely the product of the patterns and features which they have learnt over time during the training process. Getty Images’ central submission that “as soon as it is made, the AI model is an infringing copy” is, accordingly, in my judgment, entirely misconceived… The fact that its development involved the reproduction of Copyright Works (through storing the images locally and in cloud computing resources and then exposing the model weights to those images) is of no relevance. The model weights for each version of Stable Diffusion in their final iteration have never contained or stored an infringing copy.
I agree with Stability that the concept of an infringing copy cannot be interpreted in the abstract without reference to the fundamental nature of a copy. Such an interpretation precludes the potential for an article which has never contained a copy to be capable of being “an infringing copy”. But, sections 27(2) and (3) require the article that is made to be a “copy”. They are not concerned with a process which (while it may involve acts of infringement) ultimately produces an article which is not itself an infringing copy.
Comment on Getty v Stability AI decision
The Getty v Stability AI case is the first decision in the Commonwealth to examine whether the secondary infringement provisions under copyright laws can apply to generative AI models that may have been trained using unauthorized copies of works. While the secondary infringement provisions in Commonwealth countries may differ in some respects, they generally align with the principle that secondary infringement is concerned with certain dealings in infringing copies of works, not with the process leading to the creation of those copies. As such, while the decision is not binding on other Commonwealth courts, it is likely to be considered persuasive authority on the issue.
[i] The UK Statutory Framework is set out below.
Section 22 CDPA provides as follows:
“22. Secondary infringement: importing infringing copy.
The copyright in a work is infringed by a person who, without the licence of the copyright owner, imports into the United Kingdom, otherwise than for his private and domestic use, an article which is, and which he knows or has reason to believe is, an infringing copy of the work” (emphasis added).
Section 23 CDPA is in similar terms:
“23. Secondary infringement: possessing or dealing with infringing copy.
The copyright in a work is infringed by a person who, without the licence of the copyright owner—
(a) possesses in the course of a business,
(b) sells or lets for hire, or offers or exposes for sale or hire,
(c) in the course of a business exhibits in public or distributes, or
(d) distributes otherwise than in the course of a business to such an extent as to affect prejudicially the owner of the copyright,
an article which is, and which he knows or has reason to believe is, an infringing copy of the work” (emphasis added).
“Infringing copy” is defined in section 27 CDPA as follows:
“27. Meaning of “infringing copy”.
(1) In this Part “infringing copy”, in relation to a copyright work, shall be construed in accordance with this section.
(2) An article is an infringing copy if its making constituted an infringement of the copyright in the work in question.
(3) An article is also an infringing copy if—
(a) it has been or is proposed to be imported into the United Kingdom, and
(b) its making in the United Kingdom would have constituted an infringement of the copyright in the work in question, or a breach of an exclusive licence agreement relating to that work.”
[ii] The judge also quoted as follows from the Agreed Technical Primer:
“During training, each image is encoded into a latent representation to which a random amount of noise is added. The network weights are optimized to predict this noise [following a standard approach known as ‘stochastic gradient descent’] so that the network can recreate the content that is destroyed by the additive noise. Besides the noisy images, this optimization also takes into account the paired text description associated with each image.
The optimization of the weights follows a standard approach known as ‘stochastic gradient descent’. Given the status of the weights, the locally best change of these weights (the gradient) is computed to optimize the noise prediction for the training samples at hand. The gradients of training samples are accumulated over a small subset of the training dataset: the batch. Its size is referred to as the batch size. After taking a step in the direction of this accumulated gradient, the training procedure selects a new batch for the next step, and so on until all the images in the training data are processed, defined to be a single ‘epoch’. This entire process may be repeated for more than one epoch.
A single training image only has a limited impact on how the parameters of the network will be changed. It is competing with the gradients of the other samples in the batch, which may not necessarily agree on the direction of the gradient. Because each batch is a different subset of the full training dataset, the gradients in each step will not include a training image seen in a previous batch (unless that image is itself replicated in the training images). Rather, it will only be seen again after all training images have been seen once (an epoch), and going over the training images is repeated.
The network training makes the most efficient progress when many training samples agree on a local or global pattern. In this case, some of the weights are changed consistently by all agreeing samples”.
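The batched stochastic gradient descent procedure described in the Agreed Technical Primer can be sketched generically. The toy problem below (fitting a single weight by least squares) is an assumption chosen purely to illustrate batches, epochs, and gradient accumulation; it is not Stability AI’s training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset standing in for (image, caption) training pairs:
# the "true" relationship is y = 3x, observed with noise.
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

w = np.zeros(1)               # the "model weights"
lr, batch_size = 0.1, 32

for epoch in range(5):                        # one epoch = one pass over all samples
    order = rng.permutation(len(X))           # each batch is a different subset
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        err = X[batch] @ w - y[batch]
        # Gradients of the samples in the batch are accumulated (averaged),
        # so a single sample has only a limited impact on the update.
        grad = X[batch].T @ err / len(batch)
        w -= lr * grad                        # step along the accumulated gradient

print(w)  # close to [3.0]
```

The loop shows the primer’s point in miniature: no individual training example is stored in `w`; the weight simply drifts toward the pattern that most of the batches agree on.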