Training Generative AI: What’s Legal Versus What’s Fair

Published Feb 08, 2023

Dear friends,

As you can read in this issue of The Batch, generative AI companies are being sued over their use of data (specifically images and code) scraped from the web to train their models. Once trained, such models can generate, on demand, images in a given artist’s style or code that executes particular tasks.

The lawsuits will answer the question of whether using publicly available data to train generative models is legal, but I see an even more important question: Is it fair? If society has a point of view on what is fair, we can work to make laws that reflect this.

To be clear, this issue is much bigger than generative AI. The fundamental question is whether AI systems should be allowed to learn from data that’s freely available to anyone with an internet connection. But the focus right now is on models that generate images and code.

Today, we routinely advise students of computer programming to read (and perhaps contribute to) open source code. Reading open source no doubt inspires individuals to write better code. No one questions whether this is fair. After all, it’s how people learn. Is it fair for a computer to do the same?

The last time I visited the Getty Museum in Los Angeles, California, I saw aspiring artists sitting on the floor and copying masterpieces on their own canvases. Copying the masters is an accepted part of learning to be an artist. By copying many paintings, students develop their own style. Artists also routinely look at other works for inspiration. Even the masters whose works are studied today learned from their predecessors. Is it fair for an AI system, similarly, to learn from paintings created by humans?

Of course, there are important differences between human learning and machine learning that bear on fairness. A machine learning model can read far more code and study far more images than a human can. It can also generate far more code or images, far more quickly and cheaply, than even the most skilled human.

These differences raise serious issues for artists, coders, and society at large:

Production of creative works by a machine may devalue the work of human creators.
Generative models can reproduce the personal style of artists whose work they were trained on without compensating those artists.
Such models may have been trained on proprietary data that was not intended to be available on the internet (such as private images that were stolen or leaked).

On the other hand, generative models have tremendous potential value. They’re helping people who are not skilled artists to create beautiful works, spurring artists to collaborate with computers in new ways, and automating workaday tasks so humans can focus on higher-level creativity. Furthermore, advances in AI build upon one another, and progress in generative AI brings progress in other areas as well.

The upshot is that we need to make difficult tradeoffs between enabling technological progress and respecting the desire to protect creators’ livelihoods. Thoughtful regulation can play an important role. One can imagine potential regulatory frameworks such as:

Establishing a consistent way for creators to opt out
Mandating compensation for artists when AI systems use their data
Allocating public funding to artists (like using tax dollars to fund public media such as the BBC)
Setting a time limit, like copyright, after which creative works are available for AI training

What a society views as fair can change. In the United States, it was once considered fair that only certain men could vote. When society’s view on this changed, we changed the rules.

Society currently has divergent views on what is fair for AI to do. Given the bounty offered by generative AI (and other AI systems), and acknowledging the need to make sure that creators are treated fairly, I hope we find a path forward that allows AI to continue to develop quickly for the benefit of all.

Keep learning!

Andrew
