It's Time to Update Copyright for Generative AI We need new copyright laws that enable generative AI developers and users to move forward without risking lawsuits.

Published

Jul 19, 2023

Reading time

3 min read

Dear friends,

Many laws will need to be updated to encourage beneficial AI innovations while mitigating potential harms. One example: Copyright law as it relates to generative AI is a mess! That many businesses are operating without a clear understanding of what is and isn’t legal slows down innovation. The world needs updated laws that enable AI users and developers to move forward without risking lawsuits.

Legal challenges to generative AI are on the rise, as you can read here, and the outcomes are by no means clear. I’m seeing this uncertainty slow down the adoption of generative AI in big companies, which are more sensitive to the risk of lawsuits (as opposed to startups, whose survival is often uncertain enough that they may have much higher tolerance for the risk of a lawsuit a few years hence).

Meanwhile, regulators worldwide are focusing on how to mitigate AI harm. This is an important topic, but I hope they will put equal effort into crafting copyright rules that would enable AI to benefit more people more quickly.

Here are some questions that remain unresolved in most countries:

Is it okay for a generative AI company to train its models on data scraped from the open internet? Access to most proprietary data online is governed by terms of service, but what rules should apply when a developer accesses data from the open internet and has not entered into an explicit agreement with the website operator?
Having trained on freely available data, is it okay for a generative AI company to stop others from training on its system’s output?
If a generative AI company’s system generates material that is similar to existing material, is it liable for copyright infringement? How can we evaluate the allowable degree of similarity?
Research has shown that image generators sometimes copy their training data. While the vast majority of generated content appears to be novel, if a customer (say, a media company) uses a third-party generative AI service (such as a cloud provider’s API) to create content, reproduces it, and the content subsequently turns out to infringe a copyright, who is responsible: the customer or the cloud provider?
Is automatically generated material protected by copyright, and if so, who owns it? What if two users use the same generative AI model and end up creating similar content — will the one who went first own the copyright?

Here’s my view:

I believe humanity is better off with permissive sharing of information. If a person can freely access and learn from information on the internet, I’d like to see AI systems allowed to do the same, and I believe this will benefit society. (Japan permits this explicitly. Interestingly, it even permits use of information that is not available on the open internet.)
Many generative AI companies have terms of service that prevent users from using output from their models to train other models. It seems unfair and anti-competitive to train your system on others’ data and then stop others from training their models on your system’s output.
In the U.S., “fair use” is poorly defined. As a teacher who has had to figure out what I am and am not allowed to use in a class, I’ve long disliked the ambiguity of fair use, but generative AI makes this problem even more acute. Until now, our primary source of content has been humans, who generate content slowly, so we’ve tolerated laws that are so ambiguous that they often require a case-by-case analysis to determine if a use is fair. Now that we can automatically generate huge amounts of content, it’s time to come up with clearer criteria for what is fair. For example, if we can algorithmically determine whether generated content overlaps by a certain threshold with content in the training data, and if this is the standard for fair use, then it would unleash companies to innovate while still meeting a societally accepted standard of fairness.

If it proves too difficult to come up with an unambiguous definition of fair use, it would be useful to have “safe harbor” laws: As long as you followed certain practices in generating media, what you did would be considered non-infringing. This would be another way to clarify things for users and generative AI companies.

The tone among regulators in many countries is to seek to slow down AI’s harms. While that is important, I hope we see an equal amount of effort put into accelerating AI’s benefits. Sorting out how we should change copyright law would be a good step. Beyond that, we need to craft regulations that clarify not just what’s not okay to do — but also what is explicitly okay to do.

Keep learning!

Andrew

Subscribe to The Batch