Training Data Free-For-All
Japan's AI data laws, explained

Published Jun 14, 2023

Amid rising questions about the fairness and legality of using publicly available information to train AI models, Japan affirmed that machine learning engineers can use any data they find.

What’s new: A Japanese official clarified that the country’s law lets AI developers train models on works that are protected by copyright.

How it works: In testimony before Japan’s House of Representatives, cabinet minister Keiko Nagaoka explained that the law allows machine learning developers to use copyrighted works whether or not the trained model would be used commercially and regardless of its intended purpose.

Nagaoka said the law technically prohibits developers from using copyrighted works they obtained illegally, but she conceded that this limitation is difficult to enforce because the provenance of large quantities of data is hard to trace.
Copyright holders have no legal avenue to block use of their works for “data analysis,” including AI training. However, such use is prohibited if it would cause them unreasonable harm.
In 2018, Japan amended its Copyright Act to allow free use of copyrighted works for training machine learning models as long as the purpose “is not to enjoy the thoughts or feelings expressed in the work.”

Yes, but: Politicians in minority parties have pressed the ruling party to tighten the law. Visual artists and musicians have also pushed for a revision, saying that allowing AI to train on their works without permission threatens their creative livelihoods.

Behind the news: Japan is unusual insofar as it explicitly permits AI developers to use copyrighted materials for commercial purposes.

In the European Union, developers can use copyrighted works freely for research. The EU’s upcoming AI Act, which is expected to become law later this year, requires generative AI developers to disclose their use of copyrighted works in training.
The United Kingdom allows developers to train machine learning models on copyrighted works for research purposes only.
In the United States, copyright law includes a “fair use” principle that generally permits use of copyrighted works without permission as long as the use substantially transforms the work and does not threaten the copyright holder’s interests. Whether fair use covers training machine learning models has yet to be determined; cases currently in progress may settle the question.

Why it matters: Last month, member states of the Group of Seven (G7), an informal bloc of industrialized democratic governments that includes Japan, announced a plan to craft mutually compatible regulations and standards for generative AI. Japan’s stance is at odds with those of its fellow members, but that could change as the members develop a shared vision.

We’re thinking: In the era of generative AI, the question of what’s fair, and thus what makes a sensible legal standard, is tricky, leading different regions in divergent directions. We applaud the G7 for moving toward globally compatible laws, which will make it easier for developers worldwide to do work that benefits people everywhere.
